https://doi.org/10.1007/s41635-019-00075-9
Analysis of Secure Caches Using a Three-Step Model for Timing-Based Attacks
Shuwen Deng¹ · Wenjie Xiong¹ · Jakub Szefer¹
Received: 19 February 2019 / Accepted: 23 July 2019
© Springer Nature Switzerland AG 2019
Abstract
Many secure cache designs have been proposed in literature with the aim of mitigating different types of cache timing-based attacks. However, there has so far been no systematic analysis of how these secure cache designs can, or cannot, protect against different types of the timing-based attacks. To provide a means of analyzing the caches, this paper presents a novel three-step modeling approach that is used to exhaustively enumerate all the possible cache timing-based vulnerabilities. The model covers not only attacks that leverage cache accesses or flushes from the local processor core, but also attacks that leverage changes in the cache state due to the cache coherence protocol actions from remote cores. Moreover, both conventional attacks and speculative execution attacks are considered. With the list of all possible cache timing vulnerabilities derived from the three-step model, this work further manually analyzes each of the existing secure cache designs to show which types of timing-based side-channel vulnerabilities each secure cache can mitigate. Based on the security analysis of the existing secure cache designs using the new three-step model, this paper further summarizes different techniques gleaned from the secure cache designs and their ability to help mitigate different types of cache timing-based vulnerabilities.
Keywords Secure caches · Timing-based attacks · Security analysis · Side channels · Covert channels
1 Introduction
Research on timing-based attacks in computer processor caches has a long history, e.g., [1, 3, 5, 19, 36], predating their recent use in Spectre [26] attacks. These past attacks have shown the possibility to extract sensitive information via the timing-based channels, and often the focus is on extracting cryptographic keys. In addition, due to the recent Spectre [26] attacks, there is now renewed interest in timing channels. Especially, the Spectre attacks consist of two parts: first, speculative execution is used to access some sensitive information; second, a timing-based channel is used to actually transfer the information to the attacker.
Shuwen Deng
shuwen.deng@yale.edu

Wenjie Xiong
wenjie.xiong@yale.edu

Jakub Szefer
jakub.szefer@yale.edu

¹ Yale University, New Haven, CT 06520, USA
Whether by itself, or combined with speculative execution, the timing-based channels in processors pose a threat to a system's security and should be mitigated.
We have recently proposed a three-step model [11] in order to analyze cache timing-based side-channel attacks. The previous model considers cache timing-based side-channel vulnerabilities as a set of three "steps" or actions performed by either the attacker or the victim, which can affect the states of the cache. In this work, our methodology from [11] is improved to better represent actions of the attacker and the victim. For each step, all possible states for a cache block are enumerated in terms of whether the operation is driven by the attacker or the victim, what memory range the data being operated on belongs to, and whether the state is changed because of a memory access or data invalidation operation (due to a cache coherence operation or a flush instruction, for example). To understand which possible three-step actions can lead to an attack, we further propose and develop a cache three-step simulator and apply a set of reduction rules to derive a complete list of vulnerabilities by eliminating three-step combinations that do not map to an attack. Furthermore, we consider both normal and speculative execution for the memory
Journal of Hardware and Systems Security (2019) 3:397–425
Published online: 20 November 2019
operations and modeling of the cache attacks. Speculative execution has gotten increased attention due to the recent Spectre [26] attacks, many of which depend on timing channels to actually extract information; speculation alone is not enough for most of these attacks. Our model considers timing channels in general, independent of whether it is a side or a covert channel.
In the process of development of the improved three-step model, we have uncovered 43 types of timing-based vulnerabilities which have not been previously exploited (in addition, there are 29 types that map to attacks already known in literature). We cannot directly compare the types of vulnerabilities found in this work and in our prior work [11] due to the improved and different categorizations of the states of the cache block.
To address the threat of the prior cache timing-based attacks, to date 18 different secure cache designs have been presented in academic literature [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. The secure processor caches are designed with different assumptions and often address only specific types of timing-based side-channel or covert-channel attacks. To help analyze the security of these designs, this work uses our three-step modeling approach to reason about all the possible timing-based vulnerabilities. Especially, since our work demonstrates a number of new timing-based attacks, the existing secure caches have never been analyzed with respect to these new attacks before. For this work, we manually reviewed and analyzed the 18 existing secure cache designs [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57] in terms of the security features and implementations. Most of these designs do not have publicly available hardware implementation source code, so automatic analysis of the caches is not possible.
Based on the analysis, we summarize cache features that help improve security. Especially, we propose that "ideal" secure caches and processor architectures should provide new features to let software explicitly label memory loads or stores of sensitive data and differentiate them from normal loads and stores, so sensitive data can be efficiently identified and protected by the hardware. The caches can use partitioning to isolate the attacker and the victim, and prevent the attacker from being able to set the victim's cache blocks into a known state, which is needed by many attacks. To mitigate attacks based on internal interference, the caches can use randomization to de-correlate the data that is accessed and the data that is placed in the cache. More details of the possible defenses are discussed in Section 5 and Section 6.
1.1 Contributions
The new contributions of this work over [11] are as follows:
• A new formulation of the three-step model, with new cache states and derivation of a new set of types covering all the cache timing-based vulnerabilities (Section 3):
– Inclusion of cache coherence issues into the three-step model
– Expansion of the three-step model to consider both cases of normal and speculative execution attacks
– Design of reduction rules and a cache three-step simulator to automatically derive the exhaustive list of all the three steps which map to effective vulnerabilities, and elimination of three-step patterns which do not map to a potential attack
• Overview of the 18 secure cache designs that have been presented in academic literature (Section 4)
• Manual evaluation of the 18 secure processor cache designs to determine how they can help prevent timing-based attacks, and analysis of the security features the secure caches use (Sections 5 and 6)
• Discussion of "ideal" secure caches and the features they would need (Section 6)
• Description of attack strategies and comparison among the different attack strategies (Appendix A)
• Analysis of the soundness of the three-step model and why three steps are able to describe all timing-based vulnerabilities (Appendix B)
2 Cache Timing-Based Attacks and the Threat Model
Modern processor caches are known to be vulnerable to timing-based attacks. The timing of the memory accesses varies due to the caches' operation; for example, a cache hit is fast while a cache miss is slow. The cache coherence protocol can also change the cache states and affect the timing of the memory operations; the cache coherence may invalidate a cache block from a remote core, resulting in a cache miss in the local core, for example. Also, the timing of cache flush operations varies depending on whether the data to be flushed is in the cache or not: flushing an address using clflush with valid data in the cache is slow, while flushing an address not in the cache is fast, for example. From these timing differences of memory-related operations, the attacker can infer a data's specific memory address or corresponding cache index value, and thus learn some information about the victim's secrets.
2.1 Threat Model
This work focuses only on timing-based attacks in processor caches. Numerous other types of side and covert channels that do not use timing or caches exist, e.g., power-based [9], EM-based [2] (including RF), thermal-based [33], and in-processor channels based on features such as the power state of the AVX unit [40], for example. This work aims to explore the main cache attacks only, but a similar approach can be applied to the other buffers or cache-like structures, which may be targets of attack once the main processor caches are secured.
In our threat model, an attacker's objective is to retrieve the victim's secret information using timing-based channels in the processor cache. Specifically, we consider the situation where the victim accesses an address u, and the address depends on some secret information. The address u is within some set of physical memory locations x, which are known to the attacker. The goal of the attacker is to obtain the address u, or at least partial bits of it which relate to the cache index of the address.
We assume the attacker knows some of the source code of the victim. Especially, the attacker can only learn some information¹ about the address u from the timing channels, but with knowledge of the source code, he or she can further infer the likely specific value of u and thus infer the secret he or she is trying to learn.
The attacker cannot directly access any data in the state machine of the cache logic, nor directly read the data of the victim if the two are not sharing the same address space. The attacker can, however, observe its own timing or the timing of the victim process. And the attacker knows how the timing of the memory-related operations depends on the cache states.
The attacker further is able to force the victim to execute a specific function. For example, the attacker can request the victim to decrypt a specific piece of data, thus triggering the victim to execute a function that makes use of a secret key he or she wants to learn. The victim in the cache attacks can be user software code in an enclave, an operating system, or another virtual machine.
The processor microarchitecture and the operating system are assumed to be able to differentiate between the victim and the attacker in different processes by assigning different process IDs. If the victim and the attacker are in the same process, e.g., the attacker is a malicious library, they will have the same process ID. The system software (e.g., operating system or hypervisor) is responsible for properly
¹For hit-based vulnerabilities, the attacker is able to learn the full address of the victim's sensitive data, while for the miss-based vulnerabilities, the attacker usually can learn the cache index of the victim's sensitive data. For more details of these vulnerabilities' categorizations, please refer to Section 3.3.3.
setting up virtual memory (page tables) and assigning IDs, which may be used by the hardware to identify different threads, processes, or virtual machines. When analyzing secure cache designs, the system software is considered trusted and bug-free. The attacker is also assumed not to be able to undermine the physical implementation or change the hardware, e.g., he or she cannot influence randomness generated by any random number generators in hardware. Physical or invasive attacks are not in the scope of this work. For secure cache designs which add new instructions for security-related operations, the victim process or management software is assumed to correctly use these instructions. During speculative execution, the cache state can be modified by the instructions executed speculatively, unless a processor cache architecture explicitly prevents or forbids certain speculative accesses.
2.2 Side and Covert Channels
This work focuses on both side and covert channels. Covert channels use the same methods as side channels, but the attacker controls both the sender and the receiver side of the channel. All types of side-channel attacks are equally applicable to covert channels. For brevity, we just use the term "victim" in the text to represent both the victim (for side channels) and the sender (for covert channels).
2.3 Hyperthreading Versus Time-Slice Sharing
When hyperthreading is supported in a system, the attacker and the victim are able to run on different threads in parallel, instead of running once every time slice (when no hyperthreading is used). Our model can be applied to both of the scenarios, since our model abstracts away how the sharing happens.
3 Modeling of the Cache Timing-Based Side-Channel Vulnerabilities
This section explains how we developed the three-step modeling approach and used it to model the behavior of the cache logic and to enumerate all the possible cache timing-based vulnerabilities.
3.1 Introduction of the Three-Step Model
We have observed that all of the existing cache timing-based attacks can be modeled with three steps of memory-related operations. Here, "memory-related operation" refers to loads, stores, or different flushes that can be done by the victim or the attacker, on the same core or different cores. When the victim and the attacker are on different cores,
cache coherence will also be triggered when one of the memory-related operations is performed.
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush + Reload [55] attack, in Step 1, a cache block is flushed by the attacker. In Step 2, security-critical data is accessed by, for example, the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 will be accessed, and the time of the access will be measured by the attacker. If the victim's secret-dependent operation in Step 2 accesses the cache block, in Step 3 there will be a cache hit and fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
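The Flush + Reload sequence just described can be sketched as a toy model of a single cache block. The class, names, and fast/slow labels below are our illustrative abstraction, not the paper's code; a real attack would time the reload in cycles rather than observe labels.

```python
# Toy model of one cache block, illustrating the three Flush + Reload steps.
class CacheBlock:
    def __init__(self):
        self.content = None  # address currently cached, or None if invalid

    def flush(self):
        """Invalidate the block (models a clflush of the attacker's address)."""
        self.content = None

    def access(self, addr):
        """Access addr and return the timing the accessor would observe."""
        timing = "fast" if self.content == addr else "slow"
        self.content = addr  # the accessed address is now cached
        return timing

def flush_reload(victim_addr):
    block = CacheBlock()
    block.flush()               # Step 1: attacker flushes the block
    block.access(victim_addr)   # Step 2: victim's secret-dependent access to u
    return block.access("a")    # Step 3: attacker reloads a and times it
```

If the victim's access in Step 2 touched the attacker's address "a", the reload hits and `flush_reload("a")` returns "fast"; any other victim address leaves the reload a miss, returning "slow", which is exactly the timing difference the attacker exploits.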
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all the 17 possible states of the cache block for each step in our three-step model and their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model

V_u: A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or a_alias, and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no A_u in the model.

A_a or V_a: The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker (A_a) or the victim (V_a). The attacker knows the address a, independent of whether the access was by the victim or the attacker themselves. The address a is within the range of sensitive locations x and is known to the attacker.

A_aalias or V_aalias: The cache block contains a memory address a_alias. The memory location is placed in the cache block due to a memory access by the attacker (A_aalias) or the victim (V_aalias). The address a_alias is within the range x and not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address a_alias is known to the attacker.

A_d or V_d: The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker (A_d) or the victim (V_d). The address d is not within the range x. The address d is known to the attacker.

A^inv or V^inv: The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker (A^inv) or the victim (V^inv), as a result of the cache block being invalidated, e.g., by a cache flush of the whole cache.

A^inv_a or V^inv_a: The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_a) or the victim (V^inv_a), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.

A^inv_aalias or V^inv_aalias: The cache block state can be anything except a_alias in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_aalias) or the victim (V^inv_aalias), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a_alias to be removed from the cache block. The address a_alias is known to the attacker.

A^inv_d or V^inv_d: The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_d) or the victim (V^inv_d), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.

V^inv_u: The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim (V^inv_u), as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u; therefore, the attacker is not able to trigger this invalidation, and A^inv_u does not exist in the model.

⋆: Any data, or no data, can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
J Hardw Syst Secur (2019) 3397ndash425400
Fig. 1 The 17 possible states for a single cache block in our three-step model: a V_u; b A_a, V_a, A_aalias, V_aalias, A_d, V_d; c A^inv, V^inv; d A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d; e V^inv_u; f ⋆
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall A represents the operations performed by the attacker and V represents the victim's operations.

Figure 1a shows the description of the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and specific address is unknown, we show V_u in dashed lines. Meanwhile, Fig. 1e shows the description of the possible state V^inv_u, which is the result of the victim invalidating data at the sensitive address u, and thus possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the description of the possible state ⋆, which represents null knowledge of the address in this corresponding cache block for the attacker. Therefore, it can possibly refer to any address in the memory, or to no valid address at all.

Figure 1b shows the description of the possible states A_a, V_a, A_aalias, V_aalias, A_d, V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and a_alias are within the sensitive set of addresses x, and a_alias, as its name indicates, is a different address than a, but still within set x, and maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig. 1d shows the description of the possible states A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Fig. 1c. These states indicate no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block, e.g., a, a_alias, d, and u (if it mapped to this block), before the invalidation step A^inv or V^inv, will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 × 17 × 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones can indicate a real attack. As is shown in Fig. 2, the exhaustive list of the 4913 combinations will first be input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities will then be sent as the input to the reduction rules to remove the redundant three steps and obtain the final list of vulnerabilities.
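The enumeration itself is mechanical. As a sketch of the counting (the state names are our plain-text transcription of Table 1, with "⋆" written as "*"):

```python
# Enumerate every three-step pattern: 17 candidate states per step
# gives 17**3 = 4913 combinations to feed into the simulator.
from itertools import product

STATES = [
    "V_u", "A_a", "V_a", "A_aalias", "V_aalias", "A_d", "V_d",
    "A_inv", "V_inv", "A_inv_a", "V_inv_a", "A_inv_aalias",
    "V_inv_aalias", "A_inv_d", "V_inv_d", "V_inv_u", "*",
]

patterns = list(product(STATES, repeat=3))  # all (Step 1, Step 2, Step 3) tuples
print(len(patterns))
```

Running this prints 4913, matching the count of combinations the simulator classifies.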
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case that u does not have the same index as a or a_alias, and thus does not map to this cache block.
The cache three-step simulator is implemented in a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. The outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all possible combinations (4913) of the three-step patterns. For each step
Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category: the exhaustive list of all 4913 possible three-step combinations is input to the cache three-step simulator (classification step), which outputs 132 preliminary Strong Vulnerability, 572 preliminary Weak Vulnerability, and 4209 Ineffective Three-Step patterns; the reduction rules (reduction step) then produce the final 72 Strong Vulnerability and 64 Weak Vulnerability types
of each pattern, if it is V_u, this step will be extended to be one of three candidates: V_a, V_aalias, and V_NIB. If it is V^inv_u, this step will be extended to be one of three candidates: V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this case, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address or has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities to be strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to different victim behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for type ⋆ ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (u mapping to NIB can also give fast timing because, due to the unknown ⋆ in Step 1, a may not be in the cache, so the A^inv_a invalidation completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated to be ineffective. For example, for type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type;
        a list containing all the vulnerabilities that belong to the Weak type;
        a list containing all the ineffective types

1:  for step1 in states do
2:    for step2 in states do
3:      for step3 in states do
4:        steps ← (step1, step2, step3)
5:        candidates ← array to store all possible candidate combinations of this three-step pattern
6:        timings ← array to store all possible timing observations regarding different candidate combinations for this three-step pattern
7:        if V_u or V^inv_u occurs in step1, step2, or step3 then
8:          for each candidate combination of steps do      ▷ V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3
9:            candidates.append(candidate combination)
10:         end for
11:         for each candidate in candidates do
12:           timings.append(output_timing(candidate))
13:         end for
14:         if judge_type(timings) is Strong then
15:           strong.append(steps)
16:         else
17:           if judge_type(timings) is Weak then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)      ▷ no u-related step, so no secret can leak
25:         continue
26:       end if
27:     end for
28:   end for
29: end for
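A minimal sketch of what the classification in judge_type might look like is shown below. This is our simplification, not the paper's implementation: each u-candidate is mapped to the set of timings it can produce in Step 3 (a two-element set arises when the observation also depends on the unknown ⋆ state), and the function checks how unambiguously an observation identifies u.

```python
# Simplified judge_type: classify a three-step pattern from the timing
# observations of its u-candidates (a, aalias, NIB).
def judge_type(timing_sets):
    """timing_sets maps each u-candidate to the set of possible timings."""
    observed = set().union(*timing_sets.values())
    if len(observed) < 2:
        return "ineffective"   # every value of u yields the same observation
    if all(len(t) == 1 for t in timing_sets.values()):
        return "strong"        # each u deterministically fixes the timing
    return "weak"              # some u is consistent with both timings

# V_d -> V_u -> A_a (Fig. 3a): fast iff u maps to a.
strong_example = {"a": {"fast"}, "aalias": {"slow"}, "NIB": {"slow"}}
# * -> V_u -> A_inv_a (Fig. 3c): NIB can give either timing.
weak_example = {"a": {"slow"}, "aalias": {"fast"}, "NIB": {"fast", "slow"}}
# A_a -> V_u -> A_d (Fig. 3f): always slow.
ineffective_example = {"a": {"slow"}, "aalias": {"slow"}, "NIB": {"slow"}}
```

Under this sketch, the three Fig. 3 examples are classified as strong, weak, and ineffective respectively, matching the text above.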
After computing the type of all the three-step patterns, the cache three-step simulator will output the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We also have developed rules that can further reduce the output list of all the effective three steps from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination will be eliminated if it satisfies one of the below rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated, e.g., A_d ⇝ A_a ⇝ V_u can be reduced
Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type. a, b Strong Vulnerability (e.g., V_d ⇝ V_u ⇝ A_a and V_u ⇝ A_d ⇝ V^inv_u); c, d Weak Vulnerability (e.g., ⋆ ⇝ V_u ⇝ A^inv_a and A^inv_aalias ⇝ V^inv_u ⇝ V_a); e, f Ineffective Three-Step (e.g., V_d ⇝ V^inv_u ⇝ V_d and A_a ⇝ V_u ⇝ A_d). Each example plots, for the victim's behavior u ∈ {a, a_alias, NIB}, whether the attacker's observation is fast or slow timing
to A_a ⇝ V_u, which as a three-step pattern is equivalent to ⋆ ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat type of ⋆ ⇝ A_a ⇝ V_u and can be eliminated.

2. Three-step patterns with a step involving a known address a, and with an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a or a_alias cannot represent different attacks, and only one combination needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat type of V_u ⇝ A_a ⇝ V_u, and we will eliminate the first pattern.

3. Three-step patterns with the steps V_u and V^inv_u adjacent to each other in consecutive steps will only keep the latter step and eliminate the first step. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which as a three-step pattern is equivalent to ⋆ ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
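The first rule, for instance, is simple to mechanize. Below is a sketch of its "both known to the attacker" case (state names are our plain-text transcription; this is an illustration of the rule, not the paper's reduction script):

```python
# Sketch of reduction rule 1 (the both-attacker-known case): if two
# adjacent steps both avoid the unknown address u, the pattern repeats a
# shorter pattern and can be eliminated from the list.
U_STEPS = {"V_u", "V_inv_u"}  # the only steps involving the unknown u

def redundant_by_rule1(pattern):
    """Return True if some pair of adjacent steps is fully attacker-known."""
    return any(s1 not in U_STEPS and s2 not in U_STEPS
               for s1, s2 in zip(pattern, pattern[1:]))
```

Here A_d ⇝ A_a ⇝ V_u is flagged as redundant (its first two steps are both known to the attacker), while a pattern such as A_a ⇝ V_u ⇝ V_a, which keeps the unknown u between known steps, survives this rule.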
3.3.3 Categorization of Strong Vulnerabilities
As is shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description for each attack strategy to show the main idea behind them. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference (E). Some vulnerabilities allow the attacker to learn that the address of the victim's accesses maps to the set the attacker is attacking by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
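The internal-versus-external half of this labeling depends only on who performs Steps 2 and 3. As a sketch (using our plain-text state names, where attacker steps start with "A" and victim steps with "V"):

```python
# Sketch of the I-vs-E half of the macro-type label: a vulnerability is
# internal interference (I) when both Step 2 and Step 3 are victim (V)
# operations, and external interference (E) otherwise.
def interference_type(step2, step3):
    both_victim = step2.startswith("V") and step3.startswith("V")
    return "I" if both_victim else "E"
```

For example, the Cache Internal Collision type A^inv ⇝ V_u ⇝ V_a from Table 2 has two victim steps after Step 1 and is labeled "I", while the Flush + Reload pattern from Section 3.1 ends with an attacker access and is labeled "E". The miss-based versus hit-based half additionally needs the timing observation, so it is not captured by this sketch.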
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
^2 Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
J Hardw Syst Secur (2019) 3:397-425
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability; for Step 3, "fast" indicates a cache hit must be observed to derive sensitive address information, while "slow" indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature.

Attack Strategy          | Step 1 -> Step 2 -> Step 3          | Macro Type | Attack
Cache Internal Collision | A^inv -> V_u -> V_a (fast)          | IH         | (2)
                         | V^inv -> V_u -> V_a (fast)          | IH         | (2)
                         | A_d -> V_u -> V_a (fast)            | IH         | (2)
                         | V_d -> V_u -> V_a (fast)            | IH         | (2)
                         | A^alias_a -> V_u -> V_a (fast)      | IH         | (2)
                         | V^alias_a -> V_u -> V_a (fast)      | IH         | (2)
                         | A^inv_a -> V_u -> V_a (fast)        | IH         | (2)
                         | V^inv_a -> V_u -> V_a (fast)        | IH         | (2)
Flush + Reload           | A^inv_a -> V_u -> A_a (fast)        | EH         | (5)
                         | V^inv_a -> V_u -> A_a (fast)        | EH         | (5)
                         | A^inv -> V_u -> A_a (fast)          | EH         | (5)
                         | V^inv -> V_u -> A_a (fast)          | EH         | (5)
                         | A_d -> V_u -> A_a (fast)            | EH         | (5)
                         | V_d -> V_u -> A_a (fast)            | EH         | (5)
                         | A^alias_a -> V_u -> A_a (fast)      | EH         | (5)
                         | V^alias_a -> V_u -> A_a (fast)      | EH         | (5)
Reload + Time            | V^inv_u -> A_a -> V_u (fast)        | EH         | new
                         | V^inv_u -> V_a -> V_u (fast)        | IH         | new
Flush + Probe            | A_a -> V^inv_u -> A_a (slow)        | EM         | (6)
                         | A_a -> V^inv_u -> V_a (slow)        | IM         | new
                         | V_a -> V^inv_u -> A_a (slow)        | EM         | new
                         | V_a -> V^inv_u -> V_a (slow)        | IM         | new
Evict + Time             | V_u -> A_d -> V_u (slow)            | EM         | (1)
                         | V_u -> A_a -> V_u (slow)            | EM         | (1)
Prime + Probe            | A_d -> V_u -> A_d (slow)            | EM         | (4)
                         | A_a -> V_u -> A_a (slow)            | EM         | (4)
Bernstein's Attack       | V_u -> V_a -> V_u (slow)            | IM         | (3)
                         | V_u -> V_d -> V_u (slow)            | IM         | (3)
                         | V_d -> V_u -> V_d (slow)            | IM         | (3)
                         | V_a -> V_u -> V_a (slow)            | IM         | (3)
Evict + Probe            | V_d -> V_u -> A_d (slow)            | EM         | new
                         | V_a -> V_u -> A_a (slow)            | EM         | new
Prime + Time             | A_d -> V_u -> V_d (slow)            | IM         | new
                         | A_a -> V_u -> V_a (slow)            | IM         | new
Flush + Time             | V_u -> A^inv_a -> V_u (slow)        | EM         | new
                         | V_u -> V^inv_a -> V_u (slow)        | IM         | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, "fast" indicates no corresponding address of the data is invalidated, while "slow" indicates the invalidation operation makes some data invalid, causing longer processing time.

Attack Strategy                       | Step 1 -> Step 2 -> Step 3          | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv -> V_u -> V^inv_a (slow)      | IH         | new
                                      | V^inv -> V_u -> V^inv_a (slow)      | IH         | new
                                      | A_d -> V_u -> V^inv_a (slow)        | IH         | new
                                      | V_d -> V_u -> V^inv_a (slow)        | IH         | new
                                      | A^alias_a -> V_u -> V^inv_a (slow)  | IH         | new
                                      | V^alias_a -> V_u -> V^inv_a (slow)  | IH         | new
Flush + Flush                         | A^inv_a -> V_u -> V^inv_a (slow)    | IH         | (1)
                                      | V^inv_a -> V_u -> V^inv_a (slow)    | IH         | (1)
                                      | A^inv_a -> V_u -> A^inv_a (slow)    | EH         | (1)
                                      | V^inv_a -> V_u -> A^inv_a (slow)    | EH         | (1)
Flush + Reload Invalidation           | A^inv -> V_u -> A^inv_a (slow)      | EH         | new
                                      | V^inv -> V_u -> A^inv_a (slow)      | EH         | new
                                      | A_d -> V_u -> A^inv_a (slow)        | EH         | new
                                      | V_d -> V_u -> A^inv_a (slow)        | EH         | new
                                      | A^alias_a -> V_u -> A^inv_a (slow)  | EH         | new
                                      | V^alias_a -> V_u -> A^inv_a (slow)  | EH         | new
Reload + Time Invalidation            | V^inv_u -> A_a -> V^inv_u (slow)    | EH         | new
                                      | V^inv_u -> V_a -> V^inv_u (slow)    | IH         | new
Flush + Probe Invalidation            | A_a -> V^inv_u -> A^inv_a (fast)    | EM         | new
                                      | A_a -> V^inv_u -> V^inv_a (fast)    | IM         | new
                                      | V_a -> V^inv_u -> A^inv_a (fast)    | EM         | new
                                      | V_a -> V^inv_u -> V^inv_a (fast)    | IM         | new
Evict + Time Invalidation             | V_u -> A_d -> V^inv_u (fast)        | EM         | new
                                      | V_u -> A_a -> V^inv_u (fast)        | EM         | new
Prime + Probe Invalidation            | A_d -> V_u -> A^inv_d (fast)        | EM         | new
                                      | A_a -> V_u -> A^inv_a (fast)        | EM         | new
Bernstein's Invalidation Attack       | V_u -> V_a -> V^inv_u (fast)        | IM         | new
                                      | V_u -> V_d -> V^inv_u (fast)        | IM         | new
                                      | V_d -> V_u -> V^inv_d (fast)        | IM         | new
                                      | V_a -> V_u -> V^inv_a (fast)        | IM         | new
Evict + Probe Invalidation            | V_d -> V_u -> A^inv_d (fast)        | EM         | new
                                      | V_a -> V_u -> A^inv_a (fast)        | EM         | new
Prime + Time Invalidation             | A_d -> V_u -> V^inv_d (fast)        | IM         | new
                                      | A_a -> V_u -> V^inv_a (fast)        | IM         | new
Flush + Time Invalidation             | V_u -> A^inv_a -> V^inv_u (fast)    | EM         | new
                                      | V_u -> V^inv_a -> V^inv_u (fast)    | IM         | new
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]^3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions when the data is in the High partition, it will behave as a cache miss and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.

^3 Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
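This lookup policy can be sketched as follows (a minimal simplification with our own function and variable names; `partition_of` maps addresses to the partition currently holding them):

```python
def secverilog_access(partition_of: dict, addr: str, label: str) -> str:
    """Timing-label-aware lookup sketch: Low must not observe High state.

    label is the instruction's timing label ('L' or 'H'). A Low access
    whose data sits in the High partition behaves as a miss and migrates
    the line to Low; High accesses may hit in either partition.
    """
    part = partition_of.get(addr)        # 'L', 'H', or None (not cached)
    if label == 'H':
        return 'hit' if part in ('L', 'H') else 'miss'
    if part == 'L':
        return 'hit'
    if part == 'H':
        partition_of[addr] = 'L'         # move High -> Low for consistency
    return 'miss'                        # Low timing independent of High state
```

Note how the result of a Low access depends only on Low-partition state; that is the property that keeps High operations invisible to Low timing.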
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. To adjust the number of cache ways assigned to the Low partition, it tracks the percentage of cache misses for L instructions that would be reduced (increased) if L's partition size were increased (reduced) by one cache way. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads and leaves no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, a random data in the cache set will be evicted; this random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
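The tiered replacement decision can be sketched as below (our own encoding; the paper describes this policy in hardware, not as an API):

```python
import random

def sharp_select_victim(cache_set, requester_pid, active_pids):
    """Pick an eviction candidate in one set, following SHARP's tiers.

    cache_set: list of owner process IDs, one per way. Returns
    (way_index, os_interrupt): tier 1 prefers lines owned by no current
    process, tier 2 a line of the requester itself, and tier 3 falls
    back to a random way and raises a suspicious-activity interrupt.
    """
    for way, pid in enumerate(cache_set):
        if pid not in active_pids:                    # tier 1
            return way, False
    for way, pid in enumerate(cache_set):
        if pid == requester_pid:                      # tier 2
            return way, False
    return random.randrange(len(cache_set)), True     # tier 3
```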
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has a small performance impact because such speculation is rare. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of a load's prior control flow instructions resolve, unsafe speculative loads (USLs) will be put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before checking of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, especially Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition the cache into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at a granularity of a page and are not shared by more than one VM; here the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A software pseudo-locking mechanism is used to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one is called policy_fillmap, for masking fills and selecting the victim way to replace; another is called policy_hitmap, for masking hit ways. Only if both the tag and the domain id match will a cache hit happen; therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
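The hit condition can be sketched as below (our own encoding of the described policy; policy_hitmap is a per-domain bitmask over ways, as in the paper):

```python
def dawg_hit(cache_set, req_tag, req_domain, policy_hitmap):
    """Return the hitting way index, or None on a miss.

    cache_set: list of (tag, domain_id) per way. A way may hit only if
    it is enabled in the requesting domain's policy_hitmap AND both the
    tag and the domain id match, so identical read-only data replicated
    in another domain's way never produces a cross-domain hit.
    """
    for way, (tag, domain_id) in enumerate(cache_set):
        if (policy_hitmap >> way) & 1 and tag == req_tag \
                and domain_id == req_domain:
            return way
    return None
```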
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if data R is in the LLC, it is also in the higher-level caches, and eviction of R from the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have their relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; therefore, RIC will not protect writable, non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and for thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
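The miss-handling rule can be sketched as follows (a simplification with hypothetical field names; the real design integrates this into the hardware replacement logic):

```python
def pl_miss_fill(cache_set, new_pid, new_locked):
    """Choose a victim way on a miss, or None to serve the access uncached.

    cache_set: list of {'pid': int, 'locked': bool} per way. Locked
    lines are never evicted by unlocked fills, and locked lines of
    different processes never evict each other; when no legal victim
    exists, the new data is loaded/stored without caching.
    """
    for way, line in enumerate(cache_set):
        if not line['locked']:               # normal victim among unlocked
            return way
    if new_locked:                           # all ways locked:
        for way, line in enumerate(cache_set):
            if line['pid'] == new_pid:       # may replace own locked line
                return way
    return None                              # access proceeds uncached
```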
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. Each block of RP cache has a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens for data D of a cache set S: if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S' will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S', and the mappings of S and S' will be swapped as well. Otherwise, the normal replacement policy is executed.
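The three miss cases can be sketched as below (our own encoding; in the actual design the random set S' comes from the permutation table):

```python
import random

def rp_miss_action(victim, incoming, n_sets):
    """Classify a miss per the three RP cache cases.

    victim / incoming: {'pid': int, 'P': int} for the to-be-evicted line
    and the to-be-brought-in data. Returns (action, s_prime):
      'evict_random_uncached' - same process, protection bits differ;
      'swap_and_fill'         - different processes (S and S' swapped);
      'normal'                - otherwise (ordinary replacement).
    """
    s_prime = random.randrange(n_sets)
    if victim['pid'] == incoming['pid'] and victim['P'] != incoming['P']:
        return 'evict_random_uncached', s_prime
    if victim['pid'] != incoming['pid']:
        return 'swap_and_fill', s_prime
    return 'normal', None
```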
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate if it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit needs both the process ID and the address to be the same. When a cache miss happens for data D: if the evicted data and D belong to the same process but either one of their protection bits is set, arbitrary data will be evicted and D will be accessed without caching. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on a Random Fill cache can control whether a requested data access is a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security critical data accesses, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache; in the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since such a flush will not influence timing-based side-channel attack prevention (the random filling technique takes care of this).
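The three request types can be sketched as follows (our own function and parameter names; the neighborhood window is a configurable address range in the actual design):

```python
import random

def random_fill_handle(request_type, addr, window):
    """Return the address actually cached on a miss, or None.

    'nofill' : victim's security critical access; nothing is cached.
    'random' : an arbitrary neighbor in [addr - window, addr + window]
               is brought in instead of addr, de-correlating the fill
               from the demand access.
    'normal' : ordinary demand fill of addr itself.
    """
    if request_type == 'nofill':
        return None
    if request_type == 'random':
        return addr + random.randint(-window, window)
    return addr
```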
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to, nor whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction; the periodic re-keying causes the address remapping to change dynamically.
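The idea can be sketched with a toy keyed permutation standing in for LLBC (this Feistel construction is our own stand-in, not the actual cipher; only the structure, encrypt-then-index and remap on re-key, matches the description):

```python
def ceaser_set_index(line_addr, key, n_sets=1024, bits=20, rounds=4):
    """Toy stand-in for CEASER's address encryption.

    A small keyed Feistel permutation over `bits`-bit line addresses;
    the *encrypted* address (mod n_sets) selects the cache set, and
    changing `key` (periodic re-keying) remaps lines to new sets.
    """
    half = bits // 2
    mask = (1 << half) - 1
    left, right = (line_addr >> half) & mask, line_addr & mask
    for rnd in range(rounds):
        # Invertible Feistel round with a simple keyed round function.
        left, right = right, left ^ (((right * 0x9E3779B1) ^ key ^ rnd) & mask)
    return ((left << half) | right) % n_sets
```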
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As a hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with the user space for security domain identification.
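The per-way, per-domain index derivation can be sketched as below (SHA-256 here is our stand-in for the unspecified keyed hash; the names are our own):

```python
import hashlib

def scatter_index(line_addr, way, domain_key, n_sets=1024):
    """Set index of `way` for this address under one security domain's key.

    Each way gets its own index (skewed-associative style), and a
    different domain_key yields an unrelated address-to-set mapping.
    """
    msg = f"{domain_key}:{way}:{line_addr}".encode()
    digest = hashlib.sha256(msg).digest()
    return int.from_bytes(digest[:4], 'big') % n_sets
```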
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, nor based on whether the data is secure or not. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter; in this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates cache access time from the address being accessed. However, the performance degradation is tremendous.
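The decay mechanism can be sketched as below (our own class and constant names; LIMIT plays the role of the predefined global-counter maximum, and resetting the counter on an access is our assumption):

```python
import random

class NDLine:
    """One cache line with Non Deterministic Cache's decay counter."""
    LIMIT = 16   # predefined invalidation threshold (assumed value)

    def __init__(self):
        self.valid = True
        # Initial value is random and below the maximum, so the line's
        # lifetime is unpredictable even when freshly installed.
        self.counter = random.randrange(self.LIMIT)

    def touch(self):
        """An access restarts the line at a fresh random counter (assumed)."""
        self.counter = random.randrange(self.LIMIT)

    def tick(self):
        """One global counter tick with the line untouched."""
        if self.valid:
            self.counter += 1
            if self.counter >= self.LIMIT:
                self.valid = False   # decayed: the line is invalidated
```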
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the result of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP* cache can defend against the V_u -> A_d -> V_u (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, as indicated by a x and red color. For example, SecDCP cache cannot defend against the V_u -> V_a -> V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d -> V_u -> A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to produce cache hits on each other's data.

2. A cache can prevent a timing attack if the timing of the last step is randomized, breaking the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d -> V_u -> A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. When executing this attack on the Random Fill cache [30], for example, a slow timing does not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as A_d -> V_u -> V^inv_d (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.

The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.

For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being a memory access related operation. A single check mark in a green cell means this cache can prevent the corresponding vulnerability; a check mark in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a x in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

[The body of Table 4, the per-cache cell matrix, did not survive text extraction; only the caption and footnotes are recoverable.]

a: Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b: Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c: Flush is disabled, but cache coherence might be used to do the data removal.
d: For the L1 cache and TLB, flushing is done during context switch.
e: The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f: The technique is now only implemented in the last-level cache.
g: The technique now only targets the shared cache.
h: The technique only targets the inclusive last-level cache.
i: The technique targets the data cache hierarchy.
j: For the last-level cache, the cache is partitioned between the victim and the attacker.
k: The technique can control the probability of the vulnerability being successful to be extremely small.
l: The technique can work in shared read-only memory while not working in shared writable memory.
m: Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
Table 5: Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a. Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b. Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c. Flush is disabled, but cache coherence might be used to do the data removal.
d. For L1 cache and TLB, flushing is done during context switch.
e. The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection, and protection of the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f. The technique is now only implemented in the last-level cache.
g. The technique now only targets the shared cache.
h. The technique only targets the inclusive last-level cache.
i. The technique targets the data cache hierarchy.
j. For the last-level cache, the cache is partitioned between the victim and the attacker.
k. The technique can control the probability of the vulnerability being successful to be extremely small.
l. The technique works in shared read-only memory, while not working in shared writable memory.
m. Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and when implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to only being able to access a limited set of cache blocks (columns for SP* cache to column for PL cache in Tables 4 and 5). For example, either static or dynamic partitioning of the cache allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculation or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard for the partitioning-based caches to prevent. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size for each part is adjusted. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
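The blocking effect can be sketched with a toy statically partitioned set (an illustration, not any specific design): attacker fills land only in the attacker's own ways, so the attacker cannot set a known initial state in the victim's partition by eviction.

```python
class PartitionedSet:
    """One cache set statically split into per-domain ways (toy model)."""
    def __init__(self):
        self.ways = {"victim": None, "attacker": None}  # one way per domain

    def access(self, domain, tag):
        if self.ways[domain] == tag:
            return "hit"
        self.ways[domain] = tag   # fills and evicts only within own partition
        return "miss"

p = PartitionedSet()
p.access("victim", "secret")         # victim caches a secret-dependent line
p.access("attacker", "probe")        # attacker cannot touch the victim way
print(p.access("victim", "secret"))  # "hit": no external eviction occurred
```

Note that this sketch only models miss-based external interference; designs like SP* cache still allow cross-domain cache hits on shared data, which is why hit-based internal-collision vulnerabilities remain, as discussed in the text.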
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which makes hit-based vulnerabilities possible, e.g., V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit on the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and can be evicted by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system, it so far supports preventing all of the vulnerabilities, but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation that involves the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set the initial state (Step 1) for the victim partition, and therefore bring uncertainty to the final observation by the attacker, e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information deducible from the observed timing, due to having or not having data in the cache, could be reduced to zero if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
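The keyed remapping idea can be sketched as follows; HMAC-SHA256 stands in here for a design's low-latency block cipher, and all parameters are illustrative. Re-keying moves addresses to new sets, so an eviction set built under the old key mostly stops colliding with the target.

```python
import hmac, hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Keyed address-to-set mapping (HMAC stands in for a real cipher)."""
    mac = hmac.new(key, addr.to_bytes(8, "little"), hashlib.sha256).digest()
    return mac[0] % NUM_SETS

target = 0x1234
key0, key1 = b"epoch-0", b"epoch-1"   # remap keys for two epochs
# Eviction set learned under key0: addresses that collide with the target.
eviction_set = [a for a in range(4096)
                if set_index(a, key0) == set_index(target, key0)]
# After re-keying, most of the learned set no longer collides with the target.
survivors = [a for a in eviction_set
             if set_index(a, key1) == set_index(target, key1)]
print(len(eviction_set), len(survivors))
```

In a real design the remapping must be gradual and the cipher latency-constrained; the sketch only shows why an eviction set learned in one epoch loses its value in the next.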
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block for Step 1, Random Fill cache will not randomly load the security-critical data, and vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability still exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6: Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in other words, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and it is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

  – Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

  – Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.

  – To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail at this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision: In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload: In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
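The three steps can be illustrated on a toy cache model (the hit/miss "latencies" are made-up constants and the table names are hypothetical, not a real measurement): the attacker recovers which shared line the victim's secret-dependent access touched.

```python
HIT, MISS = 10, 100          # made-up latencies for a toy cache model
cache = set()

def access(addr):
    """Return the observed latency and cache the line (toy model)."""
    t = HIT if addr in cache else MISS
    cache.add(addr)
    return t

def flush(addr):
    cache.discard(addr)

shared = ["tbl0", "tbl1", "tbl2", "tbl3"]   # shared read-only lines
secret = 2
for a in shared:                 # Step 1: attacker flushes the shared lines
    flush(a)
access(shared[secret])           # Step 2: victim's secret-dependent access
times = {a: access(a) for a in shared}      # Step 3: attacker reloads, times
leaked = min(times, key=times.get)
print(leaked)                    # tbl2: the fast reload reveals the secret
```

The single fast reload is exactly the cache hit in Step 3 of the strategy description; every other line still misses because only the victim's access refilled its line.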
Reload + Time (New Name Assigned in this Paper): In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper): In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time: In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe: In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
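A toy illustration of these three steps on a simulated direct-mapped cache (the addresses and parameters are made up; a real attack must also build eviction sets and time each probe with a cycle counter):

```python
NUM_SETS = 8
cache = {}                       # set index -> tag of the line it holds

def access(addr):
    """Direct-mapped lookup: report hit/miss, then fill the set (toy model)."""
    s, tag = addr % NUM_SETS, addr // NUM_SETS
    hit = cache.get(s) == tag
    cache[s] = tag
    return hit

attacker_base = 0x1000           # attacker array covering all 8 sets
for i in range(NUM_SETS):        # Step 1: prime every set
    access(attacker_base + i)
secret_addr = 0x2345
access(secret_addr)              # Step 2: victim's secret-dependent access
misses = [i for i in range(NUM_SETS)          # Step 3: probe each set;
          if not access(attacker_base + i)]   # a miss marks the touched set
print(misses)                    # the one miss is at secret_addr % NUM_SETS
```

The attacker learns only the set index of the secret access, which is why Prime + Probe reveals coarser information than Flush + Reload.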
Bernstein's Attack: This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper): In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper): In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper): The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper): Vulnerabilities that have names ending with "Invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "Invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "Invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
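The exhaustive check behind these counts can be sketched as follows. The operation alphabet below is a simplification chosen for illustration (the actual simulator distinguishes more cache states, such as aliased addresses and the ⋆ step); the point is that every ordered triple mentioning the secret u is a candidate that the simulator then tests for an observable timing difference.

```python
from itertools import product

# Victim steps may touch a known address (a), a known-distinct address (d),
# or the secret (u); the attacker never touches u. Each step is either a
# memory access or an invalidation (^inv).
victim_ops   = [f"V{x}{s}" for x in "adu" for s in ("", "^inv")]
attacker_ops = [f"A{x}{s}" for x in "ad" for s in ("", "^inv")]
ops = victim_ops + attacker_ops            # 10 operations in this sketch

# A candidate vulnerability must involve the secret u in at least one step;
# a checker would then keep only the effective (Strong/Weak) patterns.
candidates = [p for p in product(ops, repeat=3)
              if any("u" in step for step in p)]
print(len(candidates))                     # 10**3 - 8**3 = 488
```

Even this reduced alphabet yields hundreds of candidate triples, which is why an automated simulator, rather than manual inspection, is needed to classify them.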
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as {..., ⋆, ...}, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as {..., A^inv/V^inv, ...}, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
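Subdivision Rule 1 can be sketched as a small function over patterns (the encoding of steps as strings is illustrative): a ⋆ step closes the current pattern and becomes Step 1 of the next one, and a lone trailing ⋆ is deleted.

```python
def subdivide(pattern):
    """Split a pattern at each '⋆' (Subdivision Rule 1 sketch): '⋆' starts
    the next pattern, and a pattern consisting only of '⋆' is dropped."""
    out, cur = [], []
    for step in pattern:
        if step == "⋆":
            if cur and cur != ["⋆"]:
                out.append(cur)
            cur = ["⋆"]              # '⋆' becomes Step 1 of the next pattern
        else:
            cur.append(step)
    if cur and cur != ["⋆"]:         # a lone trailing '⋆' carries no timing
        out.append(cur)
    return out

print(subdivide(["Aa", "Vu", "⋆", "Vu", "Aa", "⋆"]))
# [['Aa', 'Vu'], ['⋆', 'Vu', 'Aa']]
```

Rule 2 would be applied in the same recursive style, splitting at A^inv/V^inv steps instead of ⋆.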
B.4.2 Simplification Rules

For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules

Suppose there are two adjacent steps M and N in a memory sequence ⇝ M ⇝ N ⇝. If commuting M and N leads to the same observation result, i.e., ⇝ M ⇝ N ⇝ and ⇝ N ⇝ M ⇝ give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter in which position within the whole memory sequence the two steps are. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.

– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence. These patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules

If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or
Table 7  Rules for combining two adjacent steps

First | Second | Commute Rule 1 | Commute Rule 2 | Union Rule or Reduce Rule | Combined Step
a | a | yes | yes | yes | a
a^inv | a | no | no | yes | a
a | a_alias | no | no | yes | a_alias
a^inv | a_alias | yes | yes | yes | a_alias
a | d | no | no | yes | d
a^inv | d | yes | yes | yes | d
a | u | no | no | no | -
a^inv | u | no | no | no | -
a | a^inv | no | no | yes | a^inv
a^inv | a^inv | yes | yes | yes | a^inv
a | a_alias^inv | yes | yes | yes | a
a^inv | a_alias^inv | yes | yes | yes | Union(a^inv, a_alias^inv)
a | d^inv | yes | yes | yes | a
a^inv | d^inv | yes | yes | yes | Union(a^inv, d^inv)
a | u^inv | no | no | no | -
a^inv | u^inv | no | yes | no | -
a_alias | a | no | no | yes | a
a_alias^inv | a | yes | yes | yes | a
a_alias | a_alias | yes | yes | yes | a_alias
a_alias^inv | a_alias | no | no | yes | a_alias
a_alias | d | no | no | yes | d
a_alias^inv | d | yes | yes | yes | d
a_alias | u | no | no | no | -
a_alias^inv | u | no | no | no | -
a_alias | a^inv | yes | yes | yes | a_alias
a_alias^inv | a^inv | yes | yes | yes | Union(a^inv, a_alias^inv)
a_alias | a_alias^inv | no | no | yes | a_alias^inv
a_alias^inv | a_alias^inv | yes | yes | yes | a_alias^inv
a_alias | d^inv | yes | yes | yes | a_alias
a_alias^inv | d^inv | yes | yes | yes | Union(d^inv, a_alias^inv)
a_alias | u^inv | no | no | no | -
a_alias^inv | u^inv | no | yes | no | -
d | a | no | no | yes | a
d^inv | a | yes | yes | yes | a
d | a_alias | no | no | yes | a_alias
d^inv | a_alias | yes | yes | yes | a_alias
d | d | yes | yes | yes | d
d^inv | d | no | no | yes | d
d | u | no | no | no | -
d^inv | u | no | yes | no | -
d | a^inv | yes | yes | yes | d
d^inv | a^inv | yes | yes | yes | Union(a^inv, d^inv)
d | a_alias^inv | yes | yes | yes | d
d^inv | a_alias^inv | yes | yes | yes | Union(d^inv, a_alias^inv)
d | d^inv | no | no | yes | d^inv
d^inv | d^inv | yes | yes | yes | d^inv
d | u^inv | no | yes | no | -
d^inv | u^inv | no | yes | no | -
u | a | no | no | no | -
u^inv | a | no | no | no | -
u | a_alias | no | no | no | -
u^inv | a_alias | no | no | no | -
u | d | no | no | no | -
u^inv | d | no | yes | no | -
u | u | yes | yes | yes | u
u^inv | u | no | no | yes | u
u | a^inv | no | no | no | -
u^inv | a^inv | no | yes | no | -
u | a_alias^inv | no | no | no | -
u^inv | a_alias^inv | no | yes | no | -
u | d^inv | no | yes | no | -
u^inv | d^inv | no | yes | no | -
u | u^inv | no | no | yes | u^inv
u^inv | u^inv | yes | yes | yes | u^inv
Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.

– Reduction Rule 2: Two known adjacent memory access–related operations (known access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.

– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
B.4.2.3 Union Rules

Suppose there are two adjacent steps M and N in a memory sequence ⇝ M ⇝ N ⇝. If combining M and N leads to the same timing observation result, i.e., ⇝ M ⇝ N ⇝ and ⇝ Union(M, N) ⇝ give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, denoted Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
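As a toy illustration (our own encoding, not the paper's code), Table 7 can be thought of as a lookup from a pair of adjacent steps to their combined step; only a few illustrative rows are encoded below, and the function name is an assumption for the sketch.

```python
# Toy sketch of Table-7-driven rewriting: map a pair of adjacent steps to
# their combined step (absent pairs mean no Union/Reduce rule applies).

RULES = {
    ("a", "a"): "a",                            # Reduction Rule 2
    ("a_inv", "a"): "a",                        # Reduction Rule 3 (same address)
    ("u", "u"): "u",                            # Reduction Rule 1
    ("a_inv", "d_inv"): "Union(a_inv d_inv)",   # Union Rule 1 (two known invs)
}

def rewrite_once(seq):
    """Apply the first applicable two-step rule, scanning left to right."""
    for i in range(len(seq) - 1):
        combined = RULES.get((seq[i], seq[i + 1]))
        if combined is not None:
            return seq[:i] + [combined] + seq[i + 2:]
    return seq

print(rewrite_once(["a", "a", "u", "u"]))      # ['a', 'u', 'u']
print(rewrite_once(["a_inv", "d_inv", "u"]))   # ['Union(a_inv d_inv)', 'u']
```

Repeated application of such a rewrite, interleaved with commuting, is what shortens a β-step pattern toward three steps.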
B.4.2.4 Final Check Rules

Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

At each application of these three categories of rules, the recursion should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ non-u operation ⇝ u operation
– non-u operation ⇝ u operation ⇝ non-u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:
– If followed by a known access operation, A^inv/V^inv can be removed, because the access puts an actual known state into the cache block.

– If followed by a known inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.
In this case, the steps are finally within three steps and the checking is done.

• There exist two adjacent steps to which none of the above rules can be applied; these require the rest checking.

The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– {A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias} ⇝ V_u
– {A_a/V_a, A_aalias/V_aalias} ⇝ V^inv_u
– V_u ⇝ {A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias}
– V^inv_u ⇝ {A_a/V_a, A_aalias/V_aalias}

We manually checked all of the two adjacent step patterns
above, and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially.

Output: vul_list: reduced effective vulnerability pattern(s) array. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1:  vul_list = Ø
2:  while contain(pattern, ⋆) and its index is not 0 do
3:      pattern = subdivision_rule_1(pattern)
4:  end while
5:  while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:      pattern = subdivision_rule_2(pattern)
7:  end while
8:  while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:      pattern = commute_rules(pattern)
10:     pattern = reduction_rules(pattern)
11:     pattern = union_rules(pattern)
12:     if is_ineffective(pattern) or has_interval_effective_three_steps(pattern) then
13:         vul_list += extract(pattern)
14:     end if
15: end while
16: return vul_list
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences

Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall under neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.

Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
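A minimal runnable sketch of the Algorithm 2 skeleton is given below; the helper bodies are stand-ins of our own (the paper's real checks consult Table 2 and the reduction rules), so only the control flow mirrors the algorithm.

```python
# Sketch of the Algorithm 2 skeleton with stand-in helpers (our own
# simplifications, not the paper's actual checks).

def has_interval_effective_three_steps(pattern):
    # Stand-in predicate: some window of three steps mixes u-related and
    # non-u-related operations (a rough proxy for "maps to an effective
    # three-step vulnerability").
    return any(len({"u" in s for s in pattern[i:i + 3]}) == 2
               for i in range(len(pattern) - 2))

def is_ineffective(pattern):
    return not has_interval_effective_three_steps(pattern)

def rewrite_once(seq):
    # Stand-in for one Commute/Reduction/Union application: merge equal
    # adjacent steps (as in Reduction Rules 1 and 2).
    for i in range(len(seq) - 1):
        if seq[i] == seq[i + 1]:
            return seq[:i] + seq[i + 1:]
    return seq

def reduce_pattern(pattern):
    """Reduce a beta-step (beta > 3) pattern; return found vulnerabilities."""
    vul_list = []
    while len(pattern) > 3:
        shorter = rewrite_once(pattern)
        if shorter == pattern:      # no rule applies any more
            break
        pattern = shorter
    if not is_ineffective(pattern):
        vul_list.append(pattern)
    return vul_list

print(reduce_pattern(["A_a", "A_a", "V_u", "A_a"]))  # [['A_a', 'V_u', 'A_a']]
```

The real implementation would replace `rewrite_once` with the full Table 7 lookup and the effectiveness checks with the Table 2 vulnerability list.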
B.4.4 Summary

In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45 nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 – 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa, 2005. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
operations and modeling of the cache attacks. Speculative execution has gotten increased attention due to the recent Spectre [26] attacks, many of which depend on timing channels to actually extract information; speculation alone is not enough for most of these attacks. Our model considers timing channels in general, independent of whether it is a side or a covert channel.

In the process of development of the improved three-step model, we have uncovered 43 types of timing-based vulnerabilities which have not been previously exploited (in addition, there are 29 types that map to attacks already known in the literature). We cannot directly compare the types of vulnerabilities found in this work and in our prior work [11] due to the improved and different categorizations of the states of the cache block.

To address the threat of the prior cache timing–based attacks, to date 18 different secure cache designs have been presented in the academic literature [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. The secure processor caches are designed with different assumptions and often address only specific types of timing-based side-channel or covert-channel attacks. To help analyze the security of these designs, this work uses our three-step modeling approach to reason about all the possible timing-based vulnerabilities. Especially, since our work demonstrates a number of new timing-based attacks, the existing secure caches have never been analyzed with respect to these new attacks before. For this work, we manually reviewed and analyzed the 18 existing secure cache designs [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57] in terms of their security features and implementations. Most of these designs do not have publicly available hardware implementation source code, so automatic analysis of the caches is not possible.

Based on the analysis, we summarize cache features that help improve security. Especially, we propose that "ideal" secure caches and processor architectures should provide new features to let software explicitly label memory loads or stores of sensitive data and differentiate them from normal loads and stores, so sensitive data can be efficiently identified and protected by the hardware. The caches can use partitioning to isolate the attacker and the victim and prevent the attacker from being able to set the victim's cache blocks into a known state, which is needed by many attacks. To mitigate attacks based on internal interference, the caches can use randomization to de-correlate the data that is accessed and the data that is placed in the cache. More details of the possible defenses are discussed in Sections 5 and 6.
1.1 Contributions

The new contributions of this work over [11] are as follows:

• A new formulation of the three-step model, with new cache states and derivation of a new set of types covering all the cache timing–based vulnerabilities (Section 3):

– Inclusion of cache coherence issues into the three-step model.

– Expansion of the three-step model to consider both normal and speculative execution attacks.

– Design of reduction rules and a cache three-step simulator to automatically derive the exhaustive list of all the three steps which map to effective vulnerabilities, and elimination of three-step patterns which do not map to a potential attack.

• Overview of the 18 secure cache designs that have been presented in the academic literature (Section 4).

• Manual evaluation of the 18 secure processor cache designs to determine how they can help prevent timing-based attacks, and analysis of the security features the secure caches use (Sections 5 and 6).

• Discussion of "ideal" secure caches and the features they would need (Section 6).

• Description of attack strategies and comparison among different attack strategies (Appendix A).

• Analysis of the soundness of the three-step model and why three steps are able to describe all timing-based vulnerabilities (Appendix B).
2 Cache Timing–Based Attacks and the Threat Model

Modern processor caches are known to be vulnerable to timing-based attacks. The timing of memory accesses varies due to the caches' operation. For example, a cache hit is fast, while a cache miss is slow. The cache coherence protocol can also change the cache states and affect the timing of the memory operations; the cache coherence may invalidate a cache block from a remote core, resulting in a cache miss in the local core, for example. Also, the timing of cache flush operations varies depending on whether the data to be flushed is in the cache or not: flushing an address using clflush with valid data in the cache is slow, while flushing an address not in the cache is fast, for example. From these timing differences of memory-related operations, the attacker can infer a data's specific memory address or corresponding cache index value, and thus learn some information about the victim's secrets.
2.1 Threat Model

This work focuses only on timing-based attacks in processor caches. Numerous other types of side and covert channels that do not use timing or caches exist, e.g., power-based [9], EM-based [2] (including RF), thermal-based [33], and in-processor channels based on features such as the power state of the AVX unit [40], for example. This work aims to explore the main cache attacks only, but a similar approach can be applied to other buffers or cache-like structures which may become the target of attack once the main processor caches are secured.

In our threat model, an attacker's objective is to retrieve the victim's secret information using timing-based channels in the processor cache. Specifically, we consider the situation where the victim accesses an address u, and the address depends on some secret information. The address u is within some set of physical memory locations x which are known to the attacker. The goal of the attacker is to obtain the address u, or at least the partial bits of it which relate to the cache index of the address.

We assume the attacker knows some of the source code of the victim. Especially, the attacker can only learn some information¹ about the address u from the timing channels, but with knowledge of the source code, he or she can further infer the likely specific value of u and thus infer the secret he or she is trying to learn.
The attacker cannot directly access any data in the state machine of the cache logic, nor directly read the data of the victim if the two are not sharing the same address space. The attacker can, however, observe his or her own timing or the timing of the victim process. And the attacker knows how the timing of the memory-related operations depends on the cache states.

The attacker further is able to force the victim to execute a specific function. For example, the attacker can request the victim to decrypt a specific piece of data, thus triggering the victim to execute a function that makes use of the secret key he or she wants to learn. The victim in the cache attacks can be user software code in an enclave, an operating system, or another virtual machine.

The processor microarchitecture and the operating system are assumed to be able to differentiate between the victim and the attacker in different processes by assigning different process IDs. If the victim and the attacker are in the same process, e.g., the attacker is a malicious library, they will have the same process ID. The system software (e.g., operating system or hypervisor) is responsible for properly
¹ For hit-based vulnerabilities, the attacker is able to learn the full address of the victim's sensitive data, while for the miss-based vulnerabilities, the attacker usually can learn the cache index of the victim's sensitive data. For more details of these vulnerabilities' categorizations, please refer to Section 3.3.3.
setting up virtual memory (page tables) and assigning IDs, which may be used by the hardware to identify different threads, processes, or virtual machines. When analyzing secure cache designs, the system software is considered trusted and bug-free. The attacker is also assumed not to be able to undermine the physical implementation or change the hardware; e.g., he or she cannot influence randomness generated by any random number generators in hardware. Physical or invasive attacks are not in the scope of this work. For secure cache designs which add new instructions for security-related operations, the victim process or management software is assumed to correctly use these instructions. During speculative execution, the cache state can be modified by the instructions executed speculatively, unless a processor cache architecture explicitly prevents or forbids certain speculative accesses.
2.2 Side and Covert Channels

This work focuses on both side and covert channels. Covert channels use the same methods as side channels, but the attacker controls both the sender and the receiver side of the channel. All types of side-channel attacks are equally applicable to covert channels. For brevity, we just use the term "victim" in the text to represent both the victim (for side channels) and the sender (for covert channels).

2.3 Hyperthreading Versus Timing-Slice Sharing

When hyperthreading is supported in a system, the attacker and the victim are able to run on different threads in parallel, instead of running once every time slice (when no hyperthreading is used). Our model can be applied to both scenarios, since it abstracts away how the sharing happens.
3Modeling of the Cache TimingndashBasedSide-Channel Vulnerabilities
This section explains how we developed the three-step modeling approach and used it to model the behavior of the cache logic and to enumerate all the possible cache timing-based vulnerabilities.
3.1 Introduction of the Three-Step Model
J Hardw Syst Secur (2019) 3:397–425

We have observed that all of the existing cache timing-based attacks can be modeled with three steps of memory-related operations. Here, "memory-related operation" refers to loads, stores, or different flushes that can be done by the victim or the attacker, on the same core or on different cores. When the victim and the attacker are on different cores, cache coherence operations will also be triggered when one of the memory-related operations is performed.
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush + Reload [55] attack, in Step 1 a cache block is flushed by the attacker. In Step 2, security-critical data is accessed by, for example, the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 will be accessed, and the time of the access will be measured by the attacker. If the victim's secret-dependent operation in Step 2 accessed the cache block, then in Step 3 there will be a cache hit, fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all the 17 possible states of the cache block for each step in our three-step model, along with their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model
State Description
V_u — A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or a_alias, and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no A_u in the model.

A_a or V_a — The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker (A_a) or the victim (V_a). The attacker knows the address a, independent of whether the access was by the victim or by the attacker themselves. The address a is within the range of sensitive locations x. The address a is known to the attacker.

A_aalias or V_aalias — The cache block contains a memory address a_alias. The memory location is placed in the cache block due to a memory access by the attacker (A_aalias) or the victim (V_aalias). The address a_alias is within the range x and not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address a_alias is known to the attacker.

A_d or V_d — The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker (A_d) or the victim (V_d). The address d is not within the range x. The address d is known to the attacker.

A^inv or V^inv — The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker (A^inv) or the victim (V^inv), as a result of the cache block being invalidated, e.g., by a flush of the whole cache.

A^inv_a or V^inv_a — The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_a) or the victim (V^inv_a), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.

A^inv_aalias or V^inv_aalias — The cache block state can be anything except a_alias in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_aalias) or the victim (V^inv_aalias), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a_alias to be removed from the cache block. The address a_alias is known to the attacker.

A^inv_d or V^inv_d — The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_d) or the victim (V^inv_d), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.

V^inv_u — The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim (V^inv_u), as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u; therefore, the attacker is not able to trigger this invalidation, and A^inv_u does not exist in the model.

* — Any data, or no data, can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
Fig. 1 The 17 possible states for a single cache block in our three-step model: (a) V_u; (b) A_a, V_a, A_aalias, V_aalias, A_d, V_d; (c) A^inv, V^inv; (d) A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d; (e) V^inv_u; (f) *
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each state. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the description of the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show V_u in dashed lines. Meanwhile, Fig. 1e shows the description of the possible state V^inv_u, which is the result of the victim invalidating data at the sensitive address u, and thus possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the description of the possible state *, which represents the attacker's null knowledge of the address in this cache block. Therefore, it can possibly refer to any address in the memory, or to no valid address at all.
Figure 1b shows the description of the possible states A_a, V_a, A_aalias, V_aalias, A_d, and V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and a_alias are within the sensitive set of addresses x; a_alias, as its name indicates, is a different address than a, but still within set x, and it maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig. 1d shows the description of the possible states A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, and V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Fig. 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block, e.g., a, a_alias, d, and u (if it mapped to this block), before the invalidation step A^inv or V^inv will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 * 17 * 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones can indicate a real attack. As shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as the input to the reduction rules, to remove the redundant three-step patterns and obtain the final list of vulnerabilities.
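The exhaustive enumeration itself is straightforward; as a quick illustration (the state-name strings below are our own plain-text encoding of the 17 states of Table 1, not the authors' code):

```python
from itertools import product

# Our own plain-text encoding of the 17 states of Table 1:
# '*', six known-address accesses, Vu, six targeted invalidations,
# V_inv_u, and the two whole-block invalidations.
STATES = (["*"] +
          [p + s for p in ("A", "V") for s in ("a", "aalias", "d")] +           # Aa, Va, ...
          ["Vu"] +
          [p + "_inv_" + s for p in ("A", "V") for s in ("a", "aalias", "d")] + # A_inv_a, ...
          ["V_inv_u", "A_inv", "V_inv"])

assert len(STATES) == 17
three_step_patterns = list(product(STATES, repeat=3))
assert len(three_step_patterns) == 4913   # 17 * 17 * 17
```

Each element of `three_step_patterns` is one candidate ⟨Step 1, Step 2, Step 3⟩ combination fed to the simulator.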
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case that u does not have the same index as a or a_alias, and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudocode is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. The outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all possible combinations (4913) of the three-step patterns. For each step
Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. The exhaustive list of all possible three-step combinations (4913) is input to the cache three-step simulator (the classification step), which outputs 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 ineffective three-step patterns. The reduction rules (the reduction step) then reduce the preliminary lists to 72 Strong and 64 Weak vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern: if it is V_u, this step will be extended to be one of three candidates, V_a, V_aalias, and V_NIB; if it is V^inv_u, this step will be extended to be one of three candidates, V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address, or it has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to the victim's different behavior; however, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type * ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (u mapping to NIB can result in fast timing because, due to the * in Step 1, the cache block may not contain a, in which case invalidating a is fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1: Pseudocode for the cache three-step simulator

Input: a list containing the 17 possible states for each of the three steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the ineffective types

for step1 in states do
    for step2 in states do
        for step3 in states do
            steps <- (step1, step2, step3)
            candidates <- empty array to store all possible candidate combinations of this three-step pattern
            timings <- empty array to store all possible timing observations, regarding the different candidate combinations, for this three-step pattern
            if V_u or V^inv_u occurs in steps then
                for each candidate combination of steps do
                    // V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are
                    // V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3
                    candidates.append(candidate combination)
                end for
                for each candidate in candidates do
                    timings.append(output_timing(candidate))
                end for
                if judge_type(candidates, timings) is Strong then
                    strong.append(steps)
                else if judge_type(candidates, timings) is Weak then
                    weak.append(steps)
                else
                    ineffective.append(steps)
                end if
            else
                ineffective.append(steps)
                continue
            end if
        end for
    end for
end for
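To make the classification logic concrete, below is a minimal, runnable Python sketch in the spirit of Algorithm 1. It models only a single monitored cache block plus one stand-in block for Not-In-Block (NIB) addresses, and a reduced operation set (access, targeted flush, and a * no-op); the function names step, timings_for, and judge_type and the state encoding are our own simplifications, not the authors' implementation. Unknown prior cache contents (the * state) are handled by enumerating all possible initial block contents.

```python
SAME_BLOCK = {"a", "aalias", "d"}   # addresses that map to the monitored cache block

def step(state, op):
    """Apply one operation (kind, addr) to state = (block, other).
    Returns (new_state, timing); timing is 'fast'/'slow' for the operation,
    or None when the operation reveals nothing (the '*' no-op)."""
    block, other = state
    kind, addr = op
    if kind == "star":                        # unknown step: no effect modeled
        return state, None
    if addr in SAME_BLOCK:
        if kind == "access":                  # hit is fast, miss is slow (and fills)
            return ((addr, other), "fast" if block == addr else "slow")
        if kind == "flush":                   # invalidating present data is slow
            hit = (block == addr)
            return ((None if hit else block, other), "slow" if hit else "fast")
    else:                                     # NIB address: lives in a different set
        if kind == "access":
            return ((block, addr), "fast" if other == addr else "slow")
        if kind == "flush":
            hit = (other == addr)
            return ((block, None if hit else other), "slow" if hit else "fast")
    raise ValueError(op)

def timings_for(pattern, u):
    """All final-step timings of `pattern` with the secret address u substituted,
    over every possible unknown initial cache content."""
    concrete = [(kind, u if addr == "u" else addr) for kind, addr in pattern]
    observed = set()
    for init_block in (None, "a", "aalias", "d"):
        for init_other in (None, "NIB"):
            state, timing = (init_block, init_other), None
            for op in concrete:
                state, timing = step(state, op)
            observed.add(timing)              # keep only the last step's timing
    return observed

def judge_type(pattern):
    obs = {u: timings_for(pattern, u) for u in ("a", "aalias", "NIB")}
    if any(None in t for t in obs.values()):
        return "ineffective"                  # final step reveals no timing
    if all(len(t) == 1 for t in obs.values()) and len(set(map(frozenset, obs.values()))) > 1:
        return "strong"                       # deterministic per u, and distinguishing
    if all(all(t in ts for ts in obs.values()) for t in set().union(*obs.values())):
        return "ineffective"                  # every timing fits every u
    return "weak"
```

Under these assumptions, the Flush + Reload pattern A^inv_a ⇝ V_u ⇝ A_a is classified as strong, while * ⇝ V_u ⇝ A^inv_a (Fig. 3c) comes out weak; the full simulator additionally covers whole-cache invalidations and attacker/victim attribution, which this sketch omits.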
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We have also developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability and Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the below rules:
1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated; e.g., A_d ⇝ A_a ⇝ V_u can be reduced
Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability (e.g., V_d ⇝ V_u ⇝ A_a and V_u ⇝ A_d ⇝ V^inv_u); c, d Weak Vulnerability (e.g., * ⇝ V_u ⇝ A^inv_a and A^inv_aalias ⇝ V^inv_u ⇝ V_a); e, f Ineffective Three-Step (e.g., V_d ⇝ V^inv_u ⇝ V_d and A_a ⇝ V_u ⇝ A_d). Each sub-figure maps the victim's behavior (u being a, a_alias, or NIB) to the attacker's observation (fast or slow timing)
to A_a ⇝ V_u, which is equivalent to * ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat of the type * ⇝ A_a ⇝ V_u and can be eliminated.
2. Three-step patterns with a step involving a known address a, and the same pattern with an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a versus a_alias cannot represent different attacks, and only one of the combinations needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat of the type V_u ⇝ A_a ⇝ V_u, and we eliminate the first pattern.
3. For three-step patterns with the steps V_u and V^inv_u in adjacent, consecutive positions, only the latter step is kept and the former step is eliminated. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which is equivalent to * ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
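A sketch of how rules 1 and 3 can be applied mechanically until a fixed point is shown below (rule 2, the a/a_alias symmetry, amounts to a renaming and is omitted; the state-name strings and helper names are our own encoding, not the authors' script):

```python
def attacker_known(state):
    """States whose full address the attacker knows: anything not involving u or '*'."""
    return "u" not in state and state != "*"

def reduce_pattern(steps):
    """Apply reduction rules 1 and 3 repeatedly, then pad with '*' on the left
    so the result is again a three-step pattern."""
    out = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(out) - 1):
            # Rule 1: adjacent repeating steps, or two adjacent attacker-known
            # steps, carry no extra information -> drop the earlier one.
            if out[i] == out[i + 1] or (attacker_known(out[i]) and attacker_known(out[i + 1])):
                del out[i]
                changed = True
                break
            # Rule 3: Vu immediately followed by Vu_inv -> keep only the invalidation.
            if out[i] == "Vu" and out[i + 1] == "Vu_inv":
                del out[i]
                changed = True
                break
    while len(out) < 3:
        out.insert(0, "*")
    return tuple(out)
```

On the two examples from the rules above, reduce_pattern(("Ad", "Aa", "Vu")) yields ("*", "Aa", "Vu"), and reduce_pattern(("Aa", "Vu", "Vu_inv")) yields ("*", "Aa", "Vu_inv").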
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks, if such exist; otherwise, we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which each cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address of the victim's access maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.2 We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
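This macro-type classification can be summarized in a few lines; the following sketch (our own encoding: state strings beginning with 'A' or 'V', plus an observation tag naming why the final step's timing is distinguishing) is illustrative, not the authors' code:

```python
def macro_type(step2, step3, observation):
    """Classify a vulnerability into IM / IH / EM / EH.

    step2, step3: state strings, e.g. "Vu", "Aa", "Va_inv"; a leading 'V' marks
    a victim operation. observation: why the final timing is distinguishing:
    'miss' or 'inv_absent' (miss-based), 'hit' or 'inv_present' (hit-based).
    """
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    based = "M" if observation in ("miss", "inv_absent") else "H"
    return interference + based
```

For example, the Prime + Probe pattern A_d ⇝ V_u ⇝ A_d with a distinguishing miss classifies as EM, and the Cache Internal Collision pattern with a distinguishing hit in victim step 3 classifies as IH, matching the Macro Type columns of Tables 2 and 3.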
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the internal collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in the literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in the literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
2 Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type columns give the three steps that define each vulnerability. For Step 3, fast indicates that a cache hit must be observed to derive sensitive address information, while slow indicates that a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities; "I" is for internal interference vulnerabilities; "M" is for miss-based vulnerabilities; "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in the literature
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision | A^inv | V_u | V_a (fast) | IH | (2)
 | V^inv | V_u | V_a (fast) | IH | (2)
 | A_d | V_u | V_a (fast) | IH | (2)
 | V_d | V_u | V_a (fast) | IH | (2)
 | A_aalias | V_u | V_a (fast) | IH | (2)
 | V_aalias | V_u | V_a (fast) | IH | (2)
 | A^inv_a | V_u | V_a (fast) | IH | (2)
 | V^inv_a | V_u | V_a (fast) | IH | (2)
Flush + Reload | A^inv_a | V_u | A_a (fast) | EH | (5)
 | V^inv_a | V_u | A_a (fast) | EH | (5)
 | A^inv | V_u | A_a (fast) | EH | (5)
 | V^inv | V_u | A_a (fast) | EH | (5)
 | A_d | V_u | A_a (fast) | EH | (5)
 | V_d | V_u | A_a (fast) | EH | (5)
 | A_aalias | V_u | A_a (fast) | EH | (5)
 | V_aalias | V_u | A_a (fast) | EH | (5)
Reload + Time | V^inv_u | A_a | V_u (fast) | EH | new
 | V^inv_u | V_a | V_u (fast) | IH | new
Flush + Probe | A_a | V^inv_u | A_a (slow) | EM | (6)
 | A_a | V^inv_u | V_a (slow) | IM | new
 | V_a | V^inv_u | A_a (slow) | EM | new
 | V_a | V^inv_u | V_a (slow) | IM | new
Evict + Time | V_u | A_d | V_u (slow) | EM | (1)
 | V_u | A_a | V_u (slow) | EM | (1)
Prime + Probe | A_d | V_u | A_d (slow) | EM | (4)
 | A_a | V_u | A_a (slow) | EM | (4)
Bernstein's Attack | V_u | V_a | V_u (slow) | IM | (3)
 | V_u | V_d | V_u (slow) | IM | (3)
 | V_d | V_u | V_d (slow) | IM | (3)
 | V_a | V_u | V_a (slow) | IM | (3)
Evict + Probe | V_d | V_u | A_d (slow) | EM | new
 | V_a | V_u | A_a (slow) | EM | new
Prime + Time | A_d | V_u | V_d (slow) | IM | new
 | A_a | V_u | V_a (slow) | IM | new
Flush + Time | V_u | A^inv_a | V_u (slow) | EM | new
 | V_u | V^inv_a | V_u (slow) | IM | new
(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, fast indicates that no data at the corresponding address is invalidated, while slow indicates that the invalidation operation makes some data invalid, causing a longer processing time
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv | V_u | V^inv_a (slow) | IH | new
 | V^inv | V_u | V^inv_a (slow) | IH | new
 | A_d | V_u | V^inv_a (slow) | IH | new
 | V_d | V_u | V^inv_a (slow) | IH | new
 | A_aalias | V_u | V^inv_a (slow) | IH | new
 | V_aalias | V_u | V^inv_a (slow) | IH | new
Flush + Flush | A^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | V^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | A^inv_a | V_u | A^inv_a (slow) | EH | (1)
 | V^inv_a | V_u | A^inv_a (slow) | EH | (1)
Flush + Reload Invalidation | A^inv | V_u | A^inv_a (slow) | EH | new
 | V^inv | V_u | A^inv_a (slow) | EH | new
 | A_d | V_u | A^inv_a (slow) | EH | new
 | V_d | V_u | A^inv_a (slow) | EH | new
 | A_aalias | V_u | A^inv_a (slow) | EH | new
 | V_aalias | V_u | A^inv_a (slow) | EH | new
Reload + Time Invalidation | V^inv_u | A_a | V^inv_u (slow) | EH | new
 | V^inv_u | V_a | V^inv_u (slow) | IH | new
Flush + Probe Invalidation | A_a | V^inv_u | A^inv_a (fast) | EM | new
 | A_a | V^inv_u | V^inv_a (fast) | IM | new
 | V_a | V^inv_u | A^inv_a (fast) | EM | new
 | V_a | V^inv_u | V^inv_a (fast) | IM | new
Evict + Time Invalidation | V_u | A_d | V^inv_u (fast) | EM | new
 | V_u | A_a | V^inv_u (fast) | EM | new
Prime + Probe Invalidation | A_d | V_u | A^inv_d (fast) | EM | new
 | A_a | V_u | A^inv_a (fast) | EM | new
Bernstein's Invalidation Attack | V_u | V_a | V^inv_u (fast) | IM | new
 | V_u | V_d | V^inv_u (fast) | IM | new
 | V_d | V_u | V^inv_d (fast) | IM | new
 | V_a | V_u | V^inv_a (fast) | IM | new
Evict + Probe Invalidation | V_d | V_u | A^inv_d (fast) | EM | new
 | V_a | V_u | A^inv_a (fast) | EM | new
Prime + Time Invalidation | A_d | V_u | V^inv_d (fast) | IM | new
 | A_a | V_u | V^inv_a (fast) | IM | new
Flush + Time Invalidation | V_u | A^inv_a | V^inv_u (fast) | EM | new
 | V_u | V^inv_a | V^inv_u (fast) | IM | new
(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush, or use cache coherence to invalidate, their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What is more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, the L2 cache, and the Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to the High security level and the attacker belongs to the Low security level. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
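This access/eviction policy can be illustrated with a toy model (a minimal sketch under our own simplifying assumptions: one cache set and FIFO replacement within a partition; the class and method names are hypothetical):

```python
class SPStarCache:
    """Toy single-set model of the SP* cache: a lookup may hit in either
    partition, but a miss only fills (and evicts from) the requester's own
    statically assigned partition."""

    def __init__(self, low_ways=2, high_ways=2):
        self.part = {"Low": [None] * low_ways, "High": [None] * high_ways}

    def access(self, level, addr):
        if any(addr in ways for ways in self.part.values()):
            return "hit"                      # hits are allowed across partitions
        ways = self.part[level]               # miss: fill only the own partition
        ways.pop(0)                           # FIFO replacement within the partition
        ways.append(addr)
        return "miss"
```

In this model, Low-security accesses can never evict the victim's High-partition lines: after the victim caches an address, any amount of attacker activity still leaves it a hit, which is exactly the property the static partitioning targets.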
SecVerilog Cache [56, 57] statically partitions the cache blocks between the security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
3 Two existing papers give slightly different definitions for an "SP" cache; thus, we have chosen to define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
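The Low-miss behavior can be sketched as follows (a toy single-set model under our own assumptions, not the SecVerilog hardware itself; the class name is hypothetical):

```python
class SecVerilogCacheModel:
    """Toy model: statically partitioned L/H blocks. A High access may hit in
    either partition. A Low access whose data sits in the High partition still
    reports a miss, and the line migrates H -> L to preserve consistency."""

    def __init__(self, ways_per_partition=2):
        self.part = {"L": [None] * ways_per_partition,
                     "H": [None] * ways_per_partition}

    def _fill(self, level, addr):
        self.part[level].pop(0)               # FIFO replacement inside the partition
        self.part[level].append(addr)

    def access(self, level, addr):
        if addr in self.part[level]:
            return "hit"
        if level == "H" and addr in self.part["L"]:
            return "hit"                      # High may hit in the Low partition
        if level == "L" and addr in self.part["H"]:
            self.part["H"][self.part["H"].index(addr)] = None
            self._fill("L", addr)
            return "miss"                     # behaves as a miss; data moves H -> L
        self._fill(level, addr)
        return "miss"
```

The key point the sketch captures is that a Low observer always sees miss timing on High-resident data, so High activity cannot speed up Low operations.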
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. They use the percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way to adjust the number of ways of the cache assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in the cache sets to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible for the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with the core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID); the CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache set that belongs to the same process, a random block in the cache set is evicted. This random eviction generates an interrupt to the OS, to notify it of a suspicious activity. For pages that are read-only or executable, the SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
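The three-level eviction priority can be sketched as follows (our own encoding: each line carries an owners set of process IDs; the function name is hypothetical):

```python
import random

def sharp_choose_victim(cache_set, requester, active_processes):
    """SHARP-style eviction priority for a cache set (list of lines, each a dict
    with an 'owners' set of process IDs). Returns (way_index, alarm):
      1. prefer a line owned by no currently active process;
      2. else a line owned by the requesting process itself;
      3. else evict a random line and raise an alarm (interrupt to the OS)."""
    for i, line in enumerate(cache_set):
        if not (line["owners"] & active_processes):
            return i, False                   # dead or orphaned line: safe to evict
    for i, line in enumerate(cache_set):
        if requester in line["owners"]:
            return i, False                   # self-eviction leaks nothing new
    return random.randrange(len(cache_set)), True   # cross-process eviction: alarm
```

Only the third case, which a Prime + Probe style attacker would have to trigger repeatedly, produces the OS notification that SHARP uses to flag suspicious activity.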
Sanctum Cache [10] focuses on the isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, the TLB, and the LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and the enclaves. For the L1 cache and the TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions, as follows:
Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution, but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling of speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding state is flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has only a small performance impact because such speculation is rare. This cache lists extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control-flow instructions resolve, an unsafe speculative load (USL) is put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the memory consistency model is checked, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different classes of service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
J Hardw Syst Secur (2019) 3:397–425
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is handled by assigning secure pages to only one processor and not sharing pages among VMs. This cache lists extra software assumptions as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only if both the tag and the domain ID are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
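The hit and fill masking described above can be sketched as follows. The field names policy_hitmap and policy_fillmap follow the paper's terminology, but the data layout (per-way `(tag, domain)` tuples, bitmask dictionaries) is an assumption of this sketch.

```python
def dawg_lookup(ways, req_tag, domain_id, policy_hitmap):
    """ways: list of (tag, domain) per cache way. A hit requires the way to
    be enabled in the requesting domain's hitmap AND both tag and domain
    ID to match. Returns the hit way index or None."""
    for w, (tag, dom) in enumerate(ways):
        if policy_hitmap[domain_id] & (1 << w) and tag == req_tag and dom == domain_id:
            return w
    return None

def dawg_victim_ways(domain_id, policy_fillmap, num_ways):
    """On a miss, the replacement victim may only come from the ways in the
    requesting domain's own fill partition."""
    return [w for w in range(num_ways) if policy_fillmap[domain_id] & (1 << w)]
```

Because the domain ID participates in the hit check, the same read-only line can safely exist in two domains' ways without creating a cross-domain hit channel.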
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R from the LLC causes the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed-inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC does not protect critical data that is writable and not private, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and on thread migration events, to avoid timing leakage during the transition time.
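The relaxed-inclusion decision above amounts to a single bit checked on LLC eviction. A minimal sketch, with the dict-based line encoding being an assumption of this illustration:

```python
def ric_fill(addr, read_only, thread_private):
    """On an LLC fill, set the relaxed-inclusion bit for the two data
    classes the paper identifies (read-only and thread-private data)."""
    return {'addr': addr, 'relaxed': bool(read_only or thread_private)}

def ric_needs_back_invalidation(line):
    """On an LLC eviction, only non-relaxed lines trigger invalidation of
    the copy in the higher-level (e.g., L1) cache."""
    return not line['relaxed']
```

Skipping the back-invalidation for relaxed lines is exactly what removes the attacker's ability to evict the victim's L1 line through LLC conflicts.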
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, the PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In these cases, the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache lists an extra software assumption as follows:
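The miss-handling rules above can be sketched as a victim-selection function. This is an illustrative encoding (dicts with 'pid' and 'locked' fields), not the paper's hardware description:

```python
def pl_handle_miss(cache_set, req_pid, req_locked):
    """Return the index of the line to evict, or None if the new data must
    be served without caching (PL cache's fallback for locked conflicts)."""
    # Unlocked lines are normal eviction candidates for any request.
    for i, line in enumerate(cache_set):
        if not line['locked']:
            return i
    # All lines locked: unlocked data may never evict locked data, and
    # locked data of different processes may not evict each other. Only a
    # locking request may replace its own process's locked line.
    if req_locked:
        for i, line in enumerate(cache_set):
            if line['pid'] == req_pid:
                return i
    return None  # serve the access without caching
```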
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. For each block of the RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens for data D of a cache set S: if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
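The three miss-handling cases above can be sketched as follows; the dict encoding of lines and the string labels for the outcomes are assumptions of this illustration, not the paper's notation:

```python
import random

def rp_handle_miss(evict_line, new_line, num_sets):
    """Decide RP cache behavior on a miss. Lines are dicts with a process
    ID 'pid' and a protection bit 'P'. Returns (outcome, random_set)."""
    if evict_line['pid'] == new_line['pid']:
        if evict_line['P'] != new_line['P']:
            # Same process, different protection: evict arbitrary data from
            # a random set S' and serve the access without caching.
            return ('random_evict_no_cache', random.randrange(num_sets))
        # Same process, same protection: normal replacement policy.
        return ('normal', None)
    # Different processes: store D in an evicted block of a random set S'
    # and swap the set mappings of S and S'.
    return ('swap_sets', random.randrange(num_sets))
```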
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of the RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens for data D: if the evicted data and the brought-in data belong to the same process but either one's protection bit is set, arbitrary data will be evicted and D will be accessed without caching. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
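The two-level lookup described above (direct-mapped into the RMT, fully associative into the actual cache, with the process ID checked on a hit) can be sketched as follows; the dictionary-based structures are assumptions of this illustration:

```python
def newcache_lookup(addr_index, pid, rmt, cache_lines):
    """rmt: maps (pid, address index bits) -> real cache index, as in a
    direct-mapped lookup per process. cache_lines: maps real index ->
    (pid, payload). Returns the payload on a hit, None on a miss."""
    real = rmt.get((pid, addr_index))
    if real is None:
        return None  # no RMT entry: miss
    entry = cache_lines.get(real)
    # A hit additionally requires the stored process ID to match.
    if entry is not None and entry[0] == pid:
        return entry[1]
    return None
```

Because the RMT entry, not the address itself, selects the physical line, remapping an RMT entry changes where an address lives without the attacker being able to predict it.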
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security-critical data, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
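The three request types above can be sketched as a single dispatch function; the string labels for the request types and the inclusive address window are assumptions of this illustration:

```python
import random

def random_fill_access(req_type, addr, window):
    """Return the address actually brought into the cache, or None.

    'nofill': serve security-critical data without caching.
    'random': cache an arbitrary neighbor from the configured window
              instead of the requested address.
    'normal': cache the requested address (normal replacement policy)."""
    if req_type == 'nofill':
        return None
    if req_type == 'random':
        lo, hi = window
        return random.randint(lo, hi)  # inclusive range of nearby addresses
    return addr
```

The key property is that what ends up cached on a 'random' request is statistically independent of the requested address, so observing the fill tells the attacker nothing about the secret-dependent access.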
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. The CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
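A minimal stand-in for the keyed set-index computation is sketched below. The toy multiply-and-xor mixer is an assumption substituting for the actual LLBC cipher; it only illustrates that the set mapping is a deterministic function of (address, key), so changing the key remaps every address:

```python
def ceaser_set_index(addr, key, num_sets):
    """Toy keyed mixer standing in for LLBC encryption: deterministic for
    a fixed key, and the resulting set changes when the key changes."""
    mixed = (addr * 2654435761 ^ key) & 0xFFFFFFFF  # Knuth-style mixing, not LLBC
    return mixed % num_sets
```

Periodic re-keying in CEASER then amounts to picking a fresh key and gradually relocating lines, which invalidates any eviction set the attacker has built under the old mapping.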
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As a hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with user space for security domain identification.
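The per-way keyed index derivation can be sketched as follows. Using BLAKE2b as the keyed hash is an assumption of this sketch; the real design uses a hardware-friendly primitive, but the structure (domain key plus way number feeding the index for each way) is the same idea:

```python
import hashlib

def scatter_way_indices(addr, domain_key, num_ways, set_bits):
    """Derive one set index per cache way from (address, way, domain key),
    as in a skewed-associative cache with a keyed mapping per domain."""
    indices = []
    for way in range(num_ways):
        h = hashlib.blake2b(addr.to_bytes(8, 'little') + way.to_bytes(1, 'little'),
                            key=domain_key, digest_size=4)
        indices.append(int.from_bytes(h.digest(), 'little') % (1 << set_bits))
    return indices
```

Because each security domain holds a different key, the same physical address occupies a different, attacker-unpredictable combination of sets in each domain, which is what breaks eviction-set construction across domains.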
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache access time from the address being accessed. However, the performance degradation is tremendous.
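A toy model of the counter-driven invalidation is sketched below. The class layout, and the choice to reset the counter on a touch, are assumptions of this illustration; the randomized initial value and threshold-triggered invalidation follow the description above:

```python
import random

class NDLine:
    """One cache line of a non-deterministic cache: invalidated after a
    randomized idle interval measured in global counter ticks."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.counter = random.randrange(threshold)  # randomized starting point
        self.valid = True

    def tick(self, touched):
        if touched:
            self.counter = 0  # assumed: activity restarts the interval
            return
        self.counter += 1
        if self.counter >= self.threshold:
            self.valid = False  # line lifetime ends at a random point
```

Since each line's lifetime starts from a random counter value, whether a later access hits or misses no longer depends deterministically on the address history, which is what destroys the timing channel (at a large performance cost).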
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, the SP* cache can defend against the Vu → Ad → Vu (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, the SecDCP cache cannot defend against the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd → Vu → Aa (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] makes the timing of the last step always slow, because the RP cache does not allow data of different processes to produce cache hits on each other's data.

2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer carries the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad → Vu → Ad^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing does not establish that u and d share the same index, since in the Random Fill cache u would be accessed without caching and another, random piece of data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when the PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad → Vu → Vd^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security-critical data does not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in the PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as the InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body not shown.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets the shared cache
h The technique only targets the inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can make the probability of the vulnerability succeeding extremely small
l The technique works in shared read-only memory but not in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body not shown.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets the shared cache
h The technique only targets the inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can make the probability of the vulnerability succeeding extremely small
l The technique works in shared read-only memory but not in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label the range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data and thereby prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing the PL cache with the SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in the PL cache, so that Vu → Va^inv → Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, the Random Fill cache cannot prevent Vu → Ad → Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to only be able to access a limited set of cache blocks. For example, there may be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculation or the out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference among the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and to inherently degrade system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part, and it does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting the initial state (Step 0) of the victim's partition by means of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
The SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is one example that the SP* cache cannot prevent. The SecVerilog cache is similar to the SP* cache, but prevents the attacker from directly getting cache hits due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv → Vu → Aa (fast) (one type of Flush + Reload [55]). The SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]). The DAWG cache only allows data to produce a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as the SP* cache, it is able to prevent vulnerabilities such as Vd → Vu → Ad^inv (fast) (one type of Prime + Flush).

The SecDCP and NoMo caches both leverage dynamic partitioning to improve performance. Compared with the SecVerilog cache, the SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu → Aa^inv → Vu (slow) vulnerability (one type of Flush + Time). The NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be placed in the unreserved ways and eviction by the attacker becomes possible. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
The Sanctum cache and the CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as Ad → Vu → Aa (fast) (one type of Flush + Reload [55]). The Sanctum cache does not consider internal interference, while the CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system, it so far supports preventing all of the vulnerabilities, but it only works for the LLC, with high software implementation complexity and with some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. The MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd → Vu → Ad (slow) vulnerability (one type of Evict + Probe) are prevented.
The InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during the speculation or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. The RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. The PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 0) of the victim's partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing of cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during the speculative execution or out-of-order load or store and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information in the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data and allows vulnerabilities such as V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches as listed by the designers in their respective papers. At the extreme end there is
J Hardw Syst Secur (2019) 3:397–425
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While its paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in other words, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, the identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
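The random-eviction suggestion above can be illustrated with a toy Monte Carlo sketch. This is hypothetical and not Newcache's actual replacement policy: with deterministic replacement the attacker's probe observation is certain, while random eviction makes it noisy.

```python
import random

def prime_victim_probe(replacement, rng, n_lines=4, trials=200):
    """Toy fully-associative cache. The attacker primes all lines, the
    victim inserts one secret line, and the attacker probes a single
    primed line; returns the observed probe miss rate."""
    misses = 0
    for _ in range(trials):
        cache = [('A', i) for i in range(n_lines)]      # Step 1: prime
        slot = 0 if replacement == 'deterministic' else rng.randrange(n_lines)
        cache[slot] = ('V', 'secret')                   # Step 2: victim access
        if ('A', 0) not in cache:                       # Step 3: probe one line
            misses += 1
    return misses / trials

rng = random.Random(0)
# Deterministic eviction: the probed line is always the one evicted.
assert prime_victim_probe('deterministic', rng) == 1.0
# Random eviction: the probe result is no longer a reliable signal.
p = prime_victim_probe('random', rng)
assert 0.0 < p < 1.0
```

The deterministic case gives the attacker one bit of certain information per trial; under random eviction the same observation only reveals a probability, which is the de-correlation effect described above.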
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail to do so. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A: Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
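The three steps above can be sketched on a single toy cache line. This is an illustration of the strategy only; a real attack infers hit or miss from measured access latency rather than querying the state directly.

```python
class ToyCacheLine:
    """One cache line whose state is observed as 'hit' (fast) or 'miss' (slow)."""
    def __init__(self):
        self.tag = None
    def flush(self):
        self.tag = None
    def access(self, addr):
        hit = (self.tag == addr)
        self.tag = addr
        return 'hit' if hit else 'miss'

def flush_reload(victim_addr, probe_addr):
    line = ToyCacheLine()
    line.flush()                    # Step 1: invalidate the line
    line.access(victim_addr)        # Step 2: victim touches its (secret) address
    return line.access(probe_addr)  # Step 3: attacker reloads a known address

# A hit in Step 3 tells the attacker the victim used probe_addr.
assert flush_reload(victim_addr=0x1000, probe_addr=0x1000) == 'hit'
assert flush_reload(victim_addr=0x2000, probe_addr=0x1000) == 'miss'
```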
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2: in Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
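As a sketch, the three steps can be played out on a toy direct-mapped cache. The encoding is illustrative only; a real attacker measures probe latency instead of querying hit or miss directly.

```python
class ToySetCache:
    """Direct-mapped toy cache with n_sets sets."""
    def __init__(self, n_sets=4):
        self.n_sets = n_sets
        self.sets = [None] * n_sets
    def access(self, owner, addr):
        idx = addr % self.n_sets
        hit = (self.sets[idx] == (owner, addr))
        self.sets[idx] = (owner, addr)
        return hit

def prime_probe(secret_addr, n_sets=4):
    cache = ToySetCache(n_sets)
    for a in range(n_sets):
        cache.access('attacker', a)                 # Step 1: prime every set
    cache.access('victim', secret_addr)             # Step 2: victim's access
    return [a for a in range(n_sets)
            if not cache.access('attacker', a)]     # Step 3: probe all sets

# The probe miss reveals the cache set the secret address maps to.
assert prime_probe(secret_addr=6, n_sets=4) == [2]  # 6 mod 4 == 2
```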
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim performs the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, it tells the attacker that this cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that performs the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that performs the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, it tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access related operation.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three
steps or a three-step sub-pattern can be found in the longerrepresentation
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as {..., ⋆, ...}, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as {..., A^inv/V^inv, ...}, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the A^inv ⇝ V_u ⇝ A_a (fast) vulnerability shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or the last step (an A^inv/V^inv in the last step will be deleted).
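Subdivision Rule 1 can be sketched as a simple splitting pass. The list-of-strings step encoding below is invented for illustration; '*' stands for the ⋆ step.

```python
def subdivide(pattern, star='*'):
    """Split a long step pattern at every star step: the star becomes
    Step 1 of the next sub-pattern, and a trailing star is dropped."""
    parts, current = [], []
    for step in pattern:
        if step == star:
            if current:
                parts.append(current)
            current = [star]             # star starts the next sub-pattern
        else:
            current.append(step)
    if current and current != [star]:    # a star in the last step is deleted
        parts.append(current)
    return parts

assert subdivide(['A_a', 'V_u', '*', 'V_u', 'A_a']) == \
       [['A_a', 'V_u'], ['*', 'V_u', 'A_a']]
assert subdivide(['A_a', 'V_u', '*']) == [['A_a', 'V_u']]
```

Each resulting sub-pattern is then short enough for the simplification rules of the next subsection to be applied independently.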
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some pairs of adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column of Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern of two adjacent steps that are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps (columns: First, Second, Commute Rule 1, Commute Rule 2, Union Rule or Reduce Rule, Combined Step)
Reduce Rule" and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two known adjacent memory access-related operations (known_access operations) always result in a deterministic state of the second memory access-related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
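A loose sketch of the three Reduction Rules is given below. The (kind, address) step encoding is invented and deliberately simplified; the complete per-pair conditions are those of Table 7.

```python
def reduce_adjacent(pattern):
    """Collapse two adjacent steps into the second one when:
    Rule 1: both are operations on the unknown address u;
    Rule 2: both are accesses to known addresses;
    Rule 3: one is a known access and the other a known invalidation
            of the same address."""
    out = []
    for step in pattern:
        if out:
            pkind, paddr = out[-1]
            kind, addr = step
            rule1 = (paddr == 'u' and addr == 'u')
            rule2 = (pkind == kind == 'acc' and 'u' not in (paddr, addr))
            rule3 = (paddr == addr != 'u' and pkind != kind)
            if rule1 or rule2 or rule3:
                out[-1] = step       # keep only the second step
                continue
        out.append(step)
    return out

seq = [('acc', 'a'), ('acc', 'd'), ('inv', 'd'), ('acc', 'u'), ('acc', 'u')]
assert reduce_adjacent(seq) == [('inv', 'd'), ('acc', 'u')]
```

Repeatedly interleaving such a pass with the Commute and Union Rules shortens a long sequence by at least one step per round, which is the mechanism the final-check rules below rely on.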
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion through these three categories of rules should always be applied, reducing at least one step each time, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps in one of the following patterns, or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or A^inv/V^inv before this three-step pattern, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra A^inv/V^inv in the first step:

– If it is followed by a known_access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
– If it is followed by a known_inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If it is followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A_d^inv/V_d^inv ⇝ u operation, where V_u ⇝ A_d^inv/V_d^inv can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, which requires the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– {A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv} ⇝ V_u
– {A_a/V_a/A_aalias/V_aalias} ⇝ V_u^inv
– V_u ⇝ {A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv}
– V_u^inv ⇝ {A_a/V_a/A_aalias/V_aalias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern;
pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially
Output: vulnerability: reduced effective vulnerability pattern(s) array. It will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vulnerability = Ø
2: while contain(pattern, ⋆) and the ⋆ index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if is_ineffective(pattern) or has_interval_effective_three_steps(pattern) then
13:    vulnerability += effective_patterns(pattern)
14:  end if
15: end while
16: return vulnerability
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability or, if it is a vulnerability, maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu
A Hestness J Hower DR Krishna T Sardashti S et al (2011)The gem5 simulator ACM SIGARCH computer architecture news39(2)1ndash7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic MKnudsen LR Leander G Nikov V Paar C Rechberger Cet al (2012) Princendasha low-latency block cipher for pervasivecomputing applications In International conference on the theoryand application of cryptology and information security Springerpp 208ndash225
J Hardw Syst Secur (2019) 3397ndash425 423
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 on Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 – 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH computer architecture news 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) Cacti 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
2.1 Threat Model
This work focuses only on timing-based attacks in processor caches. Numerous other types of side and covert channels that do not use timing or caches exist, e.g., power-based [9], EM-based [2] (including RF), thermal-based [33], and in-processor channels based on features such as the power state of the AVX unit [40], for example. This work aims to explore main cache attacks only, but a similar approach could be applied to other buffers or cache-like structures, which may become targets of attack once the main processor caches are secured.
In our threat model, an attacker's objective is to retrieve the victim's secret information using timing-based channels in the processor cache. Specifically, we consider the situation where the victim accesses an address u, and the address depends on some secret information. The address u is within some set of physical memory locations x, which are known to the attacker. The goal of the attacker is to obtain the address u, or at least partial bits of it, which relate to the cache index of the address.
We assume the attacker knows some of the source code of the victim. Especially, the attacker can only learn some information¹ about the address u from the timing channels, but with knowledge of the source code, he or she can further infer the likely specific value of u and thus infer the secret he or she is trying to learn.
The attacker cannot directly access any data in the state machine of the cache logic, nor directly read the data of the victim if the two are not sharing the same address space. The attacker can, however, observe its own timing or the timing of the victim process, and the attacker knows how the timing of memory-related operations depends on the cache states.
The attacker further is able to force the victim to execute a specific function. For example, the attacker can request the victim to decrypt a specific piece of data, thus triggering the victim to execute a function that makes use of a secret key he or she wants to learn. The victim in the cache attacks can be user software code in an enclave, an operating system, or another virtual machine.
The processor microarchitecture and the operating system are assumed to be able to differentiate between the victim and the attacker in different processes by assigning different process IDs. If the victim and the attacker are in the same process, e.g., the attacker is a malicious library, they will have the same process ID. The system software (e.g., operating system or hypervisor) is responsible for properly setting up virtual memory (page tables) and assigning IDs, which may be used by the hardware to identify different threads, processes, or virtual machines. When analyzing secure cache designs, the system software is considered trusted and bug-free. The attacker is also assumed not to be able to undermine the physical implementation or change the hardware, e.g., he or she cannot influence randomness generated by any random number generators in hardware. Physical or invasive attacks are not in scope of this work. For secure cache designs which add new instructions for security-related operations, the victim process or management software is assumed to correctly use these instructions. During speculative execution, the cache state can be modified by the instructions executed speculatively, unless a processor cache architecture explicitly prevents or forbids certain speculative accesses.

¹For hit-based vulnerabilities, the attacker is able to learn the full address of the victim's sensitive data, while for miss-based vulnerabilities, the attacker usually can learn the cache index of the victim's sensitive data. For more details of these vulnerabilities' categorizations, please refer to Section 3.3.3.
2.2 Side and Covert Channels
This work focuses on both side and covert channels. Covert channels use the same methods as side channels, but the attacker controls both the sender and the receiver side of the channel. All types of side-channel attacks are equally applicable to covert channels. For brevity, we just use the term "victim" in the text to represent both the victim (for side channels) and the sender (for covert channels).
2.3 Hyperthreading Versus Timing-Slice Sharing
When hyperthreading is supported in a system, the attacker and the victim are able to run on different threads in parallel, instead of each running once every time slice (when no hyperthreading is used). Our model can be applied to both of the scenarios, since our model abstracts away how the sharing happens.
3 Modeling of the Cache Timing-Based Side-Channel Vulnerabilities
This section explains how we developed the three-step modeling approach and used it to model the behavior of the cache logic and to enumerate all the possible cache timing-based vulnerabilities.
3.1 Introduction of the Three-Step Model
We have observed that all of the existing cache timing-based attacks can be modeled with three steps of memory-related operations. Here, "memory-related operation" refers to loads, stores, or different flushes that can be done by the victim or the attacker on the same core or different cores. When the victim and the attacker are on different cores,
cache coherence will also be triggered when one of the memory-related operations is performed.
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush+Reload [55] attack, in Step 1, a cache block is flushed by the attacker. In Step 2, security-critical data is accessed by, for example, the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 will be accessed, and the time of the access will be measured by the attacker. If the victim's secret-dependent operation in Step 2 accessed the cache block, then in Step 3 there will be a cache hit, fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
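The Flush+Reload steps above can be sketched in a few lines of Python (a minimal illustration of ours, not the paper's simulator): the cache block holds at most one address, and the unknown victim address u is concretized to each of its possible candidates.

```python
# Flush+Reload in the three-step notation: attacker flushes a, victim
# accesses u, attacker reloads a and times the access.

def flush_reload_observation(u):
    """Timing the attacker sees in Step 3, given what u maps to:
    'a', 'a_alias' (same cache index as a), or 'NIB' (not in this block)."""
    block = None                  # Step 1: attacker flushes a from the block
    if u in ("a", "a_alias"):     # Step 2: victim accesses u; it lands in
        block = u                 # this block only if it maps here
    # Step 3: attacker reloads a; a hit is fast, a miss is slow
    return "fast" if block == "a" else "slow"

# Only u == a yields a cache hit, so the observation uniquely identifies
# the victim's access.
observations = {u: flush_reload_observation(u) for u in ("a", "a_alias", "NIB")}
print(observations)  # {'a': 'fast', 'a_alias': 'slow', 'NIB': 'slow'}
```

In the terminology introduced below, this one-to-one mapping from timing to u is what makes Flush+Reload a Strong vulnerability.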
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all the 17 possible states of the cache block for each step in our three-step model and their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model (State: Description)

V_u: A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or a_alias, and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no A_u in the model.

A_a or V_a: The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker (A_a) or the victim (V_a). The attacker knows the address a, independent of whether the access was by the victim or the attacker themselves. The address a is within the range of sensitive locations x. The address a is known to the attacker.

A_aalias or V_aalias: The cache block contains a memory address a_alias. The memory location is placed in the cache block due to a memory access by the attacker (A_aalias) or the victim (V_aalias). The address a_alias is within the range x and is not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address a_alias is known to the attacker.

A_d or V_d: The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker (A_d) or the victim (V_d). The address d is not within the range x. The address d is known to the attacker.

A^inv or V^inv: The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker (A^inv) or the victim (V^inv), as a result of the cache block being invalidated, e.g., this is a cache flush of the whole cache.

A^inv_a or V^inv_a: The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_a) or the victim (V^inv_a), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.

A^inv_aalias or V^inv_aalias: The cache block state can be anything except a_alias in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_aalias) or the victim (V^inv_aalias), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a_alias to be removed from the cache block. The address a_alias is known to the attacker.

A^inv_d or V^inv_d: The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_d) or the victim (V^inv_d), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.

V^inv_u: The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim (V^inv_u), as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u. Therefore, the attacker is not able to trigger this invalidation, and A^inv_u does not exist in the model.

∗: Any data, or no data, can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
Fig. 1 The 17 possible states for a single cache block in our three-step model: (a) V_u; (b) A_a, V_a, A_aalias, V_aalias, A_d, V_d; (c) A^inv, V^inv; (d) A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d; (e) V^inv_u; (f) ∗
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each operation. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the description of the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show V_u in dashed lines. Meanwhile, Fig. 1e shows the description of the possible state V^inv_u, which is the result of the victim invalidating data at the sensitive address u, possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the description of the possible state ∗, which represents null knowledge of the address in this corresponding cache block for the attacker. Therefore, it can possibly refer to any address in the memory, or to no valid address at all.
Figure 1b shows the description of the possible states A_a, V_a, A_aalias, V_aalias, A_d, V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and a_alias are within the sensitive set of addresses x, and a_alias, as its name indicates, is a different address than a, but still within set x, and maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Figure 1d shows the description of the possible states
A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Figure 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block, e.g., a, a_alias, d, and u (if it mapped to this block), before the invalidation step A^inv or V^inv, will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 × 17 × 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones can indicate a real attack. As is shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as the input to the reduction rules, to remove the redundant three-step patterns and obtain the final list of vulnerabilities.
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case that u does not have the same index as a or a_alias and thus does not map to this cache block.

The cache three-step simulator is implemented in a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. Outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all possible combinations (4913) of the three-step pattern. For each step
[Figure 2: the exhaustive list of all 4913 possible three-step combinations is input to the cache three-step simulator, which classifies them into 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 Ineffective three-step patterns; the reduction rules then reduce these to the final 72 Strong and 64 Weak vulnerability types.]

Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern, if it is V_u, this step will be extended to be one of three candidates: V_a, V_aalias, and V_NIB. If it is V^inv_u, this step will be extended to be one of three candidates: V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this case, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address or has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to different victim behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ∗ ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (the reason u mapping to NIB can also yield fast timing is that, due to the ∗ in Step 1, the cache block may not contain a, so the invalidation A^inv_a completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the Ineffective types

 1: for step1 in states do
 2:   for step2 in states do
 3:     for step3 in states do
 4:       steps ← (step1, step2, step3)
 5:       candidates ← array to store all possible candidate combinations of this three-step pattern
 6:       timings ← array to store all possible timing observations regarding different candidate combinations for this three-step pattern
 7:       if V_u or V^inv_u is in steps then
 8:         for each combination of candidates (V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3) do
 9:           candidates.append(concretized steps)
10:         end for
11:         for c in candidates do
12:           timings.append(output_timing(c))
13:         end for
14:         if judge_type(candidates, timings) indicates unambiguous observations then
15:           strong.append(steps)
16:         else
17:           if timing variation is observable then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)
25:         continue
26:       end if
27:     end for
28:   end for
29: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
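The simulator's core idea can be sketched in a simplified, hypothetical re-implementation (ours, not the authors' script): concretize the unknown u into its three candidates, derive the timing of the last step, and label the pattern. The ambiguity introduced by the ∗ state (which produces Weak types) is deliberately omitted here, so this sketch only separates Strong from Ineffective patterns.

```python
# States are encoded as strings: 'A_a'/'V_d'/... are accesses, 'A_inv'/'V_inv'
# flush the whole block, 'A_inv_a'/... invalidate one specific address.

CANDIDATES = ("a", "a_alias", "NIB")

def output_timing(steps, u):
    """Timing ('fast'/'slow') of the final step for one candidate of u."""
    block, timing = None, None
    for s in steps:
        s = s.replace("u", u)                  # concretize V_u / V_inv_u
        if s.endswith("_inv"):                 # whole-block flush
            timing, block = ("fast" if block is None else "slow"), None
        elif "_inv_" in s:                     # targeted invalidation
            addr = s.split("_inv_")[1]
            timing = "slow" if block == addr else "fast"
            block = None if block == addr else block
        else:                                  # memory access
            addr = s.split("_", 1)[1]
            timing = "fast" if block == addr else "slow"
            if addr != "NIB":                  # NIB maps to a different block
                block = addr
    return timing

def judge_type(steps):
    obs = {u: output_timing(steps, u) for u in CANDIDATES}
    if len(set(obs.values())) == 1:
        return "Ineffective"                   # same timing for every u
    return "Strong"                            # a timing pins down u

print(judge_type(("V_d", "V_u", "A_a")))   # Strong      (cf. Fig. 3a)
print(judge_type(("A_a", "V_u", "A_d")))   # Ineffective (cf. Fig. 3f)
```

The full simulator additionally tracks the uncertainty of the ∗ state, which is what distinguishes Weak patterns from Strong ones.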
3.3.2 Reduction Rules
We have also developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination will be eliminated if it satisfies one of the below rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated, e.g., A_d ⇝ A_a ⇝ V_u can be reduced
[Figure 3 shows, for six example three-step patterns, the mapping from the victim's behavior (u ∈ {a, a_alias, NIB}) to the attacker's observation (fast or slow): (a) V_d ⇝ V_u ⇝ A_a, (b) V_u ⇝ A_d ⇝ V^inv_u, (c) ∗ ⇝ V_u ⇝ A^inv_a, (d) A^inv_aalias ⇝ V^inv_u ⇝ V_a, (e) V_d ⇝ V^inv_u ⇝ V_d, (f) A_a ⇝ V_u ⇝ A_d.]

Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
to A_a ⇝ V_u, which is equivalent to ∗ ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat type of ∗ ⇝ A_a ⇝ V_u and can be eliminated.
2. Three-step patterns with a step involving a known address a and a step involving an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a or a_alias cannot represent different attacks, and only one combination needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat type of V_u ⇝ A_a ⇝ V_u, and we eliminate the first pattern.
3. Three-step patterns with the steps V_u and V^inv_u in adjacent, consecutive positions will keep only the latter step and eliminate the former. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which is equivalent to ∗ ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
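Rule 2 above, for instance, amounts to canonicalizing patterns before deduplicating them. The following sketch (our own encoding, not the authors' script) illustrates the idea; a step is represented as a tuple (actor, op, addr), e.g., ('A', 'acc', 'a') for A_a.

```python
def canonical(pattern):
    """Rule 2: if a pattern mentions a_alias but never a itself, the alias
    carries the same information as a, so rename a_alias to a."""
    addrs = {addr for (_actor, _op, addr) in pattern}
    if "a_alias" in addrs and "a" not in addrs:
        pattern = tuple((actor, op, "a" if addr == "a_alias" else addr)
                        for (actor, op, addr) in pattern)
    return pattern

def reduce_patterns(patterns):
    """Deduplicate a list of three-step patterns under the canonical form."""
    seen, kept = set(), []
    for p in patterns:
        c = canonical(p)
        if c not in seen:
            seen.add(c)
            kept.append(c)
    return kept

p1 = (("V", "acc", "u"), ("A", "acc", "a"), ("V", "acc", "u"))
p2 = (("V", "acc", "u"), ("A", "acc", "a_alias"), ("V", "acc", "u"))
print(len(reduce_patterns([p1, p2])))  # 1: p2 is a repeat type of p1
```

Rules 1 and 3 can be handled analogously, by rewriting a pattern to a canonical representative before the set-membership test.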
3.3.3 Categorization of Strong Vulnerabilities
As is shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which each cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
²Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
J Hardw Syst Secur (2019) 3:397-425
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Ainv Vu Va (fast) IH (2)
V inv Vu Va (fast) IH (2)
Ad Vu Va (fast) IH (2)
Vd Vu Va (fast) IH (2)
Aaalias Vu Va (fast) IH (2)
Vaalias Vu Va (fast) IH (2)
Ainva Vu Va (fast) IH (2)
V inva Vu Va (fast) IH (2)
Flush + Reload Ainva Vu Aa (fast) EH (5)
V inva Vu Aa (fast) EH (5)
Ainv Vu Aa (fast) EH (5)
V inv Vu Aa (fast) EH (5)
Ad Vu Aa (fast) EH (5)
Vd Vu Aa (fast) EH (5)
Aaalias Vu Aa (fast) EH (5)
Vaalias Vu Aa (fast) EH (5)
Reload + Time V invu Aa Vu (fast) EH new
V invu Va Vu (fast) IH new
Flush + Probe Aa V invu Aa (slow) EM (6)
Aa V invu Va (slow) IM new
Va V invu Aa (slow) EM new
Va V invu Va (slow) IM new
Evict + Time Vu Ad Vu (slow) EM (1)
Vu Aa Vu (slow) EM (1)
Prime + Probe Ad Vu Ad (slow) EM (4)
Aa Vu Aa (slow) EM (4)
Bernstein's Attack Vu Va Vu (slow) IM (3)
Vu Vd Vu (slow) IM (3)
Vd Vu Vd (slow) IM (3)
Va Vu Va (slow) IM (3)
Evict + Probe Vd Vu Ad (slow) EM new
Va Vu Aa (slow) EM new
Prime + Time Ad Vu Vd (slow) IM new
Aa Vu Va (slow) IM new
Flush + Time Vu Ainva Vu (slow) EM new
Vu V inva Vu (slow) IM new
(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
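The hit-based rows of Table 2 can be made concrete with a toy model. The sketch below (a single-set cache modeled as a Python set; all names are ours) walks through the Flush + Reload pattern Ainv, Vu, Aa (fast): the attacker flushes a shared address, lets the victim run, and reloads, learning from a fast reload that the victim touched that address.

```python
# Toy single-set cache model (for illustration only, not any real design)
# showing the Flush + Reload pattern A^inv -> V_u -> A_a (fast) of Table 2.
cache = set()

def access(addr):
    """Access addr; return the timing the accessor would observe."""
    hit = addr in cache
    cache.add(addr)                  # a miss brings the line into the cache
    return "fast" if hit else "slow"

def flush(addr):
    cache.discard(addr)              # invalidate, e.g., via clflush

def flush_reload(secret_u, probe_a):
    flush(probe_a)                   # Step 1: A^inv, attacker flushes a
    access(secret_u)                 # Step 2: V_u, victim touches secret u
    return access(probe_a)           # Step 3: A_a, fast iff u == a

print(flush_reload(0x40, 0x40))  # fast -> attacker infers u == a
print(flush_reload(0x80, 0x40))  # slow -> attacker infers u != a
```

The secure caches analyzed later break this pattern at one of the three steps, e.g., by denying cross-process hits in Step 3 or by forbidding the flush in Step 1.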
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Invalidation Ainv Vu V inva (slow) IH new
V inv Vu V inva (slow) IH new
Ad Vu V inva (slow) IH new
Vd Vu V inva (slow) IH new
Aaalias Vu V inva (slow) IH new
Vaalias Vu V inva (slow) IH new
Flush + Flush Ainva Vu V inva (slow) IH (1)
V inva Vu V inva (slow) IH (1)
Ainva Vu Ainva (slow) EH (1)
V inva Vu Ainva (slow) EH (1)
Flush + Reload Invalidation Ainv Vu Ainva (slow) EH new
V inv Vu Ainva (slow) EH new
Ad Vu Ainva (slow) EH new
Vd Vu Ainva (slow) EH new
Aaalias Vu Ainva (slow) EH new
Vaalias Vu Ainva (slow) EH new
Reload + Time Invalidation V invu Aa V invu (slow) EH new
V invu Va V invu (slow) IH new
Flush + Probe Invalidation Aa V invu Ainva (fast) EM new
Aa V invu V inva (fast) IM new
Va V invu Ainva (fast) EM new
Va V invu V inva (fast) IM new
Evict + Time Invalidation Vu Ad V invu (fast) EM new
Vu Aa V invu (fast) EM new
Prime + Probe Invalidation Ad Vu Ainvd (fast) EM new
Aa Vu Ainva (fast) EM new
Bernstein's Invalidation Attack Vu Va V invu (fast) IM new
Vu Vd V invu (fast) IM new
Vd Vu V invd (fast) IM new
Va Vu V inva (fast) IM new
Evict + Probe Invalidation Vd Vu Ainvd (fast) EM new
Va Vu Ainva (fast) EM new
Prime + Time Invalidation Ad Vu V invd (fast) IM new
Aa Vu V inva (fast) IM new
Flush + Time Invalidation Vu Ainva V invu (fast) EM new
Vu V inva V invu (fast) IM new
(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim or the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What is more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
³Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. It uses the percentage of cache misses for L instructions that was reduced (increased) when L's partition size was increased (reduced) by one cache way to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The value of Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated per cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit if accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, a random data in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
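SHARP's three-tier replacement priority can be sketched as follows (function and field names are ours; the real design works on hardware CVB bitmaps): prefer lines present in no core's private cache, then lines owned by the requester, and only as a last resort evict a random line and alert the OS.

```python
# Sketch of SHARP-style victim selection (names are our assumptions).
import random

def sharp_victim(lines, requester):
    """lines: list of (cvb_bitmap, tag) per way; requester: core id.
    Returns (victim_way, alarm) where alarm=True triggers an OS interrupt."""
    # Tier 1: a line present in no core's private cache.
    unused = [i for i, (cvb, _) in enumerate(lines) if cvb == 0]
    if unused:
        return unused[0], False
    # Tier 2: a line the requesting core itself owns.
    own = [i for i, (cvb, _) in enumerate(lines) if cvb & (1 << requester)]
    if own:
        return own[0], False
    # Tier 3: random eviction, flagged as suspicious to the OS.
    return random.randrange(len(lines)), True

victim, alarm = sharp_victim([(0b10, 1), (0b01, 2)], requester=0)
print(victim, alarm)  # 1 False: core 0 owns line 1, no alarm raised
```

Tier 3 is what makes conflict-based Prime + Probe unreliable: the attacker cannot deterministically evict the victim's lines, and trying repeatedly raises alarms.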
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to a Trusted Software Module in other designs) from each other and the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During
normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world; since such interactions are rare, the performance impact is small. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the checking of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page, and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:
Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and attacker processes cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after the flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, masks fills and selects the victim way to replace; another, called policy_hitmap, masks hit ways. Only when both the tag and the domain id match will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen from the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
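The core of DAWG's isolation can be sketched in a toy model of one cache set (class and field names are ours, loosely following the paper's policy_fillmap terminology): a hit requires both tag and domain id to match, and a fill may only evict within the requester's own ways.

```python
# Hedged sketch of DAWG-style way masking for a single cache set.
import random

class DawgSet:
    def __init__(self, n_ways, fillmap):
        self.ways = [None] * n_ways   # each way holds (domain_id, tag)
        self.fillmap = fillmap        # domain_id -> list of allowed ways

    def access(self, domain_id, tag):
        for line in self.ways:
            if line == (domain_id, tag):   # hit needs tag AND domain match
                return "hit"
        # Miss: the victim way is drawn only from this domain's ways.
        victim = random.choice(self.fillmap[domain_id])
        self.ways[victim] = (domain_id, tag)
        return "miss"

s = DawgSet(4, {0: [0, 1], 1: [2, 3]})
s.access(0, 0xAB)            # domain 0 fills the line into its own ways
print(s.access(1, 0xAB))     # miss: same tag, different domain -> no hit
```

Note how the same tag is duplicated per domain rather than shared: this is why Flush + Reload style hits across domains, and Prime + Probe style evictions across domains, are both closed off.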
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an
inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R in the LLC will cause the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data will have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable shared critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases where the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In these cases, the new data will be either loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:
Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
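PL cache's fill rule can be sketched as follows (function and field names are ours): a line is evictable only if it is unlocked or belongs to the requesting process; if no evictable line exists, the new data is served to the core without being cached.

```python
# Toy PL-cache fill sketch (structure is our assumption, not the RTL).
def pl_fill(set_lines, pid, addr, lock):
    """set_lines: list of dicts with 'pid', 'addr', 'locked' per way.
    Returns 'cached' if a way was (re)filled, 'uncached' otherwise."""
    for line in set_lines:
        # Locked lines of other processes may never be evicted.
        if not line["locked"] or line["pid"] == pid:
            line.update(pid=pid, addr=addr, locked=lock)
            return "cached"
    return "uncached"   # data is used by the core but never placed in cache

pl_set = [{"pid": 1, "addr": 0xA, "locked": True}]
print(pl_fill(pl_set, pid=2, addr=0xB, lock=False))  # uncached
```

Combined with Assumption 1 (preloading), this is what removes Step 1 of vulnerabilities such as Prime + Time Invalidation: the attacker's Ad simply cannot displace the locked victim line.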
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. Each block of RP cache has a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate if it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the security critical data accesses of the victim, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request is used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush
instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique takes care of this).
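The essential de-correlation step can be sketched in a few lines (the window size and exclusion of the demanded line itself are our assumptions for illustration): the victim's critical access is served to the core without caching its own line, while a random nearby line is filled instead.

```python
# Toy sketch of Random Fill's de-correlated fill (parameters are ours).
import random

cache = set()
WINDOW = 8   # assumed neighborhood from which the random fill is drawn

def random_fill_access(addr):
    # The demanded line is returned to the core but NOT cached (Nofill),
    # and a random neighbor within the window is filled in its place.
    offset = random.choice([o for o in range(-WINDOW, WINDOW + 1) if o])
    cache.add(addr + offset)
    return addr

random_fill_access(100)
print(100 in cache)   # False: the demanded line itself is never cached
```

An attacker probing afterwards thus learns nothing about which line the victim demanded; a cached neighbor carries no usable address correlation.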
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address
encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function also calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. For the hardware extension, a cryptographic primitive such as hashing, and an index decoder for each scattered cache way, are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with the user space for security domain identification.
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure or not. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line will be invalidated. Non Deterministic Cache randomly sets the local counters' initial values to be less than the maximum value of the global counter. In this case, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
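The counter mechanism can be sketched as follows (tick granularity, maximum value, and names are our assumptions): each line starts from a random idle count and is invalidated once it stays untouched long enough, so repeated identical accesses yield differing hit/miss outcomes.

```python
# Sketch of Non Deterministic Cache's counter-driven invalidation.
import random

MAX_TICKS = 16
lines = {0xA: random.randrange(MAX_TICKS)}   # addr -> idle-tick counter

def tick():
    """One global counter tick: age every untouched line, invalidating
    any line whose idle counter reaches the predefined maximum."""
    for addr in list(lines):
        lines[addr] += 1
        if lines[addr] >= MAX_TICKS:
            del lines[addr]

for _ in range(MAX_TICKS):
    tick()
print(0xA in lines)   # False: the line was invalidated at a random tick
```

Because the tick at which the line disappears depends on the random initial counter, the attacker's timing observations no longer correlate with the victim's addresses, at the performance cost noted above.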
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu Ad Vu (slow) (one type of Evict + Time [34]) vulnerability. For some other caches and vulnerabilities, the cache is not able to prevent the vulnerabilities, and this is indicated by × and red color. For example, SecDCP cache cannot defend against the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) (one type of Flush + Reload [55]) vulnerability will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] will make the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits between each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and cannot retain the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad Vu Ainvd (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, will allow the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as Ad Vu V invd (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded, locked security critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability; an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution and normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, where V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to columns for PL cache in Tables 4 and 5) usually limit the victim and the attacker to only be able to access a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and some to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order loads or stores, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
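As an illustration, a way-partitioned cache set can be sketched as follows (a Python toy model with assumed names and a round-robin replacement policy, not any specific design's implementation):

```python
# Toy model of static way-partitioning in one cache set: the attacker's
# accesses can only replace lines in the attacker's own ways, so the
# victim's lines cannot be evicted externally. Names are illustrative.

class PartitionedSet:
    def __init__(self, n_ways=4, victim_ways=(0, 1)):
        self.lines = [None] * n_ways
        self.victim_ways = set(victim_ways)
        self.next = {"victim": 0, "attacker": 0}

    def _ways_of(self, domain):
        return [w for w in range(len(self.lines))
                if (w in self.victim_ways) == (domain == "victim")]

    def access(self, domain, tag):
        ways = self._ways_of(domain)
        if any(self.lines[w] == tag for w in ways):
            return "hit"
        # round-robin replacement restricted to the domain's own ways
        way = ways[self.next[domain] % len(ways)]
        self.next[domain] += 1
        self.lines[way] = tag
        return "miss"

s = PartitionedSet()
s.access("victim", "secret")
for i in range(8):                            # attacker primes aggressively...
    s.access("attacker", f"evict{i}")
assert s.access("victim", "secret") == "hit"  # ...but cannot evict the victim
```

Note that this sketch also shows the internal-interference weakness: two of the victim's own lines still contend for the victim's ways.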
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible, e.g., the V_d ⇝ V_u ⇝ V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache will only allow the data to get a cache hit if both its address and the process ID are the same; therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and the attacker will be able to evict it. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities, but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) vulnerability (one type of Evict + Probe) will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during the speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker, e.g., the A_d ⇝ V_u ⇝ V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during the speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
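The randomized address-to-set mapping can be sketched as a keyed hash (a Python illustration in the spirit of such designs; the hash choice, key handling, and epoch names are our assumptions, not the actual low-latency ciphers these caches use):

```python
# Sketch of randomized address-to-set mapping: the set index is derived
# from a keyed hash, so the attacker cannot tell from addresses alone
# which of them contend for a cache set. Periodically changing the key
# ("remapping") invalidates any eviction sets the attacker has built.
# SHA-256 here is a stand-in for the hardware's low-latency cipher.
import hashlib

N_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    digest = hashlib.sha256(key + addr.to_bytes(8, "little")).digest()
    return digest[0] % N_SETS

key_epoch1 = b"epoch-1-key"
key_epoch2 = b"epoch-2-key"   # after remapping, a new key is in effect

# Two addresses that collide under one key usually stop colliding
# under the next, so an attacker's learned eviction set goes stale.
a, b = 0x1000, 0x2040
same_before = set_index(a, key_epoch1) == set_index(b, key_epoch1)
same_after  = set_index(a, key_epoch2) == set_index(b, key_epoch2)
print(same_before, same_after)
```

The mapping stays deterministic within an epoch (so the cache still functions), while looking unpredictable to anyone without the key.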
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are not able to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security critical data to prevent most of the internal and external interference. However, when security critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security critical data, and allows vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) vulnerability (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
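The random fill idea mentioned above can be sketched as follows (a Python toy model; the window size and names are illustrative assumptions, not the actual design parameters):

```python
# Sketch of the random fill idea: a demand miss on protected data does
# not fill the missed line itself; instead, a line from a random window
# around it is filled, de-correlating the victim's access pattern from
# the resulting cache state. Window size and names are illustrative.
import random

class RandomFillCache:
    def __init__(self, window=8):
        self.cached = set()
        self.window = window

    def access_protected(self, addr):
        if addr in self.cached:
            return "hit"
        # serve the data to the CPU WITHOUT caching addr itself;
        # fill a random neighbor from [addr-window, addr+window] instead
        neighbor = addr + random.randint(-self.window, self.window)
        self.cached.add(neighbor)
        return "miss"

c = RandomFillCache()
c.access_protected(100)   # miss: some line near address 100 gets cached
# repeated misses on the same secret address leave a diffuse footprint,
# so an observer cannot map cache-state changes back to the exact address
```

This also makes visible the weakness noted above: if the secret line is loaded directly (not via a random fill) in Step 1, the de-correlation is lost for that step.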
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic Cache: with random delays, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with the other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive, e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
– Hit-based internal interference can be solved by randomly bringing data into the cache, e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully, e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] has summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
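The three steps of this strategy can be acted out on a trivially simulated cache (a Python sketch of the logic only; the address label is a made-up example, and no real timing is measured):

```python
# The three steps of Flush + Reload on a trivially simulated cache
# (a set of cached addresses). A pedagogical sketch of the strategy's
# logic, not a real timing attack; the address label is made up.

cache = set()

def flush(addr):
    cache.discard(addr)                   # invalidation operation

def access(addr):
    hit = addr in cache
    cache.add(addr)
    return "fast" if hit else "slow"      # the timing the observer sees

SECRET = "shared_lib:0x40"                # victim's secret-dependent target

flush("shared_lib:0x40")                  # Step 1: flush the known address
access(SECRET)                            # Step 2: V_u touches its secret addr
timing = access("shared_lib:0x40")        # Step 3: A_a reloads the known addr
assert timing == "fast"                   # hit => the victim used this address
```

If the victim's secret access had touched a different address, Step 3 would observe "slow" instead, which is exactly the hit/miss distinction the attacker exploits.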
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
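The strategy's logic can be sketched on a toy direct-mapped cache (a Python illustration with assumed geometry and addresses; note the attacker learns only the secret's cache set, not its full address):

```python
# The three steps of Prime + Probe on a toy 4-set direct-mapped cache.
# Geometry and addresses are illustrative assumptions.

N_SETS = 4
lines = [None] * N_SETS                    # one line per set

def access(addr):
    s = addr % N_SETS                      # set index = low address bits
    hit = lines[s] == addr
    lines[s] = addr
    return "fast" if hit else "slow"

for a in range(N_SETS):                    # Step 1: prime every set with
    access(1000 + a * N_SETS + a)          # attacker-known addresses

secret = 7                                 # Step 2: victim's secret access
access(secret)                             # evicts the attacker's line in set 3

for a in range(N_SETS):                    # Step 3: probe; the slow set
    if access(1000 + a * N_SETS + a) == "slow":
        print("secret maps to set", a)     # prints: secret maps to set 3
```

Each priming address 1000 + 5a maps to set a, so exactly the set touched by the victim (7 mod 4 = 3) misses during the probe.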
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with a ⋆, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with an A^inv or V^inv, the longer pattern can be divided into two separate patterns, where the A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
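The two subdivision rules can be sketched as a single pass over a step sequence (a Python illustration; the token names stand in for the paper's symbols, with "*" for ⋆ and "Ainv"/"Vinv" for the full-cache invalidation steps):

```python
# Sketch of Subdivision Rules 1 and 2: split a long step sequence at
# every '*' (star) and at every full-cache invalidation ('Ainv'/'Vinv'),
# with the splitter becoming Step 1 of the following sub-pattern.
# A splitter left alone as a trailing step carries no timing information
# and is deleted. Token names are illustrative stand-ins.

SPLITTERS = {"*", "Ainv", "Vinv"}

def subdivide(steps):
    patterns, current = [], []
    for s in steps:
        if s in SPLITTERS:
            if current:
                patterns.append(current)
            current = [s]          # splitter heads the next sub-pattern
        else:
            current.append(s)
    if current:
        patterns.append(current)
    # drop sub-patterns that are a lone splitter (no timing information)
    return [p for p in patterns if not (len(p) == 1 and p[0] in SPLITTERS)]

print(subdivide(["Ad", "Vu", "*", "Ainv", "Vu", "Aa"]))
# -> [['Ad', 'Vu'], ['Ainv', 'Vu', 'Aa']]
```

After subdivision, each resulting sub-pattern is short enough for the simplification rules below to be applied pairwise.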
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... yield the same timing observation in the final step for the attacker, we can freely exchange the places of M and N in this pattern. This gives more chances to Reduce and Union steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting different pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted no matter where the two steps are located within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step in the sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
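A hedged sketch of the two Commute Rules, with steps modeled as (kind, address) tuples such as ('access', 'a') or ('inv', 'd'); the specific pairs below paraphrase individual entries of Table 7 and are assumptions, not a reproduction of the full table:

```python
def can_commute(m, n, n_is_last):
    """Decide whether adjacent steps m and n may be exchanged."""
    kind_m, addr_m = m
    kind_n, addr_n = n
    # Commute Rule 1: a known access and a known invalidation that
    # refer to *different* known addresses can be exchanged anywhere.
    if ({kind_m, kind_n} == {'access', 'inv'}
            and 'u' not in (addr_m, addr_n) and addr_m != addr_n):
        return True
    # Commute Rule 2 covers further pairs, but only when the second
    # step is not the final (timing-observation) step; one example
    # (paraphrasing a Table 7 entry) is a pair of invalidations where
    # one of the addresses is the unknown u.
    if (kind_m == kind_n == 'inv' and addr_m != addr_n
            and 'u' in (addr_m, addr_n) and not n_is_last):
        return True
    return False

print(can_commute(('access', 'a'), ('inv', 'd'), True))    # Rule 1
print(can_commute(('inv', 'a'), ('inv', 'u'), False))      # Rule 2
```

The real table enumerates every ordered pair; this function only encodes two representative rows.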
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).

Table 7 Rules for combining two adjacent steps. For each ordered pair of steps (First, Second), regardless of the attacker's access (A) or the victim's access (V), the table indicates with "yes"/"no" whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule can be applied, and gives the combined step where a rule applies (table body omitted)
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two known adjacent memory access–related operations (known access operations) always result in a deterministic state of the cache block, determined by the second operation, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, namely the second step.
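The three Reduction Rules can be sketched as a single pairwise function (an illustrative sketch, not the authors' code), again with steps as (kind, address) tuples and 'u' standing for the unknown address:

```python
def reduce_pair(m, n):
    """Reduce two adjacent steps to one, or return None if no rule applies."""
    kind_m, addr_m = m
    kind_n, addr_n = n
    # Rule 1: two accesses to the unknown u target the same u, so only
    # the second access needs to be kept.
    if m == ('access', 'u') and n == ('access', 'u'):
        return n
    # Rule 2: two known access operations leave the block in a state
    # determined by the second one alone.
    if kind_m == kind_n == 'access' and 'u' not in (addr_m, addr_n):
        return n
    # Rule 3: a known access and a known invalidation of the *same*
    # address reduce to the second of the two steps.
    if ({kind_m, kind_n} == {'access', 'inv'}
            and 'u' not in (addr_m, addr_n) and addr_m == addr_n):
        return n
    return None  # no reduction rule applies

print(reduce_pair(('access', 'a'), ('access', 'aalias')))
# -> ('access', 'aalias')
```

A pair like an access to a followed by an invalidation of d returns None: those steps can only be commuted or unioned, not reduced.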
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... yield the same timing observation in the final step for the attacker, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined under Union Rule 1: both known inv operations invalidate some known address, so they can be combined into one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β &gt; 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ non-u operation ⇝ u operation

– non-u operation ⇝ u operation ⇝ non-u operation
There might be a possible extra ∗ or Ainv/V inv before these three-step patterns, where:

• An extra ∗ in the first step will not influence the result and can be directly removed.

• If an extra Ainv/V inv is in the first step:

– If followed by a known access operation, the Ainv/V inv can be removed, because the known access puts an actual state into the cache block.

– If followed by a known inv operation or V invu, the Ainv/V inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If followed by Vu, the worst case will be Ainv/V inv ⇝ Vu ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/V inv ⇝ Vu ⇝ Ainvd/V invd ⇝ u operation, where Vu ⇝ Ainvd/V invd can further be commuted by Commute Rule 2 so that the sequence reduces to within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.

The only two adjacent steps to which none of the three categories of Rules can be applied are the following:
– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainva/V inva/Ainvaalias/V invaalias ⇝ Vu

– Aa/Va/Aaalias/Vaalias ⇝ V invu

– Vu ⇝ Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainva/V inva/Ainvaalias/V invaalias

– V invu ⇝ Aa/Va/Aaalias/Vaalias

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β &gt; 3) pattern reduction

Input: β: number of steps of the pattern; pattern: a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: vulnerabilities: reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vulnerabilities = Ø
2: while pattern contains a ∗ whose index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (pattern contains an Ainv whose index is not 0) or (pattern contains a V inv whose index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is ineffective or has interval effective three steps) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if pattern is ineffective or has interval effective three steps then
13:    vulnerabilities += effective_patterns(pattern)
14:  end if
15: end while
16: return vulnerabilities
two steps can either generate two-adjacent-step patterns that can be processed by the three Rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β &gt; 3) pattern to a three-step pattern, thus demonstrating that the corresponding β &gt; 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks whether a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; has_interval_effective_three_steps() represents a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
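The driver structure of Algorithm 2 can be sketched in Python (a hedged rendition under assumed names; the rule callbacks are placeholders, not the authors' script): pairwise rules are applied repeatedly until no rule fires or at most three steps remain.

```python
def reduce_pattern(pattern, pair_rules):
    """Apply pairwise merge rules to a fixpoint (illustrative sketch).

    pattern: list of step names; pair_rules: functions mapping two
    adjacent steps to one merged step, or None when no rule applies.
    """
    changed = True
    while changed and len(pattern) > 3:
        changed = False
        for i in range(len(pattern) - 1):
            for rule in pair_rules:
                merged = rule(pattern[i], pattern[i + 1])
                if merged is not None:
                    pattern = pattern[:i] + [merged] + pattern[i + 2:]
                    changed = True
                    break
            if changed:
                break
    return pattern

# Toy rule: merge equal adjacent steps (Reduction Rule 1 for two u
# operations is one instance of this shape).
same = lambda m, n: n if m == n else None
print(reduce_pattern(['Aa', 'Aa', 'Vu', 'Ad', 'Ad'], [same]))
# -> ['Aa', 'Vu', 'Ad']
```

The real algorithm additionally interleaves commuting passes so that reducible pairs become adjacent before the merge rules run.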
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
cache coherence will also be triggered when one of thememory-related operations is performed
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush + Reload [55] attack, in Step 1 a cache block is flushed by the attacker. In Step 2, security-critical data is accessed, for example by the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 is accessed, and the time of the access is measured by the attacker. If the victim's secret-dependent operation in Step 2 accessed the cache block, then in Step 3 there will be a cache hit, fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called a cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
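The single-cache-block view above can be made concrete with a toy model (an illustrative sketch, not the paper's simulator) that replays the three Flush + Reload steps for two cases of the victim's secret address u:

```python
class CacheBlock:
    """One cache block: holds at most one address and reports hit/miss timing."""

    def __init__(self):
        self.tag = None                      # address currently cached, or None

    def access(self, addr):
        """Return 'fast' on a hit, 'slow' on a miss (then fill the block)."""
        timing = 'fast' if self.tag == addr else 'slow'
        self.tag = addr
        return timing

    def flush(self):
        self.tag = None                      # invalidate the block

for u in ('a', 'other'):
    blk = CacheBlock()
    blk.flush()                              # Step 1: attacker flushes the block
    blk.access(u)                            # Step 2: victim accesses secret u
    print(u, '->', blk.access('a'))          # Step 3: attacker reloads a and times it
# a -> fast
# other -> slow
```

Observing fast timing in Step 3 tells the attacker that u equals a, which is exactly the leakage the three-step notation captures.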
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all 17 possible states of the cache block for each step in our three-step model and their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model
State Description
Vu A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or aalias and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no Au in the model.
Aa or Va The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker, Aa, or the victim, Va. The attacker knows the address a, independent of whether the access was by the victim or the attacker themselves. The address a is within the range of sensitive locations x and is known to the attacker.
Aaalias or Vaalias The cache block contains a memory address aalias. The memory location is placed in the cache block due to a memory access by the attacker, Aaalias, or the victim, Vaalias. The address aalias is within the range x and not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address aalias is known to the attacker.
Ad or Vd The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker, Ad, or the victim, Vd. The address d is not within the range x. The address d is known to the attacker.
Ainv or V inv The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker, Ainv, or the victim, V inv, as a result of the cache block being invalidated, e.g., by a cache flush of the whole cache.
Ainva or V inva The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker, Ainva, or the victim, V inva, e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.
Ainvaalias or V invaalias The cache block state can be anything except aalias in this cache block now. The data and its address are "removed" from the cache block by the attacker, Ainvaalias, or the victim, V invaalias, e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force aalias to be removed from the cache block. The address aalias is known to the attacker.
Ainvd or V invd The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker, Ainvd, or the victim, V invd, e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.
V invu The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim, V invu, as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u; therefore, the attacker is not able to trigger this invalidation, and Ainvu does not exist in the model.
∗ Any data or no data can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
Fig. 1 The 17 possible states for a single cache block in our three-step model: a Vu; b Aa, Va, Aaalias, Vaalias, Ad, Vd; c Ainv, V inv; d Ainva, V inva, Ainvaalias, V invaalias, Ainvd, V invd; e V invu; f ∗
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each state. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and aalias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the possible state Vu, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show Vu in dashed lines. Meanwhile, Fig. 1e shows the possible state V invu, which is the result of the victim invalidating data at the sensitive address u, possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the possible state ∗, which represents null knowledge by the attacker of the address in this cache block. Therefore, it can possibly refer to any address in memory, or to no valid address at all.
Figure 1b shows the possible states Aa, Va, Aaalias, Vaalias, Ad, Vd. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and aalias are within the sensitive set of addresses x, and aalias, as its name indicates, is a different address than a but is still within set x and maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig. 1d shows the possible states
Ainva, V inva, Ainvaalias, V invaalias, Ainvd, V invd, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, Ainv and V inv, are shown in Fig. 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block before the invalidation step Ainv/V inv, e.g., a, aalias, d, and u (if it mapped to this block), will be flushed back to memory.
3.3 Derivation of All Cache Timing–Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 × 17 × 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones indicate a real attack. As shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as input to the reduction rules to remove the redundant three-step patterns and obtain the final list of vulnerabilities.
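The exhaustive enumeration can be sketched directly (state names follow Table 1, with subscripts flattened using underscores, which is our own naming convention):

```python
# 17 candidate states per step give 17^3 = 4913 raw three-step patterns.
from itertools import product

states = ['Vu', 'Aa', 'Va', 'Aaalias', 'Vaalias', 'Ad', 'Vd',
          'Ainv', 'Vinv', 'Ainv_a', 'Vinv_a', 'Ainv_aalias',
          'Vinv_aalias', 'Ainv_d', 'Vinv_d', 'Vinv_u', '*']
assert len(states) == 17

combinations = list(product(states, repeat=3))
print(len(combinations))   # 4913
```

Each of these 4913 tuples is then fed to the simulator for classification.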
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, aalias, and NIB (Not-In-Block). Here, NIB indicates the case where u does not have the same index as a or aalias and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. The outputs are all the vulnerabilities, classified as the Strong, Weak, or Ineffective type. The simulator uses a nested for loop to check all 4913 possible combinations of the three-step patterns. For each step
[Fig. 2 flowchart: the exhaustive list of all possible three-step combinations (4913) is fed to the Cache Three-Step Simulator, whose classification step yields 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 ineffective three-step patterns; the reduction step (Reduction Rules) then produces the final 72 Strong and 64 Weak vulnerability types.]

Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern, if it is V_u, this step will be extended to one of three candidates: V_a, V_aalias, and V_NIB. If it is V^inv_u, this step will be extended to one of three candidates: V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each candidate of a u-related step, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address, or it has the same index as some known address). In this case, the vulnerability gives strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d → V_u → A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always observe fast timing. If u is aalias or NIB, slow timing will be observed. This means the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or aalias). For these vulnerabilities, timing variation can still be observed due to different victim behavior; however, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ⋆ → V_u → A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to aalias or NIB (u mapping to NIB can produce either timing because, due to the ⋆ in Step 1, the cache block may or may not already hold a). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a → V_u → A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
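The recovered text does not spell out judge_type itself, so the following is a minimal Python sketch of one plausible formalization consistent with the three categories above (the function name judge_type comes from the paper; the input representation is ours): each candidate value of u maps to the set of timings the attacker could observe, with more than one timing when the unknown initial state (⋆) matters.

```python
def judge_type(timings_per_u):
    """timings_per_u: dict mapping u in {'a', 'aalias', 'NIB'} to the set
    of possible observations, e.g. {'fast'} or {'fast', 'slow'}."""
    all_obs = set().union(*timings_per_u.values())
    if len(all_obs) == 1:
        return "ineffective"   # the observation never varies with u
    if all(len(t) == 1 for t in timings_per_u.values()):
        return "strong"        # every u yields one determinate timing
    return "weak"              # some u's timing is ambiguous

# V_d -> V_u -> A_a (Fig. 3a): fast uniquely means u = a.
print(judge_type({"a": {"fast"}, "aalias": {"slow"}, "NIB": {"slow"}}))
# -> strong

# * -> V_u -> A_a^inv (Fig. 3c): NIB can appear as fast or slow.
print(judge_type({"a": {"slow"}, "aalias": {"fast"}, "NIB": {"fast", "slow"}}))
# -> weak

# A_a -> V_u -> A_d (Fig. 3f): always slow, regardless of u.
print(judge_type({"a": {"slow"}, "aalias": {"slow"}, "NIB": {"slow"}}))
# -> ineffective
```

The three calls reproduce the classifications of the Fig. 3a, 3c, and 3f examples, respectively.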
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the ineffective types

1:  for step1 in states do
2:    for step2 in states do
3:      for step3 in states do
4:        steps ← (step1, step2, step3)
5:        candidates ← array to store all possible candidate combinations of this three-step pattern
6:        timings ← array to store all possible timing observations regarding the different candidate combinations of this three-step pattern
7:        if steps[0] or steps[1] or steps[2] is u-related then
8:          for each candidate combination of steps (V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3) do
9:            candidates.append(candidate combination of steps[0], steps[1], steps[2])
10:         end for
11:         for each candidate in candidates do
12:           timings.append(output_timing(candidate))
13:         end for
14:         if judge_type(candidates, timings) = Strong then
15:           strong.append(steps)
16:         else
17:           if judge_type(candidates, timings) = Weak then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)
25:         continue
26:       end if
27:     end for
28:   end for
29: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to space limits, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, for when channels with smaller channel capacities need to be analyzed.
3.3.2 Reduction Rules
We also developed rules that further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove repeating or redundant types of vulnerabilities from the lists, to form the effective Strong Vulnerability and Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the below rules:
1. Three-step patterns with two adjacent steps that are repeating, or that are both known to the attacker, can be eliminated, e.g., A_d → A_a → V_u can be reduced
[Fig. 3 panels: each panel maps the victim's behavior (u ∈ {a, aalias, NIB}) to the attacker's observation (fast or slow) for one example pattern: (a) V_d → V_u → A_a, (b) V_u → A_d → V^inv_u, (c) ⋆ → V_u → A^inv_a, (d) A^inv_aalias → V^inv_u → V_a, (e) V_d → V^inv_u → V_d, (f) A_a → V_u → A_d.]

Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
to A_a → V_u, which is equivalent to ⋆ → A_a → V_u. Therefore, A_d → A_a → V_u is a repeat type of ⋆ → A_a → V_u and can be eliminated.
2. A step involving a known address a and a step involving its alias aalias give the same information. Thus, three-step combinations that only differ in the use of a versus aalias cannot represent different attacks, and only one combination needs to be considered. For example, V_u → A_aalias → V_u is a repeat type of V_u → A_a → V_u, and we eliminate the first pattern.
3. Three-step patterns with the steps V_u and V^inv_u adjacent to each other keep only the latter step and eliminate the former. For example, A_a → V_u → V^inv_u can be reduced to A_a → ⋆ → V^inv_u, which is further equivalent to ⋆ → A_a → V^inv_u. So A_a → V_u → V^inv_u can be eliminated.
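The reduction rules lend themselves to small canonicalization functions over three-step tuples, after which plain deduplication removes the repeats. The sketch below is ours (the step labels are illustrative strings, and only rules 2 and 3 are shown):

```python
def rule2_canonicalize(steps):
    # Rule 2: a and aalias carry the same information, so rewrite
    # aalias-steps to their a-counterparts before deduplication.
    return tuple(s.replace("aalias", "a") for s in steps)

def rule3_drop_shadowed(steps):
    # Rule 3: V_u immediately followed by V_u^inv keeps only the latter.
    out = []
    for s in steps:
        if out and out[-1] == "V_u" and s == "V_u^inv":
            out.pop()
        out.append(s)
    return tuple(out)

print(rule2_canonicalize(("V_u", "A_aalias", "V_u")))  # ('V_u', 'A_a', 'V_u')
print(rule3_drop_shadowed(("A_a", "V_u", "V_u^inv")))  # ('A_a', 'V_u^inv')
```

Applying such canonicalizations and collecting the results in a set yields the deduplicated final list.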
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that it covers all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the
vulnerability types whose last step is a memory access, and Table 3 shows all the vulnerability types whose last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such names exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where they existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The vulnerability types can be further collected into four macro types, each of which covers one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior V in the states of Step 2 and Step 3 are called internal interference vulnerabilities (I); the remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address of the victim's access maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
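The macro-type rules in this paragraph are mechanical and can be encoded directly. The following sketch is our own encoding (the step labels and the function name are illustrative); it reproduces the Macro Type column entries of Tables 2 and 3 for a few sample rows.

```python
def macro_type(step2, step3, timing):
    """Classify a vulnerability per the rules above.
    Steps are labels such as 'V_u' or 'A_a^inv'; timing is the
    attacker's observation in Step 3 ('fast' or 'slow')."""
    # Internal interference: Steps 2 and 3 involve only the victim (V).
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    if "^inv" in step3:
        # Fast invalidation = data absent (miss-based);
        # slow invalidation = data valid in the cache (hit-based).
        basis = "M" if timing == "fast" else "H"
    else:
        # Slow access = cache miss (miss-based); fast = hit (hit-based).
        basis = "M" if timing == "slow" else "H"
    return interference + basis

print(macro_type("V_u", "A_a", "fast"))      # EH (a Flush + Reload row)
print(macro_type("V_a", "V_u", "slow"))      # IM (a Bernstein's Attack row)
print(macro_type("V_u", "A_d^inv", "fast"))  # EM (Prime + Probe Invalidation)
```

This makes it easy to sanity-check the IM/IH/EM/EH labels in the tables programmatically.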
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check whether the secure caches can defend against some or all of the vulnerabilities in our model.
² Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column gives the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature
Attack Strategy | Vulnerability Type (Step 1 → Step 2 → Step 3) | Macro Type | Attack

Cache Internal Collision:
  A^inv → V_u → V_a (fast) | IH | (2)
  V^inv → V_u → V_a (fast) | IH | (2)
  A_d → V_u → V_a (fast) | IH | (2)
  V_d → V_u → V_a (fast) | IH | (2)
  A_aalias → V_u → V_a (fast) | IH | (2)
  V_aalias → V_u → V_a (fast) | IH | (2)
  A^inv_a → V_u → V_a (fast) | IH | (2)
  V^inv_a → V_u → V_a (fast) | IH | (2)

Flush + Reload:
  A^inv_a → V_u → A_a (fast) | EH | (5)
  V^inv_a → V_u → A_a (fast) | EH | (5)
  A^inv → V_u → A_a (fast) | EH | (5)
  V^inv → V_u → A_a (fast) | EH | (5)
  A_d → V_u → A_a (fast) | EH | (5)
  V_d → V_u → A_a (fast) | EH | (5)
  A_aalias → V_u → A_a (fast) | EH | (5)
  V_aalias → V_u → A_a (fast) | EH | (5)

Reload + Time:
  V^inv_u → A_a → V_u (fast) | EH | new
  V^inv_u → V_a → V_u (fast) | IH | new

Flush + Probe:
  A_a → V^inv_u → A_a (slow) | EM | (6)
  A_a → V^inv_u → V_a (slow) | IM | new
  V_a → V^inv_u → A_a (slow) | EM | new
  V_a → V^inv_u → V_a (slow) | IM | new

Evict + Time:
  V_u → A_d → V_u (slow) | EM | (1)
  V_u → A_a → V_u (slow) | EM | (1)

Prime + Probe:
  A_d → V_u → A_d (slow) | EM | (4)
  A_a → V_u → A_a (slow) | EM | (4)

Bernstein's Attack:
  V_u → V_a → V_u (slow) | IM | (3)
  V_u → V_d → V_u (slow) | IM | (3)
  V_d → V_u → V_d (slow) | IM | (3)
  V_a → V_u → V_a (slow) | IM | (3)

Evict + Probe:
  V_d → V_u → A_d (slow) | EM | new
  V_a → V_u → A_a (slow) | EM | new

Prime + Time:
  A_d → V_u → V_d (slow) | IM | new
  A_a → V_u → V_a (slow) | IM | new

Flush + Time:
  V_u → A^inv_a → V_u (slow) | EM | new
  V_u → V^inv_a → V_u (slow) | IM | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, fast indicates that no data at the corresponding address is invalidated, while slow indicates that the invalidation operation makes some data invalid, causing a longer processing time
Attack Strategy | Vulnerability Type (Step 1 → Step 2 → Step 3) | Macro Type | Attack

Cache Internal Collision Invalidation:
  A^inv → V_u → V^inv_a (slow) | IH | new
  V^inv → V_u → V^inv_a (slow) | IH | new
  A_d → V_u → V^inv_a (slow) | IH | new
  V_d → V_u → V^inv_a (slow) | IH | new
  A_aalias → V_u → V^inv_a (slow) | IH | new
  V_aalias → V_u → V^inv_a (slow) | IH | new

Flush + Flush:
  A^inv_a → V_u → V^inv_a (slow) | IH | (1)
  V^inv_a → V_u → V^inv_a (slow) | IH | (1)
  A^inv_a → V_u → A^inv_a (slow) | EH | (1)
  V^inv_a → V_u → A^inv_a (slow) | EH | (1)

Flush + Reload Invalidation:
  A^inv → V_u → A^inv_a (slow) | EH | new
  V^inv → V_u → A^inv_a (slow) | EH | new
  A_d → V_u → A^inv_a (slow) | EH | new
  V_d → V_u → A^inv_a (slow) | EH | new
  A_aalias → V_u → A^inv_a (slow) | EH | new
  V_aalias → V_u → A^inv_a (slow) | EH | new

Reload + Time Invalidation:
  V^inv_u → A_a → V^inv_u (slow) | EH | new
  V^inv_u → V_a → V^inv_u (slow) | IH | new

Flush + Probe Invalidation:
  A_a → V^inv_u → A^inv_a (fast) | EM | new
  A_a → V^inv_u → V^inv_a (fast) | IM | new
  V_a → V^inv_u → A^inv_a (fast) | EM | new
  V_a → V^inv_u → V^inv_a (fast) | IM | new

Evict + Time Invalidation:
  V_u → A_d → V^inv_u (fast) | EM | new
  V_u → A_a → V^inv_u (fast) | EM | new

Prime + Probe Invalidation:
  A_d → V_u → A^inv_d (fast) | EM | new
  A_a → V_u → A^inv_a (fast) | EM | new

Bernstein's Invalidation Attack:
  V_u → V_a → V^inv_u (fast) | IM | new
  V_u → V_d → V^inv_u (fast) | IM | new
  V_d → V_u → V^inv_d (fast) | IM | new
  V_a → V_u → V^inv_a (fast) | IM | new

Evict + Probe Invalidation:
  V_d → V_u → A^inv_d (fast) | EM | new
  V_a → V_u → A^inv_a (fast) | EM | new

Prime + Time Invalidation:
  A_d → V_u → V^inv_d (fast) | IM | new
  A_a → V_u → V^inv_a (fast) | IM | new

Flush + Time Invalidation:
  V_u → A^inv_a → V^inv_u (fast) | EM | new
  V_u → V^inv_a → V^inv_u (fast) | IM | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years; to the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. Moreover, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
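The behavior described for the SP* cache, hits allowed anywhere but evictions confined to the requester's own ways, can be sketched as a toy model of a single cache set. This is an illustrative model under our own naming, not the authors' implementation, and it uses a naive first-way victim choice where a real design would use, e.g., LRU within the partition.

```python
class SPStarSet:
    """Toy model of one set in an SP*-style statically way-partitioned cache."""
    def __init__(self, ways, high_ways):
        self.lines = [None] * ways  # each entry: (owner, tag) or None
        self.part = {"High": range(0, high_ways),
                     "Low": range(high_ways, ways)}

    def access(self, owner, tag):
        for line in self.lines:
            if line is not None and line[1] == tag:
                return "hit"  # hits are allowed in either partition
        # Miss: replacement is restricted to the requester's own partition
        # (naive choice: first way of the partition).
        victim = next(iter(self.part[owner]))
        self.lines[victim] = (owner, tag)
        return "miss"

s = SPStarSet(ways=4, high_ways=2)
print(s.access("High", 0x10))  # miss: fills a High way
print(s.access("Low", 0x10))   # hit: cross-partition hits are permitted
```

The second access illustrates exactly the property noted above: a Low access hits on data resident in the High partition without modifying it.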
SecVerilog Cache [56, 57] statically partitions cache blocks between the security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
³ Two existing papers give slightly different definitions of an "SP" cache; thus, we define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of ways of the cache assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread exclusively reserves Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, the Y blocks are invalidated in each cache set to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid evictions caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds, to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID); the CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache that is in the same process, random data in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
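SHARP's three-tier victim selection can be sketched as a small function. This is our own illustrative encoding of the priority order described above (the core valid bits are simplified to sets of process ids, and the OS interrupt is modeled as a boolean alarm flag):

```python
import random

def sharp_pick_victim(cvb_per_way, requester, live_processes):
    """Pick an eviction victim per the SHARP priority order.
    cvb_per_way: one set of process ids per way, holding the ids whose
    private caches contain that way's line (simplified core valid bits).
    Returns (way_index, alarm) where alarm models the OS interrupt."""
    # 1) Prefer lines not present in any live process's private cache.
    for way, cvb in enumerate(cvb_per_way):
        if not (cvb & live_processes):
            return way, False
    # 2) Then lines belonging only to the requester itself.
    for way, cvb in enumerate(cvb_per_way):
        if cvb == {requester}:
            return way, False
    # 3) Otherwise evict at random and raise the suspicious-activity alarm.
    return random.randrange(len(cvb_per_way)), True

way, alarm = sharp_pick_victim([{1}, {2}, set()], requester=1,
                               live_processes={1, 2})
print(way, alarm)  # 2 False: way 2 holds no live process's line
```

Only the third tier, random eviction across processes, triggers the notification to the OS, matching the description above.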
Sanctum Cache [10] focuses on the isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions, as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks of the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has only a small performance impact because such speculation is rare. This cache listed extra software assumptions, as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks of the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] makes speculation invisible in the data cache hierarchy. Before the visibility point shows up, i.e., before all of a load's prior control flow instructions resolve, unsafe speculative loads (USLs) are put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks; however, it remains vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition the cache into secure and non-secure parts (the non-secure parts may map to 3 CoS, while the secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions, as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A software pseudo-locking mechanism is used to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages remain pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace, and another, called policy_hitmap, for masking hit ways. A cache hit happens only if both the tag and the domain id are the same. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
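The hitmap/fillmap masking can be sketched as follows. This is an illustrative simplification of our own: the per-domain way masks subsume the domain-id match on hits, and the fill-way choice is naive (first permitted way) where DAWG applies a real replacement policy within the mask.

```python
def dawg_lookup(ways, tags, domain, tag, policy_hitmap, policy_fillmap):
    """Look up `tag` for `domain` in a single set with `ways` ways.
    policy_hitmap/policy_fillmap: per-domain bitmaps over the ways."""
    # Hit requires a tag match in a way enabled by this domain's hitmap.
    for w in range(ways):
        if (policy_hitmap[domain] >> w) & 1 and tags[w] == tag:
            return ("hit", w)
    # Miss: the fill victim is confined to this domain's fillmap ways.
    for w in range(ways):
        if (policy_fillmap[domain] >> w) & 1:
            tags[w] = tag
            return ("miss", w)

tags = [None, None, None, None]
maps = {0: 0b0011, 1: 0b1100}  # domain 0 -> ways 0-1, domain 1 -> ways 2-3
print(dawg_lookup(4, tags, 0, 0xAB, maps, maps))  # ('miss', 0)
print(dawg_lookup(4, tags, 1, 0xAB, maps, maps))  # ('miss', 2): no cross-domain hit
```

The second lookup misses even though the tag is resident in way 0, illustrating the isolation of hits across protection domains.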
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R from the LLC causes the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, the eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bit are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and the compiler to set or reset the lock bit through the use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, the PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability update the process ID and the lock status bit while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data is loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption, as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of the RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ is evicted, and D is accessed without caching. If they belong to different processes, D is stored in an evicted cache block of S′, and the mappings of S and S′ are swapped as well. Otherwise, the normal replacement policy is executed.
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security-critical. The cache replacement policy is very similar to that of the RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data is evicted and D is accessed without caching if the two belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D randomly replaces a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
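Newcache's two-level lookup can be sketched as a toy model. This is our own simplification: it models only the RMT indirection and the (process ID, index, tag) match, omitting the protection-bit cases and the no-caching paths described above.

```python
import random

class Newcache:
    """Toy model of Newcache's RMT indirection over a fully
    associative physical line array."""
    def __init__(self, n_lines):
        self.rmt = {}                   # (pid, index) -> physical line number
        self.lines = [None] * n_lines   # each entry: (pid, index, tag)

    def access(self, pid, index, tag):
        line = self.rmt.get((pid, index))
        # A hit requires the RMT entry to point at a line whose
        # (pid, index, tag) all match.
        if line is not None and self.lines[line] == (pid, index, tag):
            return "hit"
        # Miss: place into a random physical line (fully associative array)
        # and update the RMT to point at it.
        victim = random.randrange(len(self.lines))
        self.lines[victim] = (pid, index, tag)
        self.rmt[(pid, index)] = victim
        return "miss"

c = Newcache(8)
print(c.access(pid=1, index=3, tag=0x7))  # miss
print(c.access(pid=1, index=3, tag=0x7))  # hit
print(c.access(pid=2, index=3, tag=0x7))  # miss: pid is part of the match
```

The third access shows that even an identical index and tag miss under a different process ID, which is the per-process de-correlation the design relies on.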
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security critical data, a no-fill request is executed and the requested data access is performed without caching. Meanwhile, on a random fill request, arbitrary data from a range of addresses is brought into the cache; in the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not influence the prevention of timing-based side-channel attacks (the random filling technique is used for this).
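The three request types can be sketched as a small dispatch function; the `window` parameter is our stand-in for the configurable neighborhood from which the random fill is drawn.

```python
import random

def random_fill_access(cache, addr, request_kind, window=(4, 4)):
    """Toy dispatch for Random Fill cache's request types (a sketch).

    `cache` is a set of cached line addresses; `window` = (a, b) models the
    random-fill neighborhood [addr - a, addr + b]."""
    if addr in cache:
        return 'hit'                 # hits are processed as in a normal cache
    if request_kind == 'normal':
        cache.add(addr)              # normal replacement
    elif request_kind == 'nofill':
        pass                         # data returned to the core, nothing cached
    elif request_kind == 'randomfill':
        a, b = window                # a random neighbor line is cached instead
        cache.add(addr + random.randint(-a, b))
    return 'miss'
```

A no-fill access leaves the cache state untouched, so the attacker's later probe carries no information about which secret line the victim read.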
CEASER Cache [38] mitigates conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction; this periodic re-keying causes the address remapping to change dynamically.
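The keyed set-index computation and re-keying can be sketched as below. A keyed hash stands in for the real LLBC (which, unlike a hash, is invertible), and the epoch/key names are ours; real CEASER also gradually relocates resident lines during a re-key rather than flushing.

```python
import hashlib

class CeaserIndexing:
    """Sketch of CEASER-style encrypted set indexing (hash in place of LLBC)."""

    def __init__(self, key, num_sets):
        self.key = key
        self.num_sets = num_sets

    def set_index(self, line_addr):
        # "encrypt" the line address under the current key, take low bits as index
        digest = hashlib.sha256(f'{self.key}:{line_addr}'.encode()).digest()
        return int.from_bytes(digest[:4], 'big') % self.num_sets

    def rekey(self, new_key):
        # periodic re-keying; changes the address-to-set mapping dynamically
        self.key = new_key
```

After a re-key, an eviction set the attacker constructed under the old mapping no longer conflicts with the victim's lines.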
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a similar way to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As hardware extensions, a cryptographic primitive such as hashing and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. One bit per page-table entry is also added to allow the kernel to communicate security domain identification with user space.
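The per-way, per-domain index derivation can be sketched with a keyed hash standing in for the paper's keyed hash or permutation derivation function; the function name and key format are our own.

```python
import hashlib

def scatter_way_indices(domain_key, line_addr, num_ways, num_sets):
    """Sketch of SCATTER-style index derivation: a keyed function yields a
    (possibly different) set index per way, keyed per security domain."""
    indices = []
    for way in range(num_ways):
        digest = hashlib.sha256(f'{domain_key}:{line_addr}:{way}'.encode()).digest()
        indices.append(int.from_bytes(digest[:4], 'big') % num_sets)
    return indices
```

Because every way gets its own index and every domain its own key, two domains touching the same physical line generally contend in different sets, which is what breaks eviction-set construction.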
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. It does not differentiate data caching by process ID or by whether the data is secure. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter, so the cache delay becomes randomized. The cache delay interval controlled by this non-deterministic execution leads to different cache hit and miss statistics, because invalidation is determined by the randomized counter of each cache line, and therefore de-correlates cache access time from the address being accessed. However, the performance degradation is tremendous.
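The randomized per-line decay counter can be sketched as follows; class and parameter names are our own, with `max_interval` playing the role of the predefined global-counter bound.

```python
import random

class DecayingCacheLine:
    """Sketch of Non Deterministic Cache's per-line invalidation counter."""

    def __init__(self, max_interval):
        self.max_interval = max_interval
        self.valid = True
        self.counter = random.randrange(max_interval)  # randomized initial value

    def global_tick(self):
        """Advance on each global counter tick while the data is untouched."""
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_interval:
                self.valid = False   # line invalidated after a random-length interval

    def touch(self):
        """An access restarts the activeness interval at a fresh random value."""
        self.counter = random.randrange(self.max_interval)
```

Since each line dies at a time the attacker cannot predict, whether a later access hits or misses no longer depends deterministically on the address history.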
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed against each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer has the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing no longer determines that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks security critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security critical data does not allow A_d in Step 1 to replace it. In this case, A_d cannot be placed in the cache, so this vulnerability cannot be triggered in PL cache.
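Case 1 above can be illustrated with a toy model of the last step of V_d ⇝ V_u ⇝ A_a: when hits additionally require a matching process ID (as in RP cache), the attacker's reload timing becomes constant. Function and field names here are hypothetical.

```python
def reload_timing(cached_lines, requester_pid, addr, pid_checked):
    """Toy timing of the last step (attacker's reload): 'fast' on a hit,
    'slow' on a miss. `pid_checked` models a cache whose hits also require
    a matching process ID, as RP cache does."""
    for line in cached_lines:
        if line['addr'] == addr and (not pid_checked or line['pid'] == requester_pid):
            return 'fast'
    return 'slow'
```

With the victim's secret-dependent line cached under the victim's process ID, the attacker's reload is fast on a conventional cache (leaking u) but always slow once process IDs are checked, so the timing difference that defines the vulnerability disappears.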
From the security perspective, a secure cache's entries in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous cost to performance when the application is complex.
Each entry showing the effectiveness of a secure cache against a vulnerability lists two results: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4  Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
Table 5  Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or the management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be implemented to be disabled for the victim's sensitive data in PL cache, where the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to accessing only a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, there may be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether a memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of speculative or out-of-order loads or stores, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it can come at the cost of revealing some information when adjusting the partition size for each part; it also does not help with internal interference prevention.
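A minimal sketch of static way-partitioning, one of the granularities mentioned above: High and Low security domains are assigned disjoint way ranges, so neither side can evict the other's lines. The function and parameter names are ours.

```python
def way_partition(pid, high_pids, num_ways, num_high_ways):
    """Sketch of static way-partitioning: the High domain gets the first
    `num_high_ways` ways, the Low domain gets the rest."""
    if pid in high_pids:
        return range(0, num_high_ways)
    return range(num_high_ways, num_ways)
```

Because the ranges are disjoint, an attacker's fills can never replace a victim line (no external miss-based interference), but a victim's own accesses still contend within the High ways, which is exactly the internal interference the text describes.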
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 0) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative or out-of-order loads or stores and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.

RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory accesses and cache accesses of the victim's security critical data to prevent most of the internal and external interference. However, when security critical data is initially directly loaded into a cache block in Step 1, Random Fill cache will not randomly load the security critical data, and allows vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared, read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6  Existing secure caches' implementation method, performance, power, and area summary. [Table body not reproduced in this extraction.]
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; that is, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
- Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
- Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
- To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to capture all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision: In Step 1, a cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there was an internal collision and leaks the value of u.
Flush + Reload: In Step 1, either the attacker or the victim invalidates a cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
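The three steps can be sketched over a toy cache modeled as a set of cached line addresses; the function and argument names are ours, and real attacks distinguish the two outcomes by measuring access latency.

```python
def flush_reload(shared_cache, victim_addr, probe_addr):
    """Toy run of the three Flush + Reload steps on a set of cached addresses."""
    shared_cache.discard(probe_addr)   # Step 1: invalidate the shared line
    shared_cache.add(victim_addr)      # Step 2: victim's secret-dependent access
    # Step 3: attacker reloads probe_addr and times it
    return 'fast' if probe_addr in shared_cache else 'slow'
```

A fast reload tells the attacker the victim's Step 2 access touched the probed address.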
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
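A minimal sketch of Prime + Probe on a toy set-associative LRU cache; the class, parameters, and addresses are illustrative assumptions, not the paper's simulator:

```python
from collections import deque

class SetAssocCache:
    """Toy cache: num_sets sets of 'ways' blocks each, LRU replacement."""
    def __init__(self, num_sets=4, ways=2):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [deque() for _ in range(num_sets)]

    def access(self, addr):
        s = self.sets[addr % self.num_sets]  # set index = addr mod num_sets
        hit = addr in s
        if hit:
            s.remove(addr)
        elif len(s) == self.ways:
            s.popleft()                      # evict the LRU block
        s.append(addr)                       # addr becomes most recently used
        return 'hit' if hit else 'miss'

cache = SetAssocCache()
secret = 9                                   # maps to set 9 % 4 == 1 (unknown to attacker)
prime = {s: [s, s + 4] for s in range(4)}    # attacker data filling each set

for s in range(4):                           # Step 1: prime every set
    for a in prime[s]:
        cache.access(a)
cache.access(secret)                         # Step 2: victim's secret access evicts attacker data
evicted = [s for s in range(4)               # Step 3: probe; a miss reveals the secret's set
           if any(cache.access(a) == 'miss' for a in prime[s])]
print(evicted)                               # [1] -> the secret maps to set 1
```

Note that Prime + Probe only recovers the cache set index of the secret address, not the full address, consistent with the strategy description above.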
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.

J Hardw Syst Secur (2019) 3:397–425
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access operations and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ∗, since the ∗ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory-related operations, and corresponds to the three-step cases where Step 1 is ∗ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ∗ → A_a → V_u. Therefore, three-step cases where Step 1 is ∗ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ∗). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B4 Patterns with β gt 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B41 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ∗ in the middle, the longer pattern can be divided into two separate patterns, where ∗ is assigned as Step 1 of the second pattern. This is because ∗ gives no timing information, and the attacker loses track of the cache state after ∗. This rule should be recursively applied until there are no sub-patterns left with a ∗ in the middle or as the last step (∗ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with A^inv/V^inv in the middle, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability A^inv → V_u → A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (A^inv/V^inv in the last step will be deleted).
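A minimal sketch of how Subdivision Rule 1 might be mechanized, assuming steps are encoded as plain strings with '*' standing for the ∗ state; the encoding and function name are ours, not the paper's:

```python
# Sketch of Subdivision Rule 1: split a long pattern at interior '*' steps,
# since '*' yields no timing information; '*' becomes Step 1 of the next
# sub-pattern. The step encoding here is illustrative only.

def subdivide_at_star(pattern):
    """Split at every interior '*'; a sub-pattern that is only ['*'] (i.e.,
    a trailing '*') is dropped, matching the rule's deletion of a last-step '*'."""
    pieces, current = [], []
    for i, step in enumerate(pattern):
        if step == '*' and i > 0:
            pieces.append(current)
            current = ['*']          # '*' starts the next sub-pattern
        else:
            current.append(step)
    if current and current != ['*']:
        pieces.append(current)
    return pieces

pattern = ['A_a', 'V_u', '*', 'V_u', 'A_a']
print(subdivide_at_star(pattern))    # [['A_a', 'V_u'], ['*', 'V_u', 'A_a']]
```

Subdivision Rule 2 would follow the same shape, splitting at interior A^inv/V^inv steps instead of '*'.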
B42 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B421 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... → M → N → ... If commuting M and N leads to the same observation result, i.e., ... → M → N → ... and ... → N → M → ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of two steps that are able to apply the Commute Rules, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted no matter which position the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the corresponding two-step pattern if these two can be commuted.
– Commute Rule 2 A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B422 Reduction Rules
If the memory sequence after applying Commute Ruleshave a sub-pattern that has two adjacent steps both relatedto known addresses or both related to unknown address(including repeating states) the two adjacent steps can bereduced to only one following the reduction rules (if thetwo-step pattern has ldquoyesrdquo for the Column ldquoUnion Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), regardless of whether the access is the attacker's (A) or the victim's (V), the table indicates whether Commute Rule 1 applies, whether Commute Rule 2 applies, and whether a Union Rule or Reduce Rule applies ("yes"/"no"), together with the resulting Combined Step where one exists (e.g., Union(a^inv, a_alias^inv), Union(a^inv, d^inv), Union(d^inv, a_alias^inv)).
Reduce Rule" and has no Union result in the "Combined Step" column in Table 7):
– Reduction Rule 1 For two u operations: although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 For two known adjacent memory access-related operations (known_access operations): they always result in a deterministic state of the second memory access-related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in what order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
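The three Reduction Rules on a pair of adjacent steps could be sketched as follows, assuming each step is encoded as a (kind, address) tuple with address 'u' standing for the unknown address; the encoding and function name are ours, not the paper's:

```python
# Sketch of the Reduction Rules on two adjacent steps.
# Each step is (kind, addr): kind in {'access', 'inv'}, addr 'u' if unknown.

def reduce_adjacent(s1, s2):
    """Return [s2] if the pair reduces to its second step, else [s1, s2]."""
    k1, a1 = s1
    k2, a2 = s2
    if a1 == 'u' and a2 == 'u':                       # Rule 1: two u operations
        return [s2]
    if k1 == k2 == 'access' and a1 != 'u' and a2 != 'u':
        return [s2]                                   # Rule 2: two known accesses
    if a1 == a2 != 'u' and {k1, k2} == {'access', 'inv'}:
        return [s2]                                   # Rule 3: access+inv, same known addr
    return [s1, s2]                                   # no reduction applies

print(reduce_adjacent(('access', 'a'), ('access', 'd')))  # [('access', 'd')]
print(reduce_adjacent(('access', 'a'), ('inv', 'a')))     # [('inv', 'a')]
print(reduce_adjacent(('access', 'a'), ('inv', 'd')))     # both steps kept
```

Note that the last pair (a known access and a known invalidation of different addresses) is exactly a case handled by the Commute Rules rather than the Reduction Rules.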
B423 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... → M → N → ... If combining M and N leads to the same timing observation result, i.e., ... → M → N → ... and ... → Union(M, N) → ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be continuously applied to union all adjacent invalidation steps that invalidate known, different memory addresses.
B424 Final Check Rules
Each long memory sequence will recursively apply these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to put u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion at each application of these three categories of rules should always be applied and reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

  – u operation → not_u operation → u operation
  – not_u operation → u operation → not_u operation

  There might be a possible extra ∗ or A^inv/V^inv before these three-step patterns, where:

  • An extra ∗ in the first step will not influence the result and can be directly removed.
  • For an extra A^inv/V^inv in the first step:

    – If followed by a known_access operation, A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
    – If followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
    – If followed by V_u, the worst case will be A^inv/V^inv → V_u → not_u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A^inv_d/V^inv_d → u operation, where V_u → A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

  In this case, the steps are finally within three steps, and the checking is done.

• There exist two adjacent steps to which none of the Rules above can be applied, which requires the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of Rules can be applied are the following:

– A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias followed by V_u
– A_a/V_a, A_aalias/V_aalias followed by V^inv_u
– V_u followed by A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias
– V^inv_u followed by A_a/V_a, A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β — the number of steps of the pattern; P — a two-dimensional dynamic-size array; P[0] contains the states of each step of the original pattern in order; P[1], P[2], ... are empty initially.
Output: R — reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: R = Ø
2: while P contains ∗ at an index other than 0 do
3:   P = Subdivision_Rule_1(P)
4: end while
5: while (P contains A^inv at an index other than 0) or (P contains V^inv at an index other than 0) do
6:   P = Subdivision_Rule_2(P)
7: end while
8: while not (P is ineffective or P has interval effective three steps) do
9:   P = Commute_Rules(P)
10:   P = Reduction_Rules(P)
11:   P = Union_Rules(P)
12:   if P is ineffective or P has interval effective three steps then
13:     R += the effective three-step pattern(s) found in P
14:   end if
15: end while
16: return R
two steps can either generate two adjacent step patterns that can be processed by the three categories of Rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B43 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B44 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 - 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia Dept of Computer Science Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fig. 1 The 17 possible states for a single cache block in our three-step model: (a) V_u; (b) A_a/V_a, A_aalias/V_aalias, A_d/V_d; (c) A^inv/V^inv; (d) A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias, A^inv_d/V^inv_d; (e) V^inv_u; (f) ∗
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each state. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show V_u in dashed lines. Meanwhile, Fig 1e shows the possible state V_u^inv, which is the result of the victim invalidating data at the sensitive address u, and thus possibly invalidating some address within the sensitive region. Further, Fig 1f shows the possible state *, which represents null knowledge of the address for the attacker for this cache block. Therefore it can possibly refer to any address in memory, or to no valid address at all.
Figure 1b shows the possible states A_a, V_a, A_aalias, V_aalias, A_d, and V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and aalias are within the sensitive set of addresses x; aalias, as its name indicates, is a different address than a, but still within set x, and it maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig 1d shows the possible states A_a^inv, V_a^inv, A_aalias^inv, V_aalias^inv, A_d^inv, and V_d^inv, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Fig 1c. These states indicate that no valid address is in the cache block. Therefore all the possible addresses that mapped to this cache block, e.g., a, aalias, d, and u (if it mapped to this block) before the invalidation step A^inv or V^inv, will be flushed back to memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 * 17 * 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones indicate a real attack. As shown in Fig 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as input to the reduction rules, which remove the redundant three-step patterns and produce the final list of vulnerabilities.
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for the different possible values of u. Since u is in the secure range x, the possible candidates of u for a cache block are a, aalias, and NIB (Not-In-Block). Here, NIB indicates the case where u does not have the same index as a or aalias, and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudo-code is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. Its outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all 4913 possible combinations of the three-step pattern. For each step
[Fig 2 flow: Exhaustive list of all possible three-step combinations (4913) → Cache Three-Step Simulator (Classification Step) → Preliminary Strong Vulnerability (132), Preliminary Weak Vulnerability (572), Ineffective Three-Step (4209) → Reduction Rules (Reduction Step) → Strong Vulnerability (72), Weak Vulnerability (64)]

Fig 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern, if it is V_u, this step will be extended to be one of three candidates: V_a, V_aalias, and V_NIB. If it is V_u^inv, this step will be extended to be one of three candidates: V_a^inv, V_aalias^inv, and V_NIB^inv. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into three categories, as listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address, or it has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the ⟨V_d, V_u, A_a⟩ vulnerability shown in Fig 3a, if u maps to a, the attacker will always derive fast timing. If u is aalias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or aalias). For these vulnerabilities, timing variation can still be observed due to the victim's different behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ⟨*, V_u, A_a^inv⟩ shown in Fig 3c, when fast timing is observed, u possibly maps to aalias or NIB (u mapping to NIB can also yield fast timing because, due to the * in Step 1, a may not be present in the cache, in which case A_a^inv completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type ⟨A_a, V_u, A_d⟩ shown in Fig 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the ineffective types

1: for step1 in the 17 states do
2:   for step2 in the 17 states do
3:     for step3 in the 17 states do
4:       steps ← (step1, step2, step3)
5:       candidates ← array to store all possible candidate combinations of this three-step pattern
6:       timings ← array to store the timing observation for each candidate combination of this three-step pattern
7:       if step1, step2, or step3 involves u then
8:         for each candidate combination of the u-related steps (V_u's candidates are V_a, V_aalias, and V_NIB; V_u^inv's candidates are V_a^inv, V_aalias^inv, and V_NIB^inv; each candidate number is 3) do
9:           candidates.append(combination)
10:        end for
11:        for each combination in candidates do
12:          timings.append(output_timing(combination))
13:        end for
14:        if judge_type(candidates, timings) = Strong then
15:          strong.append(steps)
16:        else
17:          if judge_type(candidates, timings) = Weak then
18:            weak.append(steps)
19:          else
20:            ineffective.append(steps)
21:          end if
22:        end if
23:      else
24:        ineffective.append(steps)
25:        continue
26:      end if
27:    end for
28:  end for
29: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to space limits, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We have also developed rules that further reduce the list of all the effective three-step patterns output by the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability and Weak Vulnerability outputs. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the following rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated; e.g., ⟨A_d, A_a, V_u⟩ can be reduced
[Fig 3 panels: a ⟨V_d, V_u, A_a⟩; b ⟨V_u, A_d, V_u^inv⟩; c ⟨*, V_u, A_a^inv⟩; d ⟨A_aalias^inv, V_u^inv, V_a⟩; e ⟨V_d, V_u^inv, V_d⟩; f ⟨A_a, V_u, A_d⟩. Each panel relates the victim's behavior (u ∈ {a, aalias, NIB}) to the attacker's observation (fast or slow)]

Fig 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
to ⟨*, A_a, V_u⟩, which is equivalent to ⟨A_a, V_u, *⟩. Therefore, ⟨A_d, A_a, V_u⟩ is a repeat type of ⟨A_a, V_u, *⟩ and can be eliminated.
2. Three-step patterns with a step involving a known address a, and ones with a step involving an alias to that address aalias, give the same information. Thus three-step combinations which only differ in the use of a or aalias cannot represent different attacks, and only one combination needs to be considered. For example, ⟨V_u, A_aalias, V_u⟩ is a repeat type of ⟨V_u, A_a, V_u⟩, and we eliminate the first pattern.
3. Three-step patterns with the steps V_u and V_u^inv in adjacent, consecutive positions keep only the latter step and eliminate the former. For example, ⟨A_a, V_u, V_u^inv⟩ can be reduced to ⟨A_a, *, V_u^inv⟩, which is further equivalent to ⟨A_a, V_u^inv, *⟩. So ⟨A_a, V_u, V_u^inv⟩ can be eliminated.
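The three rules above amount to a canonicalization pass: patterns sharing a canonical form are repeats, and only one is kept. A simplified sketch (with hypothetical string encodings for steps, not the authors' actual script):

```python
# Steps are encoded as strings such as "A_a", "A_aalias", "V_u", "V_u^inv".

def canonicalize(pattern):
    # Rule 2: a and aalias carry the same information.
    steps = [s.replace("aalias", "a") for s in pattern]
    out = []
    for s in steps:
        if out and (out[-1] == s or (out[-1].startswith("A") and s.startswith("A"))):
            out[-1] = s      # Rule 1: repeating / adjacent attacker-known steps
        elif out and out[-1] == "V_u" and s == "V_u^inv":
            out[-1] = s      # Rule 3: V_u followed by V_u^inv keeps the latter
        else:
            out.append(s)
    return tuple(out)

def reduce_patterns(patterns):
    """Keep one representative per canonical form."""
    seen, kept = set(), []
    for p in patterns:
        c = canonicalize(p)
        if c not in seen:
            seen.add(c)
            kept.append(p)
    return kept
```

Running `reduce_patterns` over the simulator's effective patterns would collapse, e.g., ⟨V_u, A_aalias, V_u⟩ and ⟨V_u, A_a, V_u⟩ into a single entry.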
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that it can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, each of which covers one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior V in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the addresses the victim accesses map to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.2 We call these miss-based vulnerabilities (M). The remaining ones leverage observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
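The macro-type labeling follows mechanically from the two criteria above; a sketch (step encodings and the observation argument are hypothetical illustrations, not part of the paper's tooling):

```python
# Macro type = (internal vs. external) x (miss-based vs. hit-based).
# Internal (I): Steps 2 and 3 involve only victim (V) behavior.
# Miss-based (M): slow due to a miss, or fast because the invalidated
#                 address was absent. Hit-based (H): the opposite cases.

def macro_type(step2, step3, observation, last_is_invalidation):
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    if last_is_invalidation:
        basis = "M" if observation == "fast" else "H"
    else:
        basis = "M" if observation == "slow" else "H"
    return interference + basis
```

For instance, ⟨V_d, V_u, V_a (fast)⟩ classifies as IH and ⟨A_a, V_u, A_a (slow)⟩ as EM, matching the labels in Table 2.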
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check whether the secure caches can defend against some or all of the vulnerabilities in our model.
2 Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type columns give the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision | A^inv | V_u | V_a (fast) | IH | (2)
 | V^inv | V_u | V_a (fast) | IH | (2)
 | A_d | V_u | V_a (fast) | IH | (2)
 | V_d | V_u | V_a (fast) | IH | (2)
 | A_aalias | V_u | V_a (fast) | IH | (2)
 | V_aalias | V_u | V_a (fast) | IH | (2)
 | A_a^inv | V_u | V_a (fast) | IH | (2)
 | V_a^inv | V_u | V_a (fast) | IH | (2)
Flush + Reload | A_a^inv | V_u | A_a (fast) | EH | (5)
 | V_a^inv | V_u | A_a (fast) | EH | (5)
 | A^inv | V_u | A_a (fast) | EH | (5)
 | V^inv | V_u | A_a (fast) | EH | (5)
 | A_d | V_u | A_a (fast) | EH | (5)
 | V_d | V_u | A_a (fast) | EH | (5)
 | A_aalias | V_u | A_a (fast) | EH | (5)
 | V_aalias | V_u | A_a (fast) | EH | (5)
Reload + Time | V_u^inv | A_a | V_u (fast) | EH | new
 | V_u^inv | V_a | V_u (fast) | IH | new
Flush + Probe | A_a | V_u^inv | A_a (slow) | EM | (6)
 | A_a | V_u^inv | V_a (slow) | IM | new
 | V_a | V_u^inv | A_a (slow) | EM | new
 | V_a | V_u^inv | V_a (slow) | IM | new
Evict + Time | V_u | A_d | V_u (slow) | EM | (1)
 | V_u | A_a | V_u (slow) | EM | (1)
Prime + Probe | A_d | V_u | A_d (slow) | EM | (4)
 | A_a | V_u | A_a (slow) | EM | (4)
Bernstein's Attack | V_u | V_a | V_u (slow) | IM | (3)
 | V_u | V_d | V_u (slow) | IM | (3)
 | V_d | V_u | V_d (slow) | IM | (3)
 | V_a | V_u | V_a (slow) | IM | (3)
Evict + Probe | V_d | V_u | A_d (slow) | EM | new
 | V_a | V_u | A_a (slow) | EM | new
Prime + Time | A_d | V_u | V_d (slow) | IM | new
 | A_a | V_u | V_a (slow) | IM | new
Flush + Time | V_u | A_a^inv | V_u (slow) | EM | new
 | V_u | V_a^inv | V_u (slow) | IM | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv | V_u | V_a^inv (slow) | IH | new
 | V^inv | V_u | V_a^inv (slow) | IH | new
 | A_d | V_u | V_a^inv (slow) | IH | new
 | V_d | V_u | V_a^inv (slow) | IH | new
 | A_aalias | V_u | V_a^inv (slow) | IH | new
 | V_aalias | V_u | V_a^inv (slow) | IH | new
Flush + Flush | A_a^inv | V_u | V_a^inv (slow) | IH | (1)
 | V_a^inv | V_u | V_a^inv (slow) | IH | (1)
 | A_a^inv | V_u | A_a^inv (slow) | EH | (1)
 | V_a^inv | V_u | A_a^inv (slow) | EH | (1)
Flush + Reload Invalidation | A^inv | V_u | A_a^inv (slow) | EH | new
 | V^inv | V_u | A_a^inv (slow) | EH | new
 | A_d | V_u | A_a^inv (slow) | EH | new
 | V_d | V_u | A_a^inv (slow) | EH | new
 | A_aalias | V_u | A_a^inv (slow) | EH | new
 | V_aalias | V_u | A_a^inv (slow) | EH | new
Reload + Time Invalidation | V_u^inv | A_a | V_u^inv (slow) | EH | new
 | V_u^inv | V_a | V_u^inv (slow) | IH | new
Flush + Probe Invalidation | A_a | V_u^inv | A_a^inv (fast) | EM | new
 | A_a | V_u^inv | V_a^inv (fast) | IM | new
 | V_a | V_u^inv | A_a^inv (fast) | EM | new
 | V_a | V_u^inv | V_a^inv (fast) | IM | new
Evict + Time Invalidation | V_u | A_d | V_u^inv (fast) | EM | new
 | V_u | A_a | V_u^inv (fast) | EM | new
Prime + Probe Invalidation | A_d | V_u | A_d^inv (fast) | EM | new
 | A_a | V_u | A_a^inv (fast) | EM | new
Bernstein's Invalidation Attack | V_u | V_a | V_u^inv (fast) | IM | new
 | V_u | V_d | V_u^inv (fast) | IM | new
 | V_d | V_u | V_d^inv (fast) | IM | new
 | V_a | V_u | V_a^inv (fast) | IM | new
Evict + Probe Invalidation | V_d | V_u | A_d^inv (fast) | EM | new
 | V_a | V_u | A_a^inv (fast) | EM | new
Prime + Time Invalidation | A_d | V_u | V_d^inv (fast) | IM | new
 | A_a | V_u | V_a^inv (fast) | IM | new
Flush + Time Invalidation | V_u | A_a^inv | V_u^inv (fast) | EM | new
 | V_u | V_a^inv | V_u^inv (fast) | IM | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
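A minimal behavioral model of this policy might look as follows (a sketch with invented class and method names; SP* is a hardware design, and this only mimics its hit/fill rules for one cache set):

```python
# Behavioral sketch of SP*-style static way partitioning: hits may come from
# either partition, but a miss may only fill/evict ways of one's own level.

class SPStarSet:
    def __init__(self, low_ways, high_ways):
        # One cache set: each way holds an address or None (empty).
        self.ways = {"Low": [None] * low_ways, "High": [None] * high_ways}

    def access(self, level, addr):
        """Return 'hit' or 'miss'; on a miss, fill only the own partition."""
        for part in ("Low", "High"):          # hits allowed in either partition
            if addr in self.ways[part]:
                return "hit"
        ways = self.ways[level]               # miss: modify own partition only
        victim = ways.index(None) if None in ways else 0
        ways[victim] = addr
        return "miss"
```

Under this model, an attacker in the Low partition cannot evict the victim's High-partition lines no matter how many addresses it touches, which is exactly what blocks Prime + Probe style Step 1/Step 3 interference.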
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
3 Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to the process ID), where CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache set that belongs to the same process, random data in the cache set is evicted. This random eviction also generates an interrupt to the OS to notify it of suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus the behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, because the performance impact is small given how rarely such speculation occurs. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before its visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) is put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before checking the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for handling the final state transition of the speculative execution. InvisiSpec targets Spectre-like attacks and futuristic attacks. However, InvisiSpec remains vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page, and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. And cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism implemented in software makes sure that the victim and attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only if both the tag and the domain ID match will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways belonging to different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
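The domain-masked hit and fill checks can be sketched as bitmask tests (an illustrative sketch only; DAWG's real metadata layout and replacement policy are more involved):

```python
# Sketch of DAWG-style way masking: each domain carries a policy_hitmap
# (ways it may hit in) and a policy_fillmap (ways it may fill/evict),
# both represented here as bitmasks over the ways of one set.

def lookup(ways, tag, policy_hitmap):
    """A hit requires a tag match in a way enabled by the domain's hitmap."""
    for i, line in enumerate(ways):
        if (policy_hitmap >> i) & 1 and line == tag:
            return i
    return None

def fill(ways, tag, policy_fillmap):
    """On a miss, the victim way is chosen only among fillmap-enabled ways."""
    candidates = [i for i in range(len(ways)) if (policy_fillmap >> i) & 1]
    victim = candidates[0]          # replacement policy elided in this sketch
    ways[victim] = tag
    return victim
```

With disjoint masks per domain (e.g., domain A owning ways 0-1 and domain B ways 2-3), neither domain can observe hits in, or cause evictions from, the other's ways.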
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R from the LLC causes the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cachebased on cache blocks It extends each cache block with aprocess ID and a lock status bit The process ID and thelock status bits are controlled by the extended load and storeinstructions (ld lockld unlock and st lockst unlock)which allow the programmer and compiler to set or reset thelock bit through use of the right load or store instructionIn terms of cache replacement policy for a cache hit PLcache will perform the normal cache hit handling procedureand the instructions with locking or unlocking capability canupdate the process ID and the lock status bits while the hit isprocessed When there is a cache miss locked data cannotbe evicted by data that is not locked and locked data amongdifferent processes cannot be evicted by each other In thiscase the new data will be either loaded or stored withoutcaching In other cases data eviction is possible This cachelisted extra software assumption as follows
Assumption 1 Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
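The miss-handling rule described for PL cache can be sketched as a small decision function (a simplified model; the encoding and names are ours, not the authors' RTL):

```python
from dataclasses import dataclass

@dataclass
class Line:
    pid: int       # owning process ID
    locked: bool   # lock status bit

def pl_miss_action(req_pid: int, req_locked: bool, victim: Line) -> str:
    """On a miss, decide whether the chosen eviction candidate may be
    replaced. Locked lines cannot be evicted by unlocked data, and
    locked lines of different processes cannot evict each other; in
    those cases the access is served without caching."""
    if victim.locked and (not req_locked or req_pid != victim.pid):
        return "serve_uncached"
    return "replace"

assert pl_miss_action(1, False, Line(pid=2, locked=True)) == "serve_uncached"
assert pl_miss_action(1, True,  Line(pid=1, locked=True)) == "replace"
assert pl_miss_action(1, False, Line(pid=2, locked=False)) == "replace"
```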
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, a cache hit requires both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have
J Hardw Syst Secur (2019) 3:397–425
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
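The RP cache miss-handling cases can be written as a small dispatcher (a simplified sketch of ours; the action names are hypothetical):

```python
import secrets

def rp_miss_action(d_pid: int, d_prot: bool, r_pid: int, r_prot: bool,
                   num_sets: int):
    """Simplified RP-cache miss handling for incoming data D when the
    replacement candidate R was chosen from set S. Returns the action
    and, where relevant, the randomly chosen set S'."""
    if d_pid == r_pid and d_prot != r_prot:
        # Same process, different protection bits: evict a line of a
        # random set S' and serve D without caching it.
        return ("evict_random_set_no_cache", secrets.randbelow(num_sets))
    if d_pid != r_pid:
        # Different processes: store D in an evicted block of a random
        # set S' and swap the mappings of S and S'.
        return ("store_in_random_set_and_swap", secrets.randbelow(num_sets))
    return ("normal_replacement", None)

act, s_prime = rp_miss_action(1, True, 1, False, 64)
assert act == "evict_random_set_no_cache" and 0 <= s_prime < 64
```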
Newcache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to RP cache's: a cache hit requires both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a no-fill request is executed, and the requested data access will be performed without caching. Meanwhile, on a random fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, normal requests will be used, following the normal replacement policy. The victim and attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique takes care of that).
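The three request types can be sketched as follows (our illustration; the window bounds are illustrative parameters, not values from the paper):

```python
import secrets

def handle_request(kind: str, addr: int, window=(4, 4)):
    """Sketch of Random Fill cache's request types. Returns the address
    (if any) that actually gets cached for a request to `addr`."""
    if kind == "nofill":
        return None                      # data served, nothing cached
    if kind == "random_fill":
        lo, hi = window                  # fill a random spatial neighbor
        return addr - lo + secrets.randbelow(lo + hi + 1)
    return addr                          # normal request: cache as usual

cached = handle_request("random_fill", 100)
assert 96 <= cached <= 104               # some neighbor, not necessarily 100
assert handle_request("nofill", 100) is None
assert handle_request("normal", 100) == 100
```

Because the line actually filled is drawn from a neighborhood rather than being the requested address, an observed hit or miss no longer pins down which security-critical address the victim touched.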
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
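The keyed address-to-set mapping can be sketched as below. Note the real design uses the two-cycle LLBC block cipher in hardware; a keyed hash here is only a software stand-in for illustration:

```python
import hashlib

def ceaser_set_index(addr: int, epoch_key: bytes, num_sets: int) -> int:
    """Keyed address-to-set mapping in the spirit of CEASER (toy model:
    a keyed hash stands in for the LLBC encryption)."""
    digest = hashlib.blake2b(addr.to_bytes(8, "little"),
                             key=epoch_key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_sets

# Re-keying (a new epoch_key) remaps every address to a fresh set, which
# is what periodically destroys any eviction sets the attacker learned.
s_old = ceaser_set_index(0x1234, b"epoch-0", 1024)
s_new = ceaser_set_index(0x1234, b"epoch-1", 1024)
assert 0 <= s_old < 1024 and 0 <= s_new < 1024
```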
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As the hardware extension, a cryptographic primitive such as hashing, and an index decoder for each scattered cache way, are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate security domain identification with user space.
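The per-way skewed index derivation can be sketched as follows (our simplification; a keyed hash stands in for the paper's keyed hash or permutation derivation function):

```python
import hashlib

def scatter_indices(addr: int, domain_key: bytes,
                    num_ways: int, num_sets: int):
    """Derive a different set index for every cache way, keyed per
    security domain, as in SCATTER cache's skewed mapping."""
    out = []
    for way in range(num_ways):
        msg = addr.to_bytes(8, "little") + way.to_bytes(1, "little")
        h = hashlib.blake2b(msg, key=domain_key, digest_size=8).digest()
        out.append(int.from_bytes(h, "little") % num_sets)
    return out

ix_a = scatter_indices(0xdeadbeef, b"domain-A", 8, 2048)
assert len(ix_a) == 8 and all(0 <= i < 2048 for i in ix_a)
# A different security domain sees a different mapping of the same
# address, so cross-domain eviction sets no longer transfer.
ix_b = scatter_indices(0xdeadbeef, b"domain-B", 8, 2048)
```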
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, or based on whether the data is secure or not. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line will be invalidated. Non Deterministic Cache randomly sets the local counters' initial values to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and it therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
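The per-line decay counter can be sketched as a tiny class (our model of the mechanism just described; the names are ours):

```python
import secrets

class DecayLine:
    """Per-line counter of Non Deterministic Cache (sketch): the line is
    invalidated once its counter reaches `limit`, and the counter starts
    at a random value so the invalidation time is unpredictable."""
    def __init__(self, limit: int):
        self.limit = limit
        self.valid = True
        self.count = secrets.randbelow(limit)  # randomized initial value

    def access(self) -> None:
        self.count = 0          # touching the line restarts the interval

    def tick(self) -> None:     # called on each global counter clock tick
        if self.valid:
            self.count += 1
            if self.count >= self.limit:
                self.valid = False

line = DecayLine(limit=10)
for _ in range(10):             # after `limit` untouched ticks the line
    line.tick()                 # has certainly been invalidated
assert not line.valid
```

Because the initial counter value is random, two runs of the same access sequence invalidate lines at different times, which is exactly what decouples hit/miss statistics from the addresses accessed.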
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP* cache can defend against the Vu → Ad → Vu (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd → Vu → Aa (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] will make the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and cannot retain the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad → Vu → A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad → Vu → V^inv_d (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded, locked security-critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
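These cases can be checked mechanically. A toy replay of the three steps on a direct-mapped cache (our illustration, far simpler than the paper's simulator) shows how the last-step timing encodes the secret on an unprotected cache:

```python
NUM_SETS = 8

def replay(steps):
    """Replay (op, addr) events on a toy direct-mapped cache and return
    the timing of the final access: 'fast' on a hit, 'slow' on a miss."""
    lines = {}                         # set index -> cached address
    timing = None
    for op, addr in steps:
        s = addr % NUM_SETS
        if op == "flush":
            lines.pop(s, None)
        elif op == "access":
            timing = "fast" if lines.get(s) == addr else "slow"
            lines[s] = addr
    return timing

# A flushes a; V accesses secret u; A reloads a: fast iff u == a,
# i.e., the Flush + Reload observation on an unprotected cache.
secret_u = 5
assert replay([("flush", 5), ("access", secret_u), ("access", 5)]) == "fast"
assert replay([("flush", 5), ("access", 6), ("access", 5)]) == "slow"
```

A secure cache prevents the vulnerability exactly when it breaks this mapping from the secret to the final timing, by one of the three mechanisms above.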
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, has a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again will have a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution and normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be implemented to be disabled for the victim's sensitive data in PL cache, where Vu → V^inv_a → Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu → Ad → Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to only access a limited set of cache blocks. For example, there is either static or dynamic partitioning of the cache, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of speculation, or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A^inv_a → Vu → Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd → Vu → A^inv_d (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the Vu → A^inv_a → Vu (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as Ad → Vu → Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation when involving the outside world, external interference vulnerabilities such as the Vd → Vu → Ad (slow) vulnerability (one type of Evict + Probe) will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during a speculative or out-of-order load or store. Since during speculation the cache state is not actually updated, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between data of different sensitivity, which leaves vulnerabilities such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A^inv_a → Vu → Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and it allows vulnerabilities such as the Vu → V^inv_a → Vu (slow) vulnerability (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va → V^inv_u → V^inv_a (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performanceresults of the secure caches as listed by the designersin the different papers At the extreme end there is
J Hardw Syst Secur (2019) 3397ndash425 415
Table6
Exi
stin
gse
cure
cach
esrsquo
impl
emen
tatio
nm
etho
dpe
rfor
man
cep
ower
and
area
sum
mar
y
J Hardw Syst Secur (2019) 3397ndash425416
the Non Deterministic cache: with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree; while their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards an Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and it is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad → Vu → Va (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu → Ad → Vu (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as via Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive effective vulnerability types listed, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
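As an illustration, the hit/miss logic behind these three steps can be sketched against a toy one-block cache model (all class and function names here are ours, and timing is abstracted to a hit/miss flag rather than measured cycles):

```python
# Toy model of a single cache block; addresses are plain strings.
class CacheBlock:
    def __init__(self):
        self.tag = None                 # address currently cached, or None

    def flush(self):                    # invalidate the block
        self.tag = None

    def access(self, addr):             # returns True on a cache hit
        hit = (self.tag == addr)
        self.tag = addr                 # the access installs the line
        return hit

def flush_reload(victim_secret_addr, attacker_known_addr):
    block = CacheBlock()
    block.flush()                             # Step 1: flush/evict (attacker or victim)
    block.access(victim_secret_addr)          # Step 2: victim's secret access
    return block.access(attacker_known_addr)  # Step 3: attacker reloads a known address

# A Step 3 hit reveals that the known address equals the secret address.
assert flush_reload("addr_A", "addr_A") is True
assert flush_reload("addr_A", "addr_B") is False
```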
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
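The set-granularity reasoning of Prime + Probe can be sketched the same way, on a toy 2-way LRU cache set (the names and the LRU policy are illustrative assumptions, not the paper's simulator):

```python
# Toy 2-way cache set with LRU replacement; most-recent line at the end.
class CacheSet:
    def __init__(self, ways=2):
        self.ways = ways
        self.lines = []

    def access(self, addr):
        hit = addr in self.lines
        if hit:
            self.lines.remove(addr)
        elif len(self.lines) == self.ways:   # set full: evict the LRU line
            self.lines.pop(0)
        self.lines.append(addr)
        return hit

def prime_probe(victim_maps_to_this_set):
    s = CacheSet(ways=2)
    for a in ("atk_0", "atk_1"):             # Step 1: attacker primes the set
        s.access(a)
    if victim_maps_to_this_set:              # Step 2: victim's secret access
        s.access("victim_secret")            #   evicts one attacker line
    # Step 3: attacker probes; any miss => the victim touched this set
    return not all(s.access(a) for a in ("atk_0", "atk_1"))

assert prime_probe(True) is True     # miss observed: secret maps to this set
assert prime_probe(False) is False   # all hits: set untouched by the victim
```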
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. An unknown memory-related operation (one containing u) is denoted as a u operation.
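A minimal encoding of this step vocabulary, used only for illustration (the tuple layout and helper names are our assumptions, not the paper's notation):

```python
# A step is (actor, address, kind): actor in {"A", "V"},
# address in {"a", "aalias", "d", "u"}, kind in {"access", "inv"}.
KNOWN_ADDRS = {"a", "aalias", "d"}   # addresses the attacker can reason about

def is_known_access(step):
    _, addr, kind = step
    return addr in KNOWN_ADDRS and kind == "access"

def is_known_inv(step):
    _, addr, kind = step
    return addr in KNOWN_ADDRS and kind == "inv"

def is_not_u(step):                  # known access + known inv operations
    return is_known_access(step) or is_known_inv(step)

def is_u_op(step):                   # any operation on the unknown address u
    return step[1] == "u"

assert is_not_u(("A", "a", "access"))
assert is_known_inv(("V", "d", "inv"))
assert is_u_op(("V", "u", "access")) and not is_not_u(("V", "u", "access"))
```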
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a ⋆ sub-pattern, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as Ainv/Vinv, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (Ainv/Vinv in the last step will be deleted).
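A minimal sketch of the two subdivision rules, representing a pattern as a list of step labels with "*" standing for the star state and "AinvVinv" for the invalidate-everything step (the function name and encoding are ours):

```python
def subdivide(pattern, splitter):
    """Split so that `splitter` can only appear as Step 1 of a
    sub-pattern; a trailing `splitter` is deleted, since it yields
    no timing information at the end of a sequence."""
    parts, current = [], []
    for step in pattern:
        if step == splitter and current:     # interior occurrence: cut here
            parts.append(current)
            current = [step]                 # splitter starts the next pattern
        else:
            current.append(step)
    if current and not (len(current) == 1 and current[0] == splitter):
        if current[-1] == splitter:          # delete a trailing splitter
            current = current[:-1]
        parts.append(current)
    return parts

# Subdivision Rule 1 (split at the star state), then Rule 2
# (split at Ainv/Vinv) applied to each resulting part:
pattern = ["Aa", "*", "Vu", "AinvVinv", "Aa"]
after_rule1 = subdivide(pattern, "*")
assert after_rule1 == [["Aa"], ["*", "Vu", "AinvVinv", "Aa"]]
after_rule2 = [p2 for p1 in after_rule1 for p2 in subdivide(p1, "AinvVinv")]
assert after_rule2 == [["Aa"], ["*", "Vu"], ["AinvVinv", "Aa"]]
```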
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each adjacent two steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or the unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting different pairs of steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known access operation and the other step is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the corresponding two-step pattern if the two steps can be commuted.
– Commute Rule 2 A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence. Such a two-step pattern has a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
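Commute Rule 1 can be sketched as a simple predicate over two steps encoded as (address, kind) pairs (an illustrative encoding; Commute Rule 2 would additionally need the steps' position in the sequence):

```python
# Commute Rule 1: a known-address access and a known-address
# invalidation commute when they target different addresses.
KNOWN = {"a", "aalias", "d"}

def commute_rule_1(m, n):
    kinds = {m[1], n[1]}
    return (m[0] in KNOWN and n[0] in KNOWN
            and kinds == {"access", "inv"}
            and m[0] != n[0])

assert commute_rule_1(("a", "access"), ("d", "inv"))      # commutes
assert not commute_rule_1(("a", "access"), ("a", "inv"))  # same address
assert not commute_rule_1(("u", "access"), ("d", "inv"))  # unknown address
```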
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column in Table 7):

Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps, regardless of the attacker's access (A) or the victim's access (V), the table lists whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule can be applied ("yes"/"no"), and the resulting combined step when a Reduce or Union is possible.
– Reduction Rule 1 For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two known adjacent memory access–related operations (known access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the address they refer to is the same, these two steps can be reduced to one step, namely the second step, no matter what order they are in.
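The three reduction rules can be sketched as a single pair-wise function over steps encoded as (address, kind) pairs (illustrative encoding and names; it returns the surviving step, or None when no Reduction Rule applies):

```python
KNOWN = {"a", "aalias", "d"}

def reduce_pair(m, n):
    # Rule 1: two operations on the unknown address u keep the second.
    if m[0] == "u" and n[0] == "u":
        return n
    # Rule 2: two known accesses leave a deterministic final state,
    # so only the second access matters.
    if m[0] in KNOWN and n[0] in KNOWN and m[1] == n[1] == "access":
        return n
    # Rule 3: a known access and a known invalidation of the SAME
    # address reduce to the second step, in either order.
    if m[0] in KNOWN and m[0] == n[0] and {m[1], n[1]} == {"access", "inv"}:
        return n
    return None

assert reduce_pair(("u", "access"), ("u", "inv")) == ("u", "inv")
assert reduce_pair(("a", "access"), ("d", "access")) == ("d", "access")
assert reduce_pair(("a", "inv"), ("a", "access")) == ("a", "access")
assert reduce_pair(("a", "access"), ("d", "inv")) is None  # commute, not reduce
```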
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Union Rule 1 can be applied to two invalidations of two different known memory addresses: two known inv operations both invalidate some known address, so they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate different known memory addresses.
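A sketch of Union Rule 1 over the same (address, kind) encoding, where a joint invalidation step is represented as a set of addresses (the encoding and name are ours):

```python
KNOWN = {"a", "aalias", "d"}

def union_rule_1(steps):
    """Merge runs of adjacent invalidations of distinct known
    addresses into a single joint invalidation step."""
    out = []
    for addr, kind in steps:
        if (out and kind == "inv" and out[-1][1] == "inv"
                and addr in KNOWN and addr not in out[-1][0]
                and out[-1][0] <= KNOWN):
            out[-1] = (out[-1][0] | {addr}, "inv")   # grow the joint step
        else:
            out.append(({addr}, kind))
    return out

merged = union_rule_1([("a", "inv"), ("d", "inv"), ("u", "access")])
assert merged == [({"a", "d"}, "inv"), ({"u"}, "access")]
```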
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categorizations of the rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations together and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categorizations of rules should reduce at least one step, until the resulting sequence matches one of the two following cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer steps):

– u operation ⇝ not u operation ⇝ u operation
– not u operation ⇝ u operation ⇝ not u operation
There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• An extra Ainv/Vinv in the first step:
– If followed by a known access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known inv operation or by Vu ⇝ Ainv/Vinv, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further be commuted using Commute Rule 2 and reduced to be within three steps.
In this case, the sequence is finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categorizations of the Rules can be applied are the following:

– Aa/Va, Aaalias/Vaalias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_aalias/Vinv_aalias ⇝ Vu
– Aa/Va or Aaalias/Vaalias ⇝ Vinv_u
– Vu ⇝ Aa/Va, Aaalias/Vaalias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_aalias/Vinv_aalias
– Vinv_u ⇝ Aa/Va or Aaalias/Vaalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
two steps can either generate two adjacent step patterns that can be processed by the three categorizations of Rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; patterns: a two-dimensional dynamic-size array; patterns[0] contains the states of each step of the original pattern in order; patterns[1], patterns[2], ... are empty initially.
Output: reduced effective vulnerability pattern(s) array vulnerabilities. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vulnerabilities = Ø
2: while patterns contain(⋆) and its index is not 0 do
3:   patterns = subdivision_rule_1(patterns)
4: end while
5: while (patterns contain(Ainv) and its index is not 0) or (patterns contain(Vinv) and its index is not 0) do
6:   patterns = subdivision_rule_2(patterns)
7: end while
8: while not (patterns is_ineffective or has_interval_effective_three_steps) do
9:   patterns = commute_rules(patterns)
10:  patterns = reduction_rules(patterns)
11:  patterns = union_rules(patterns)
12:  if patterns is_ineffective or has_interval_effective_three_steps then
13:    vulnerabilities += effective_patterns(patterns)
14:  end if
15: end while
16: return vulnerabilities
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE—a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush + Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games—bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) Cacti 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice—a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J Hardw Syst Secur (2019) 3:397-425
A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, and V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Figure 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses mapped to this cache block, e.g., a, a_alias, d, and u (if it maps to this block), before the invalidation step A^inv or V^inv, will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 x 17 x 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones indicate a real attack. As shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as input to the reduction rules, which remove the redundant three-step patterns and produce the final list of vulnerabilities.
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for each possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case where u does not have the same index as a or a_alias and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the three steps. The outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all 4913 possible combinations of the three-step pattern.

Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. The exhaustive list of all 4913 possible three-step combinations is input to the cache three-step simulator (classification step), which outputs 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 Ineffective three-step patterns; the reduction rules (reduction step) then yield the final 72 Strong and 64 Weak vulnerability types

For each step
of each pattern: if it is V_u, this step will be extended to one of three candidates, V_a, V_aalias, and V_NIB; if it is V^inv_u, this step will be extended to one of three candidates, V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into three categories, as listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address or has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to different victim behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ⋆ ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (u mapping to NIB can also derive fast timing because, depending on the ⋆ in Step 1, a may not be in the cache, so A^inv_a completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list states containing the 17 possible states for each of the three steps
Output: a list strong containing all the vulnerabilities that belong to the Strong type, a list weak containing all the vulnerabilities that belong to the Weak type, and a list ineffective containing all the ineffective types

1:  for step1 in states do
2:    for step2 in states do
3:      for step3 in states do
4:        steps <- (step1, step2, step3)
5:        candidates <- empty array to store all possible candidate combinations of this three-step pattern
6:        timings <- empty array to store the timing observation for each candidate combination of this three-step pattern
7:        if step1, step2, or step3 involves u then
8:          for each candidate combination c of steps do  // V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3
9:            candidates.append(c)
10:         end for
11:         for each c in candidates do
12:           timings.append(output_timing(c))
13:         end for
14:         if judge_type(candidates, timings) = Strong then
15:           strong.append(steps)
16:         else if judge_type(candidates, timings) = Weak then
17:           weak.append(steps)
18:         else
19:           ineffective.append(steps)
20:         end if
21:       else
22:         continue  // pattern does not involve u and cannot leak the victim's secret
23:       end if
24:     end for
25:   end for
26: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to space limits, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
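The classification performed by judge_type can be made concrete with a small sketch. This is a hypothetical formalization consistent with the category descriptions above, not the authors' script: each candidate of u is mapped to the set of timings the attacker may observe, and a pattern is Strong if some observation uniquely identifies u, Weak if observations vary with u but none is unambiguous, and Ineffective otherwise.

```python
# Sketch of the judge_type classification described above (hypothetical
# implementation, not the authors' code). `obs` maps each candidate of u
# ("a", "aalias", "NIB") to the set of timings the attacker may observe.
def judge_type(obs):
    timings = set().union(*obs.values())
    first = next(iter(obs.values()))
    # Ineffective: the observation carries no information about u.
    if all(obs[u] == first for u in obs):
        return "Ineffective"
    # Strong: some timing observation is consistent with exactly one u.
    for t in timings:
        if sum(1 for u in obs if t in obs[u]) == 1:
            return "Strong"
    # Weak: timing varies with u, but every observation is ambiguous.
    return "Weak"

# Fig. 3a-style pattern: fast uniquely identifies u = a.
print(judge_type({"a": {"fast"}, "aalias": {"slow"}, "NIB": {"slow"}}))          # Strong
# Fig. 3c-style pattern: both fast and slow are ambiguous.
print(judge_type({"a": {"slow"}, "aalias": {"fast"}, "NIB": {"fast", "slow"}}))  # Weak
# Fig. 3f-style pattern: always slow.
print(judge_type({"a": {"slow"}, "aalias": {"slow"}, "NIB": {"slow"}}))          # Ineffective
```

The full simulator additionally has to model the cache-block state transitions to compute each observation set; the sketch only captures the final classification step.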
3.3.2 Reduction Rules
We also developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the below rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated, e.g., A_d ⇝ A_a ⇝ V_u can be reduced
Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation (fast/slow, for u in {a, a_alias, NIB}) for each vulnerability type: (a, b) Strong Vulnerability, e.g., V_d ⇝ V_u ⇝ A_a and V_u ⇝ A_d ⇝ V^inv_u; (c, d) Weak Vulnerability, e.g., ⋆ ⇝ V_u ⇝ A^inv_a and A^inv_aalias ⇝ V^inv_u ⇝ V_a; (e, f) Ineffective Three-Step, e.g., V_d ⇝ V^inv_u ⇝ V_d and A_a ⇝ V_u ⇝ A_d
to A_a ⇝ V_u, which is equivalent to ⋆ ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat type of ⋆ ⇝ A_a ⇝ V_u and can be eliminated.

2. Three-step patterns with a step involving a known address a, and ones with a step involving an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a or a_alias cannot represent different attacks, and only one combination needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat type of V_u ⇝ A_a ⇝ V_u, and we eliminate the first pattern.

3. Three-step patterns with steps V_u and V^inv_u in adjacent consecutive steps will only keep the latter step and eliminate the first step. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which is further equivalent to ⋆ ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
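The three rules above can be sketched as simple predicates and a canonicalization over three-step patterns. This is a hypothetical Python sketch with our own step encoding (actor, address, invalidation flag), not the authors' reduction script:

```python
# Hedged sketch of the reduction rules (our own encoding, not the authors'
# script). A step is (actor, address, invalidate): actor is "A" (attacker)
# or "V" (victim); address in {"a", "aalias", "d", "u"}.
A = lambda addr, inv=False: ("A", addr, inv)
V = lambda addr, inv=False: ("V", addr, inv)

def rule1_redundant(p):
    # Rule 1: two adjacent steps that repeat, or that are both
    # attacker-known steps, make the pattern reducible to a shorter one.
    return any(s1 == s2 or (s1[0] == "A" and s2[0] == "A")
               for s1, s2 in zip(p, p[1:]))

def rule2_canonical(p):
    # Rule 2: a and aalias carry the same information; when only aalias
    # appears, rename it to a so equivalent patterns collide and dedupe.
    addrs = {s[1] for s in p}
    if "aalias" in addrs and "a" not in addrs:
        return tuple((act, "a" if ad == "aalias" else ad, inv)
                     for act, ad, inv in p)
    return p

def rule3_redundant(p):
    # Rule 3: V_u immediately followed by V_u^inv keeps only the latter.
    return any(s1 == V("u") and s2 == V("u", True)
               for s1, s2 in zip(p, p[1:]))

print(rule1_redundant((A("d"), A("a"), V("u"))))                    # True
print(rule2_canonical((V("u"), A("aalias"), V("u"))))
print(rule3_redundant((A("a"), V("u"), V("u", True))))              # True
```

A deduplication pass would drop every pattern for which rule1_redundant or rule3_redundant holds and keep one representative per rule2_canonical equivalence class.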
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that it can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as clarification.
The list of vulnerability types can be further collected into four simple macro types, each of which covers one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
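The macro-type assignment just described can be written directly as a small function. This is an illustrative sketch under the stated definitions; the step-name encoding is our own, not from the paper:

```python
# Sketch of the macro-type classification defined above (our own encoding).
# step2/step3 are step names starting with "V" (victim) or "A" (attacker);
# timing is the Step 3 observation; step3_inv marks an invalidation Step 3.
def macro_type(step2, step3, timing, step3_inv=False):
    # Internal (I) if Steps 2 and 3 involve only the victim's behavior.
    interference = "I" if step2[0] == "V" and step3[0] == "V" else "E"
    if step3_inv:
        # Fast invalidation means the data was not cached: miss-based.
        basis = "M" if timing == "fast" else "H"
    else:
        # A slow access means a cache miss: miss-based.
        basis = "M" if timing == "slow" else "H"
    return interference + basis

print(macro_type("Vu", "Ad", "slow"))             # Prime+Probe row: EM
print(macro_type("Vu", "Va", "fast"))             # Internal Collision row: IH
print(macro_type("Vu", "Ainv_a", "slow", True))   # Flush+Flush row: EH
```

The three printed cases match the Macro Type column of Tables 2 and 3 for the corresponding rows.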
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in the literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in the literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
² Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature.

Attack Strategy           Step 1     Step 2     Step 3           Macro Type  Attack
Cache Internal Collision  A^inv      V_u        V_a (fast)       IH          (2)
                          V^inv      V_u        V_a (fast)       IH          (2)
                          A_d        V_u        V_a (fast)       IH          (2)
                          V_d        V_u        V_a (fast)       IH          (2)
                          A_aalias   V_u        V_a (fast)       IH          (2)
                          V_aalias   V_u        V_a (fast)       IH          (2)
                          A^inv_a    V_u        V_a (fast)       IH          (2)
                          V^inv_a    V_u        V_a (fast)       IH          (2)
Flush + Reload            A^inv_a    V_u        A_a (fast)       EH          (5)
                          V^inv_a    V_u        A_a (fast)       EH          (5)
                          A^inv      V_u        A_a (fast)       EH          (5)
                          V^inv      V_u        A_a (fast)       EH          (5)
                          A_d        V_u        A_a (fast)       EH          (5)
                          V_d        V_u        A_a (fast)       EH          (5)
                          A_aalias   V_u        A_a (fast)       EH          (5)
                          V_aalias   V_u        A_a (fast)       EH          (5)
Reload + Time             V^inv_u    A_a        V_u (fast)       EH          new
                          V^inv_u    V_a        V_u (fast)       IH          new
Flush + Probe             A_a        V^inv_u    A_a (slow)       EM          (6)
                          A_a        V^inv_u    V_a (slow)       IM          new
                          V_a        V^inv_u    A_a (slow)       EM          new
                          V_a        V^inv_u    V_a (slow)       IM          new
Evict + Time              V_u        A_d        V_u (slow)       EM          (1)
                          V_u        A_a        V_u (slow)       EM          (1)
Prime + Probe             A_d        V_u        A_d (slow)       EM          (4)
                          A_a        V_u        A_a (slow)       EM          (4)
Bernstein's Attack        V_u        V_a        V_u (slow)       IM          (3)
                          V_u        V_d        V_u (slow)       IM          (3)
                          V_d        V_u        V_d (slow)       IM          (3)
                          V_a        V_u        V_a (slow)       IM          (3)
Evict + Probe             V_d        V_u        A_d (slow)       EM          new
                          V_a        V_u        A_a (slow)       EM          new
Prime + Time              A_d        V_u        V_d (slow)       IM          new
                          A_a        V_u        V_a (slow)       IM          new
Flush + Time              V_u        A^inv_a    V_u (slow)       EM          new
                          V_u        V^inv_a    V_u (slow)       IM          new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing a longer processing time.

Attack Strategy                        Step 1     Step 2     Step 3           Macro Type  Attack
Cache Internal Collision Invalidation  A^inv      V_u        V^inv_a (slow)   IH          new
                                       V^inv      V_u        V^inv_a (slow)   IH          new
                                       A_d        V_u        V^inv_a (slow)   IH          new
                                       V_d        V_u        V^inv_a (slow)   IH          new
                                       A_aalias   V_u        V^inv_a (slow)   IH          new
                                       V_aalias   V_u        V^inv_a (slow)   IH          new
Flush + Flush                          A^inv_a    V_u        V^inv_a (slow)   IH          (1)
                                       V^inv_a    V_u        V^inv_a (slow)   IH          (1)
                                       A^inv_a    V_u        A^inv_a (slow)   EH          (1)
                                       V^inv_a    V_u        A^inv_a (slow)   EH          (1)
Flush + Reload Invalidation            A^inv      V_u        A^inv_a (slow)   EH          new
                                       V^inv      V_u        A^inv_a (slow)   EH          new
                                       A_d        V_u        A^inv_a (slow)   EH          new
                                       V_d        V_u        A^inv_a (slow)   EH          new
                                       A_aalias   V_u        A^inv_a (slow)   EH          new
                                       V_aalias   V_u        A^inv_a (slow)   EH          new
Reload + Time Invalidation             V^inv_u    A_a        V^inv_u (slow)   EH          new
                                       V^inv_u    V_a        V^inv_u (slow)   IH          new
Flush + Probe Invalidation             A_a        V^inv_u    A^inv_a (fast)   EM          new
                                       A_a        V^inv_u    V^inv_a (fast)   IM          new
                                       V_a        V^inv_u    A^inv_a (fast)   EM          new
                                       V_a        V^inv_u    V^inv_a (fast)   IM          new
Evict + Time Invalidation              V_u        A_d        V^inv_u (fast)   EM          new
                                       V_u        A_a        V^inv_u (fast)   EM          new
Prime + Probe Invalidation             A_d        V_u        A^inv_d (fast)   EM          new
                                       A_a        V_u        A^inv_a (fast)   EM          new
Bernstein's Invalidation Attack        V_u        V_a        V^inv_u (fast)   IM          new
                                       V_u        V_d        V^inv_u (fast)   IM          new
                                       V_d        V_u        V^inv_d (fast)   IM          new
                                       V_a        V_u        V^inv_a (fast)   IM          new
Evict + Probe Invalidation             V_d        V_u        A^inv_d (fast)   EM          new
                                       V_a        V_u        A^inv_a (fast)   EM          new
Prime + Time Invalidation              A_d        V_u        V^inv_d (fast)   IM          new
                                       A_a        V_u        V^inv_a (fast)   IM          new
Flush + Time Invalidation              V_u        A^inv_a    V^inv_u (fast)   EM          new
                                       V_u        V^inv_a    V^inv_u (fast)   IM          new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
³ Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. It uses the percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit if they access the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. Cache hits are allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache set that is in the same process, a random block in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
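The three-tier victim selection just described can be sketched as follows. This is a hypothetical, simplified model (our own encoding; the OS interrupt is reduced to a boolean flag):

```python
import random

# Sketch of SHARP's victim-selection order (our own simplified model).
# cache_set lists the owner PID of each block; live_pids are the
# currently running processes.
def pick_victim(cache_set, requester_pid, live_pids):
    # 1) Prefer a block not belonging to any current process.
    dead = [i for i, pid in enumerate(cache_set) if pid not in live_pids]
    if dead:
        return dead[0], False
    # 2) Else a block belonging to the requesting process itself.
    own = [i for i, pid in enumerate(cache_set) if pid == requester_pid]
    if own:
        return own[0], False
    # 3) Else evict a random block and raise an alarm (OS interrupt).
    return random.randrange(len(cache_set)), True

idx, alarm = pick_victim([2, 3, 3], requester_pid=1, live_pids={1, 2, 3})
print(alarm)   # True: only other live processes' data is present
```

The alarm in case 3 is what lets the OS detect a process that repeatedly tries to evict other processes' lines, as in a Prime + Probe attack.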
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to a Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling of speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, because of the small performance impact given how rare such speculation is. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of a load's prior control flow instructions resolve, unsafe speculative loads (USLs) are put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In another case, the core receives possible invalidations from the OS before the check of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. And cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and attacker processes cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (which is similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only if both the tag and the domain ID are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen from the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
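The way masking described above can be sketched with bitmaps. This is a hypothetical model: the policy_hitmap/policy_fillmap names follow the text, but the encoding and helper functions are our own, not DAWG's RTL:

```python
# Sketch of DAWG's per-domain way masking (simplified model of the text).
# Each domain ID has a hitmap (ways it may hit in) and a fillmap (ways it
# may fill/evict in), both as bitmasks over the cache ways.
class Domain:
    def __init__(self, policy_hitmap, policy_fillmap):
        self.policy_hitmap = policy_hitmap
        self.policy_fillmap = policy_fillmap

def is_hit(way, tag, line_tag, line_domain, domain_id, dom):
    # A hit requires a tag match, a domain-ID match, and the way to be
    # visible in the domain's hitmap.
    return (tag == line_tag and line_domain == domain_id
            and (dom.policy_hitmap >> way) & 1 == 1)

def fill_candidates(dom, num_ways):
    # A miss may only evict ways inside the domain's own fillmap.
    return [w for w in range(num_ways) if (dom.policy_fillmap >> w) & 1]

victim_dom = Domain(policy_hitmap=0b0011, policy_fillmap=0b0011)
print(fill_candidates(victim_dom, 4))   # [0, 1]: ways 2 and 3 are off-limits
```

Because both the hit path and the fill path are masked, an attacker domain whose bitmaps are disjoint from the victim's can neither hit on nor evict the victim's lines.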
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cachebased on cache blocks It extends each cache block with aprocess ID and a lock status bit The process ID and thelock status bits are controlled by the extended load and storeinstructions (ld lockld unlock and st lockst unlock)which allow the programmer and compiler to set or reset thelock bit through use of the right load or store instructionIn terms of cache replacement policy for a cache hit PLcache will perform the normal cache hit handling procedureand the instructions with locking or unlocking capability canupdate the process ID and the lock status bits while the hit isprocessed When there is a cache miss locked data cannotbe evicted by data that is not locked and locked data amongdifferent processes cannot be evicted by each other In thiscase the new data will be either loaded or stored withoutcaching In other cases data eviction is possible This cachelisted extra software assumption as follows
Assumption 1 Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
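The miss-handling rules above can be sketched as follows (illustrative class and field names, not the authors' hardware):

```python
# Sketch of the PL cache miss-handling policy: locked data cannot be
# evicted by unlocked data, and locked data of different processes
# cannot evict each other; otherwise the access is served uncached.

class PLBlock:
    def __init__(self, pid=None, locked=False, addr=None):
        self.pid, self.locked, self.addr = pid, locked, addr

def pl_miss(cache_set, pid, addr, lock_request):
    """Return ('replace', victim_block) or ('uncached', None)."""
    for victim in cache_set:
        # A locked victim may only be replaced by a locking access
        # from the same process.
        if not victim.locked or (lock_request and victim.pid == pid):
            victim.pid, victim.locked, victim.addr = pid, lock_request, addr
            return ('replace', victim)
    # Every candidate is locked against this request: the new data is
    # loaded or stored without caching.
    return ('uncached', None)
```

Under this policy, an attacker's unlocked line can never displace the victim's preloaded, locked data, which is why Step 1 of eviction-based three-step patterns fails against PL cache.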
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, a cache hit needs both the process ID and the address to match. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
J Hardw Syst Secur (2019) 3:397–425 409
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
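The three miss-handling branches above can be sketched as follows (a minimal sketch with an illustrative signature; `perm` stands in for the per-process permutation table):

```python
import random

# Sketch of the RP cache miss decision: random-set eviction without
# caching for same-process/different-protection conflicts, and a
# permutation-table swap for cross-process conflicts.

def rp_miss(victim_pid, victim_prot, req_pid, req_prot, num_sets, set_s, perm):
    s_prime = random.randrange(num_sets)
    if req_pid == victim_pid and req_prot != victim_prot:
        # Same process, different protection bits: evict a line of a
        # random set S' and serve the request without caching it.
        return ('evict_random_set_no_cache', s_prime)
    if req_pid != victim_pid:
        # Different processes: store D in an evicted block of S' and
        # swap the permutation entries of S and S'.
        perm[set_s], perm[s_prime] = perm[s_prime], perm[set_s]
        return ('place_in_random_set_and_swap', s_prime)
    # Same process, same protection bit: normal replacement policy.
    return ('normal_replacement', set_s)
```

The swap is what de-correlates a conflict miss from the set index the attacker primed, since the victim's set mapping changes on every cross-process eviction.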
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit needs both the process ID and the address to match. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if the two belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on a Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security-critical data, a No-fill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the timing-based side-channel protection (the random filling technique takes care of that).
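The random fill handling above can be sketched as follows (the window bounds `ra`/`rb` are assumed parameters, not the paper's exact encoding):

```python
import random

# Sketch of a Random Fill request: the security-critical access is served
# without caching, and an arbitrary neighboring line from a window around
# it is brought into the cache instead.

def random_fill_request(addr, ra, rb):
    # The line actually cached is drawn uniformly from [addr - ra, addr + rb].
    filled = random.randint(addr - ra, addr + rb)
    # The requested line itself is NOT cached, so a later probe of `addr`
    # reveals nothing about whether the victim touched it.
    return {'served_uncached': addr, 'cached_line': filled}
```

Because the cached line is chosen independently of which line the victim actually requested, observing which line landed in the cache carries no information about the secret-dependent address.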
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
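A toy stand-in for the encrypted indexing can illustrate the effect of re-keying (the real design uses LLBC; a keyed hash is used here purely for illustration):

```python
import hashlib

# Toy stand-in for CEASER's address encryption: the set index is a keyed
# function of the line address, so swapping the epoch key remaps every
# address to a fresh set (dynamic remapping).

def ceaser_set_index(addr: int, epoch_key: bytes, num_sets: int = 1024) -> int:
    digest = hashlib.sha256(epoch_key + addr.to_bytes(8, 'little')).digest()
    return int.from_bytes(digest[:4], 'little') % num_sets
```

After a periodic re-key, the same address almost certainly maps to a different set, so any eviction sets the attacker has constructed go stale before the key can be reconstructed.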
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a similar way to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each of them. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and to assign different keys for the mapping. As hardware extensions, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate security domain identification with user space.
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs or based on whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore it de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
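The per-block decay counter above can be sketched as follows (illustrative structure, not the authors' implementation):

```python
import random

# Sketch of the Non Deterministic Cache decay counter: each block starts
# its counter at a random value below the global maximum; every global
# tick while the block is untouched increases it, and reaching the
# threshold invalidates the line.

class DecayingBlock:
    def __init__(self, max_interval):
        self.max_interval = max_interval
        self.touch()

    def touch(self):
        # An access restarts the countdown from a fresh random value,
        # de-correlating invalidation time from access time.
        self.counter = random.randrange(self.max_interval)
        self.valid = True

    def global_tick(self):
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_interval:
                self.valid = False  # invalidated after a randomized interval
```

Because each line dies after a randomized idle interval, the hit/miss pattern an attacker observes no longer reflects only which addresses were touched.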
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP∗ cache can defend against the V_u → A_d → V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast vs. slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d → V_u → A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. In the RP cache [49], however, the timing of the last step is always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized, so that the original correspondence between the victim's behavior and the attacker's observation is lost. For instance, the A_d → V_u → A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, the observed timing no longer determines whether u and d share the same index, since in Random Fill cache u would be accessed without caching and another, random piece of data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d → V_u → V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security-critical data does not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
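The three judging criteria can be expressed as a toy predicate (illustrative only; this is not the paper's simulator):

```python
# Toy formalization of the three judging criteria. A cache model maps a
# three-step pattern to the attacker-visible timing of the last step:
# 'fast', 'slow', 'random' (criterion 2), or None when the cache
# disallows a step (criterion 3).

def prevents(cache_model, patterns_over_secret):
    timings = set()
    for pattern in patterns_over_secret:
        t = cache_model(pattern)
        if t is None or t == 'random':
            continue  # step disallowed, or timing de-correlated
        timings.add(t)
    # Criterion 1: whatever remains must be constant; the attacker must
    # never see both 'fast' and 'slow' across secret values.
    return len(timings) <= 1

# A normal set-associative cache leaks under Flush + Reload: the reload
# is fast only when the victim's secret access hit the known address.
leaky = lambda p: 'fast' if p == ('flush', 'V_u', 'A_a==u') else 'slow'
assert not prevents(leaky, [('flush', 'V_u', 'A_a==u'),
                            ('flush', 'V_u', 'A_a!=u')])
```

A model that always answers 'slow' (case 1) or 'random' (case 2) for these patterns would make `prevents` return `True`.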
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, there are two results listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities whose last step is a memory access related operation. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).
a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c Flush is disabled, but cache coherence might be used to do the data removal.
d For L1 cache and TLB, flushing is done during context switch.
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which together make up the whole cache hierarchy; the L1 cache and TLB require software flush protection, while the last-level cache can be protected by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f The technique is currently only implemented in the last-level cache.
g The technique currently only targets shared caches.
h The technique only targets the inclusive last-level cache.
i The technique targets the data cache hierarchy.
j For the last-level cache, the cache is partitioned between the victim and the attacker.
k The technique can make the probability of the vulnerability succeeding extremely small.
l The technique works for shared read-only memory but not for shared writable memory.
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities whose last step is an invalidation related operation. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).
a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c Flush is disabled, but cache coherence might be used to do the data removal.
d For L1 cache and TLB, flushing is done during context switch.
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which together make up the whole cache hierarchy; the L1 cache and TLB require software flush protection, while the last-level cache can be protected by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f The technique is currently only implemented in the last-level cache.
g The technique currently only targets shared caches.
h The technique only targets the inclusive last-level cache.
i The technique targets the data cache hierarchy.
j For the last-level cache, the cache is partitioned between the victim and the attacker.
k The technique can make the probability of the vulnerability succeeding extremely small.
l The technique works for shared read-only memory but not for shared writable memory.
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns from CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and when implementing the corresponding security features on the cache.
Comparing PL cache with SP∗ cache: although both of them use partitioning, in PL cache flushing can be disabled for the victim's sensitive data, so V_u → V_a^inv → V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns from SP∗ cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of a speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference among the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size is adjusted for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 0) of the victim's partition through flushing or eviction, and they therefore bring uncertainty to the final timing observation made by the attacker.
SP∗ cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., V_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) is one vulnerability that SP∗ cache cannot prevent. SecVerilog cache is similar to SP∗ cache, but, for confidentiality, it prevents the attacker from directly getting a cache hit due to the victim's data, and it therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]). DAWG cache only allows a cache hit if both the address and the process ID match. Therefore, compared with a normal partitioning cache such as SP∗ cache, it is able to prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u → A_a^inv → V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data is placed into the unreserved ways, where the attacker can evict it. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system it so far supports preventing all of the vulnerabilities, but it only works for the LLC, at the cost of high software implementation complexity and with some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as V_d → V_u → A_d (slow) (one type of Evict + Probe) are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during a speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallows the attacker, or non-secure victim data, from setting the initial state (Step 0) of the victim partition, and therefore brings uncertainty to the final observation made by the attacker; e.g., A_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (the columns for SHARP cache, and from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory accesses and cache accesses of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into a cache block in Step 1, Random Fill cache will not randomly load the security-critical data, allowing vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary.
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to V_u → A_d → V_u (fast) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and it therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as via Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
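The three steps of Flush + Reload can be made concrete with a minimal simulation. The toy direct-mapped cache below is purely illustrative; ToyCache, HIT, MISS, and flush_reload are our own names, not part of the paper's simulator:

```python
# Toy direct-mapped cache illustrating the three steps of Flush + Reload.
# ToyCache, HIT, MISS, and flush_reload are illustrative names only.

HIT, MISS = "hit", "miss"

class ToyCache:
    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        self.lines = [None] * num_sets      # one block per set

    def _set(self, addr):
        return addr % self.num_sets

    def access(self, addr):
        """Access addr; return the timing observation (HIT or MISS)."""
        s = self._set(addr)
        if self.lines[s] == addr:
            return HIT
        self.lines[s] = addr                # fill the line on a miss
        return MISS

    def flush(self, addr):
        """clflush-style invalidation of addr, if present."""
        s = self._set(addr)
        if self.lines[s] == addr:
            self.lines[s] = None

def flush_reload(secret_addr, probe_addr):
    cache = ToyCache()
    cache.flush(probe_addr)                 # Step 1: invalidate known data
    cache.access(secret_addr)               # Step 2: victim accesses secret
    return cache.access(probe_addr)         # Step 3: attacker reloads

# Step 3 hits only when the secret address equals the probed address.
assert flush_reload(secret_addr=13, probe_addr=13) == HIT
assert flush_reload(secret_addr=13, probe_addr=5) == MISS
```

Note that even a probe address mapping to the same set as the secret still misses in Step 3: Flush + Reload distinguishes full addresses, not just set indices.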
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Evict + Time the attacker evicts a cache set to only learn the secret data's cache index, instead of invalidating some known address to find the full address of the secret data, as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
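As an illustration of the set-index granularity of Prime + Probe (in contrast to the address granularity of Flush + Reload), consider a toy one-way cache; the 4-set configuration and all names below are assumptions of the sketch, not the paper's simulator:

```python
# Prime + Probe on a toy 4-set, 1-way cache: the attacker learns the
# *set index* of the victim's secret address, not its full address.
# NUM_SETS and prime_probe are illustrative names only.

NUM_SETS = 4

def prime_probe(secret_addr):
    cache = [None] * NUM_SETS
    # Step 1: the attacker primes every set with his or her own lines.
    for a in range(NUM_SETS):
        cache[a % NUM_SETS] = ("attacker", a)
    # Step 2: the victim's access evicts the attacker's line in one set.
    cache[secret_addr % NUM_SETS] = ("victim", secret_addr)
    # Step 3: the attacker probes; a miss reveals the set the secret maps to.
    return [s for s in range(NUM_SETS) if cache[s][0] != "attacker"]

assert prime_probe(secret_addr=6) == [2]    # 6 mod 4 = 2
```

Any secret address congruent to 2 modulo 4 produces the same observation, which is exactly why Prime + Probe leaks only the cache index.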
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the probed cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1, it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3, it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and for β > 3, the representation can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
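The operation vocabulary can be sketched as simple predicates. The Op tuple and its field names below are our own illustrative encoding, not notation mandated by the model:

```python
# Illustrative predicates for the appendix's operation vocabulary.
# The Op tuple and its fields are our own encoding, not the paper's.

from collections import namedtuple

Op = namedtuple("Op", ["who", "addr", "inv"])   # who in {"A", "V"}

KNOWN_ADDRS = {"a", "a_alias", "d"}             # addresses known to the attacker

def is_known_access(op):
    """Access to a known memory address (known_access operation)."""
    return op.addr in KNOWN_ADDRS and not op.inv

def is_known_inv(op):
    """Invalidation of a known memory address (known_inv operation)."""
    return op.addr in KNOWN_ADDRS and op.inv

def is_not_u(op):
    """known_access and known_inv operations together form not_u."""
    return op.addr in KNOWN_ADDRS

def is_u_op(op):
    """Any memory-related operation involving the unknown address u."""
    return op.addr == "u"

assert is_known_access(Op("A", "a", False))
assert is_known_inv(Op("V", "d", True))
assert is_u_op(Op("V", "u", True)) and not is_not_u(Op("V", "u", True))
```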
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⤳ A_a ⤳ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with a ⋆ in it, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with an A_inv/V_inv in it, the longer pattern can be divided into two separate patterns, where the A_inv/V_inv is assigned as Step 1 of the second pattern. This is because A_inv/V_inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv_a ⤳ V_u ⤳ A_a (fast) shown in Table 2. A_inv/V_inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing together with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A_inv/V_inv in the middle or the last step (an A_inv/V_inv in the last step will be deleted).
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether they are the attacker's accesses (A) or the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps, M and N, in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the different pairs of two steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second of the two is not the last step in the whole memory sequence. Such patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
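The commuting condition is, in essence, an observational-equivalence check: run the sequence in both orders and compare the attacker's final timing observation. The brute-force sketch below illustrates this on a toy single-line cache; run, commutes, and the step encoding are our own illustrative names, not the paper's rules themselves:

```python
# Observational-equivalence check behind the Commute Rules, on a toy
# single-line cache. All names and the encoding are illustrative only.

def run(seq):
    """seq: list of ('access'|'inv', addr). Return the final timing."""
    line = None
    timing = None
    for kind, addr in seq:
        if kind == "access":
            timing = "hit" if line == addr else "miss"
            line = addr                      # fill on access
        else:                                # invalidation
            timing = "inv"
            if line == addr:
                line = None

    return timing

def commutes(m, n, suffix):
    """True if swapping m and n leaves the final observation unchanged."""
    return run([m, n] + suffix) == run([n, m] + suffix)

# Commute Rule 1 flavor: a known access and a known invalidation to
# *different* addresses can be swapped.
assert commutes(("access", "a"), ("inv", "d"), [("access", "a")])
# Same address: swapping changes the final observation, so no commuting.
assert not commutes(("access", "a"), ("inv", "a"), [("access", "a")])
```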
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules below (if the two-step pattern has a "yes" in the "Reduce Rule" column and no Union result in the "Union Rule or Combined Step" column in Table 7):

[Table 7, "Rules for combining two adjacent steps," is not reproduced here; its layout was lost in extraction. For each possible pair of adjacent steps, regardless of whether they are the attacker's (A) or the victim's (V) operations, the table lists whether Commute Rule 1, Commute Rule 2, and the Reduce Rule apply ("yes"/"no"), and the reduced or unioned step that results, if any.]
– Reduction Rule 1 For two u operations, although u is unknown, both of the accesses target the same u, so the pair can be reduced to only keep the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access–related operations (known_access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter their order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
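The three reduction rules can be sketched as a single pair-reduction function. The (kind, addr) step encoding, and the choice to keep the second step of each reducible pair, are illustrative assumptions of the sketch:

```python
# Sketch of the three Reduction Rules on adjacent step pairs.
# A step is (kind, addr), kind in {"access", "inv"}; addr "u" is the
# unknown address, anything else is a known address. Illustrative only.

def reduce_adjacent(s1, s2):
    """Return the single step the pair reduces to, or None if no rule applies."""
    k1, a1 = s1
    k2, a2 = s2
    # Rule 1: two u accesses target the same u -> keep the second.
    if k1 == k2 == "access" and a1 == a2 == "u":
        return s2
    # Rule 2: two known accesses leave a deterministic state -> one step.
    if k1 == k2 == "access" and "u" not in (a1, a2):
        return s2
    # Rule 3: access + invalidation of the same known address -> second step.
    if {k1, k2} == {"access", "inv"} and a1 == a2 and a1 != "u":
        return s2
    return None

assert reduce_adjacent(("access", "u"), ("access", "u")) == ("access", "u")
assert reduce_adjacent(("access", "a"), ("access", "d")) == ("access", "d")
assert reduce_adjacent(("inv", "a"), ("access", "a")) == ("access", "a")
assert reduce_adjacent(("access", "a"), ("inv", "d")) is None   # commute case
```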
B.4.2.3 Union Rules
Suppose there are two adjacent steps, M and N, in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two known, different memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
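Union Rule 1 can be sketched by representing a step as a set of invalidated addresses, so that a unioned step remains a first-class step; this representation is our own choice for illustration:

```python
# Sketch of Union Rule 1: adjacent invalidations of different known
# addresses collapse into one joint invalidation step. A step here is
# (kind, frozenset_of_addresses); the representation is illustrative.

def union_invalidations(steps):
    """Merge runs of adjacent known-address invalidation steps."""
    out = []
    for kind, addrs in steps:
        if (out and kind == "inv" and out[-1][0] == "inv"
                and "u" not in addrs | out[-1][1]):      # known addresses only
            out[-1] = ("inv", out[-1][1] | addrs)        # union with previous
        else:
            out.append((kind, addrs))
    return out

seq = [("inv", frozenset({"a"})), ("inv", frozenset({"d"})),
       ("access", frozenset({"a"}))]
assert union_invalidations(seq) == [("inv", frozenset({"a", "d"})),
                                    ("access", frozenset({"a"}))]
```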
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion at each application of these three categories of rules should always be applied and reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps, in one of the following patterns or fewer:

– u operation ⤳ not_u operation ⤳ u operation
– not_u operation ⤳ u operation ⤳ not_u operation
There might be a possible extra ⋆ or A_inv/V_inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A_inv/V_inv in the first step:

– If followed by a known_access operation, the A_inv/V_inv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or V^inv_u, the A_inv/V_inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A_inv/V_inv ⤳ V_u ⤳ not_u operation ⤳ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A_inv/V_inv ⤳ V_u ⤳ A^inv_d/V^inv_d ⤳ u operation, where V_u ⤳ A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias} ⤳ V_u
– {A_a, V_a, A_aalias, V_aalias} ⤳ V^inv_u
– V_u ⤳ {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}
– V^inv_u ⤳ {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
J Hardw Syst Secur (2019) 3397ndash425422
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially
Output: reduced effective vulnerability pattern(s) array vul_list; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vul_list = ∅
2: while pattern contains ⋆ and the ⋆'s index is not 0 do
3:     pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (pattern contains A_inv and its index is not 0) or (pattern contains V_inv and its index is not 0) do
6:     pattern = Subdivision_Rule_2(pattern)
7: end while
8: while (pattern is ineffective or has interval effective three steps) do
9:     pattern = Commute_Rules(pattern)
10:    pattern = Reduction_Rules(pattern)
11:    pattern = Union_Rules(pattern)
12:    if (pattern is ineffective or has interval effective three steps) then
13:        vul_list += effective_patterns(pattern)
14:    end if
15: end while
16: return vul_list
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
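The overall shape of the reduction loop can be sketched as a fixed-point driver. Here, toy_pass is a stand-in for the commute/reduction/union passes (it implements only one toy rule, collapsing adjacent duplicate steps), so the sketch shows the driver structure, not the paper's actual rules:

```python
# Fixed-point driver in the spirit of Algorithm 2: apply the rule passes
# until the pattern is within three steps or stops shrinking.
# toy_pass and reduce_pattern are illustrative stand-ins only.

def toy_pass(pattern):
    """Toy rule: adjacent duplicate steps reduce to a single step."""
    out = []
    for step in pattern:
        if not out or out[-1] != step:
            out.append(step)        # keep the step; drop exact repeats
    return out

def reduce_pattern(pattern, passes=(toy_pass,)):
    while len(pattern) > 3:
        before = len(pattern)
        for rule_pass in passes:
            pattern = rule_pass(pattern)
        if len(pattern) == before:  # no rule fired: stop to avoid looping
            break
    return pattern

assert reduce_pattern(["Vu", "Vu", "Aa", "Aa", "Ainv"]) == ["Vu", "Aa", "Ainv"]
```

In the real algorithm, a pattern that stops shrinking above three steps falls into the rest-checking cases enumerated in Section B.4.2.4.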
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia Conference on Computer and Communications Security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX Workshop on Health Information Technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX Security Symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX Security Symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush + Flush: a fast and stealthy cache attack. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX Security Symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE Symposium on Security and Privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE Symposium on Security and Privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE Symposium on Security and Privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European Solid State Circuits Conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX Security Symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' Track at the RSA Conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European Symposium on Research in Computer Security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th Annual International Symposium on Computer Architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX Security Symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX Security Symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX Security Symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th Annual Computer Security Applications Conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J Hardw Syst Secur (2019) 3:397–425
Pseudo-code for the cache three-step simulator algorithm

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type;
        a list containing all the vulnerabilities that belong to the Weak type;
        a list containing all the ineffective types

1:  for each step1 in the list of states do
2:    for each step2 in the list of states do
3:      for each step3 in the list of states do
4:        steps ← (step1, step2, step3)
5:        candidates ← array to store all possible candidate combinations of this three-step pattern
6:        observations ← array to store all possible timing observations regarding the different candidate combinations of this three-step pattern
7:        if step1, step2, or step3 involves the unknown address u then
8:          for each candidate combination of steps (u's candidates are a, aalias, and NIB; each u has 3 candidates) do
9:            append the combination to candidates
10:         end for
11:         for each combination in candidates do
12:           append the derived timing observation to observations
13:         end for
14:         if the observations uniquely distinguish the victim's access to a then
15:           strong.append(steps)
16:         else
17:           if the observations partially distinguish the candidates then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)
25:         continue
26:       end if
27:     end for
28:   end for
29: end for

Algorithm 1
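The enumeration loop of Algorithm 1 can be sketched in Python as follows. The `observe()` oracle stands in for the simulator's cache-behavior model, and the strong/weak test is a simplified version of its distinguishability check, so this is an illustrative sketch rather than the actual simulator:

```python
from itertools import product

# Candidate resolutions of the unknown address u, as in the paper:
# the secret address a, an alias of it, or a "not-in-block" address (NIB).
CANDIDATES = ("a", "a_alias", "NIB")

def classify(pattern, observe):
    """Classify a three-step pattern as 'strong', 'weak' or 'ineffective'."""
    if not any("u" in step for step in pattern):
        return "ineffective"            # no unknown address: nothing to infer
    # timing observation for each candidate substitution of u
    obs = {c: observe(tuple(s.replace("u", c) for s in pattern))
           for c in CANDIDATES}
    others = [obs[c] for c in CANDIDATES if c != "a"]
    if all(obs["a"] != o for o in others):
        return "strong"                 # timing uniquely reveals access to a
    if any(obs["a"] != o for o in others):
        return "weak"                   # timing only partially distinguishes
    return "ineffective"

def simulate(states, observe):
    """Enumerate all three-step patterns and bucket them by type."""
    buckets = {"strong": [], "weak": [], "ineffective": []}
    for pattern in product(states, repeat=3):
        buckets[classify(pattern, observe)].append(pattern)
    return buckets
```

A caller supplies `observe()`, which derives the fast/slow timing of the final step for one concrete candidate combination; the triple loop then corresponds to lines 1–3 of Algorithm 1.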
After computing the type of all the three-step patterns, the cache three-step simulator will output the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We also have developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the reduction rules below to the output of the simulator to get the final list of vulnerabilities. A three-step combination will be eliminated if it satisfies one of the following rules:
1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated. E.g., Ad ⇝ Aa ⇝ Vu can be reduced to ⋆ ⇝ Aa ⇝ Vu, which is equivalent to Aa ⇝ Vu. Therefore, Ad ⇝ Aa ⇝ Vu is a repeat type of Aa ⇝ Vu and can be eliminated.

Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation (fast or slow, for u ∈ {a, aalias, NIB}) for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
2. Three-step patterns with a step involving a known address a, and with a step involving an alias to that address, aalias, give the same information. Thus, three-step combinations which only differ in the use of a or aalias cannot represent different attacks, and only one combination needs to be considered. For example, Vu ⇝ Aaalias ⇝ Vu is a repeat type of Vu ⇝ Aa ⇝ Vu, and we eliminate the first pattern.
3. Three-step patterns with steps Vu and Vu^inv in adjacent consecutive steps will only keep the latter step and eliminate the first step. For example, Aa ⇝ Vu ⇝ Vu^inv can be reduced to Aa ⇝ ⋆ ⇝ Vu^inv, which is further equivalent to Aa ⇝ Vu^inv. So Aa ⇝ Vu ⇝ Vu^inv can be eliminated.
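The reduction pass can be sketched as a small rewriting function over step strings; the set of attacker-known steps and the string encoding are simplifying assumptions, not the paper's actual script:

```python
# Steps issued by (and therefore known to) the attacker, in a simplified
# string encoding; "^inv" marks invalidation steps.
KNOWN = {"Aa", "Ad", "A^inv", "Aa^inv", "Ad^inv"}

def reduce_pattern(steps):
    """Apply reduction rules 1-3 and return the canonical (reduced) pattern."""
    steps = [s.replace("aalias", "a") for s in steps]        # rule 2: a ~ aalias
    out = []
    for s in steps:
        if out and (s == out[-1] or (s in KNOWN and out[-1] in KNOWN)):
            out[-1] = s                                      # rule 1: merge adjacent known/repeating steps
        elif out and out[-1] == "Vu" and s == "Vu^inv":
            out[-1] = s                                      # rule 3: keep only the latter of Vu, Vu^inv
        else:
            out.append(s)
    return tuple(out)
```

Two three-step combinations whose reduced forms coincide are repeats of each other, and only one representative is kept in the final list.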
3.3.3 Categorization of Strong Vulnerabilities
As is shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 lists all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which each cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior V in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
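The macro-type labels in Tables 2 and 3 follow mechanically from these two binary distinctions; a small classifier sketch (the string encoding of steps and timings is an assumed simplification):

```python
def macro_type(step2, step3, timing):
    """Derive the macro type (IM/IH/EM/EH) of a vulnerability.

    step2, step3: strings such as "Vu", "Aa", "Va^inv"; timing: "fast"/"slow".
    """
    # internal (I) iff Steps 2 and 3 involve only the victim's behavior V
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    # hit-based (H): a fast access (hit), or a slow invalidation of valid data;
    # miss-based (M): a slow access (miss), or a fast invalidation of absent data
    is_inv = "inv" in step3
    hit_based = (timing == "fast") != is_inv
    return interference + ("H" if hit_based else "M")
```

Applying this to any row of Tables 2 and 3 reproduces the Macro Type column entry for that row.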
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in the literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in the literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
² Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column gives the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in the literature.
Attack Strategy          | Vulnerability Type (Step 1 ⇝ Step 2 ⇝ Step 3) | Macro Type | Attack
-------------------------+-----------------------------------------------+------------+-------
Cache Internal Collision | A^inv ⇝ Vu ⇝ Va (fast)                        | IH         | (2)
                         | V^inv ⇝ Vu ⇝ Va (fast)                        | IH         | (2)
                         | Ad ⇝ Vu ⇝ Va (fast)                          | IH         | (2)
                         | Vd ⇝ Vu ⇝ Va (fast)                          | IH         | (2)
                         | Aaalias ⇝ Vu ⇝ Va (fast)                     | IH         | (2)
                         | Vaalias ⇝ Vu ⇝ Va (fast)                     | IH         | (2)
                         | Aa^inv ⇝ Vu ⇝ Va (fast)                      | IH         | (2)
                         | Va^inv ⇝ Vu ⇝ Va (fast)                      | IH         | (2)
Flush + Reload           | Aa^inv ⇝ Vu ⇝ Aa (fast)                      | EH         | (5)
                         | Va^inv ⇝ Vu ⇝ Aa (fast)                      | EH         | (5)
                         | A^inv ⇝ Vu ⇝ Aa (fast)                       | EH         | (5)
                         | V^inv ⇝ Vu ⇝ Aa (fast)                       | EH         | (5)
                         | Ad ⇝ Vu ⇝ Aa (fast)                          | EH         | (5)
                         | Vd ⇝ Vu ⇝ Aa (fast)                          | EH         | (5)
                         | Aaalias ⇝ Vu ⇝ Aa (fast)                     | EH         | (5)
                         | Vaalias ⇝ Vu ⇝ Aa (fast)                     | EH         | (5)
Reload + Time            | Vu^inv ⇝ Aa ⇝ Vu (fast)                      | EH         | new
                         | Vu^inv ⇝ Va ⇝ Vu (fast)                      | IH         | new
Flush + Probe            | Aa ⇝ Vu^inv ⇝ Aa (slow)                      | EM         | (6)
                         | Aa ⇝ Vu^inv ⇝ Va (slow)                      | IM         | new
                         | Va ⇝ Vu^inv ⇝ Aa (slow)                      | EM         | new
                         | Va ⇝ Vu^inv ⇝ Va (slow)                      | IM         | new
Evict + Time             | Vu ⇝ Ad ⇝ Vu (slow)                          | EM         | (1)
                         | Vu ⇝ Aa ⇝ Vu (slow)                          | EM         | (1)
Prime + Probe            | Ad ⇝ Vu ⇝ Ad (slow)                          | EM         | (4)
                         | Aa ⇝ Vu ⇝ Aa (slow)                          | EM         | (4)
Bernstein's Attack       | Vu ⇝ Va ⇝ Vu (slow)                          | IM         | (3)
                         | Vu ⇝ Vd ⇝ Vu (slow)                          | IM         | (3)
                         | Vd ⇝ Vu ⇝ Vd (slow)                          | IM         | (3)
                         | Va ⇝ Vu ⇝ Va (slow)                          | IM         | (3)
Evict + Probe            | Vd ⇝ Vu ⇝ Ad (slow)                          | EM         | new
                         | Va ⇝ Vu ⇝ Aa (slow)                          | EM         | new
Prime + Time             | Ad ⇝ Vu ⇝ Vd (slow)                          | IM         | new
                         | Aa ⇝ Vu ⇝ Va (slow)                          | IM         | new
Flush + Time             | Vu ⇝ Aa^inv ⇝ Vu (slow)                      | EM         | new
                         | Vu ⇝ Va^inv ⇝ Vu (slow)                      | IM         | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time.
Attack Strategy                        | Vulnerability Type (Step 1 ⇝ Step 2 ⇝ Step 3) | Macro Type | Attack
---------------------------------------+-----------------------------------------------+------------+-------
Cache Internal Collision Invalidation  | A^inv ⇝ Vu ⇝ Va^inv (slow)                    | IH         | new
                                       | V^inv ⇝ Vu ⇝ Va^inv (slow)                    | IH         | new
                                       | Ad ⇝ Vu ⇝ Va^inv (slow)                      | IH         | new
                                       | Vd ⇝ Vu ⇝ Va^inv (slow)                      | IH         | new
                                       | Aaalias ⇝ Vu ⇝ Va^inv (slow)                 | IH         | new
                                       | Vaalias ⇝ Vu ⇝ Va^inv (slow)                 | IH         | new
Flush + Flush                          | Aa^inv ⇝ Vu ⇝ Va^inv (slow)                  | IH         | (1)
                                       | Va^inv ⇝ Vu ⇝ Va^inv (slow)                  | IH         | (1)
                                       | Aa^inv ⇝ Vu ⇝ Aa^inv (slow)                  | EH         | (1)
                                       | Va^inv ⇝ Vu ⇝ Aa^inv (slow)                  | EH         | (1)
Flush + Reload Invalidation            | A^inv ⇝ Vu ⇝ Aa^inv (slow)                   | EH         | new
                                       | V^inv ⇝ Vu ⇝ Aa^inv (slow)                   | EH         | new
                                       | Ad ⇝ Vu ⇝ Aa^inv (slow)                      | EH         | new
                                       | Vd ⇝ Vu ⇝ Aa^inv (slow)                      | EH         | new
                                       | Aaalias ⇝ Vu ⇝ Aa^inv (slow)                 | EH         | new
                                       | Vaalias ⇝ Vu ⇝ Aa^inv (slow)                 | EH         | new
Reload + Time Invalidation             | Vu^inv ⇝ Aa ⇝ Vu^inv (slow)                  | EH         | new
                                       | Vu^inv ⇝ Va ⇝ Vu^inv (slow)                  | IH         | new
Flush + Probe Invalidation             | Aa ⇝ Vu^inv ⇝ Aa^inv (fast)                  | EM         | new
                                       | Aa ⇝ Vu^inv ⇝ Va^inv (fast)                  | IM         | new
                                       | Va ⇝ Vu^inv ⇝ Aa^inv (fast)                  | EM         | new
                                       | Va ⇝ Vu^inv ⇝ Va^inv (fast)                  | IM         | new
Evict + Time Invalidation              | Vu ⇝ Ad ⇝ Vu^inv (fast)                      | EM         | new
                                       | Vu ⇝ Aa ⇝ Vu^inv (fast)                      | EM         | new
Prime + Probe Invalidation             | Ad ⇝ Vu ⇝ Ad^inv (fast)                      | EM         | new
                                       | Aa ⇝ Vu ⇝ Aa^inv (fast)                      | EM         | new
Bernstein's Invalidation Attack        | Vu ⇝ Va ⇝ Vu^inv (fast)                      | IM         | new
                                       | Vu ⇝ Vd ⇝ Vu^inv (fast)                      | IM         | new
                                       | Vd ⇝ Vu ⇝ Vd^inv (fast)                      | IM         | new
                                       | Va ⇝ Vu ⇝ Va^inv (fast)                      | IM         | new
Evict + Probe Invalidation             | Vd ⇝ Vu ⇝ Ad^inv (fast)                      | EM         | new
                                       | Va ⇝ Vu ⇝ Aa^inv (fast)                      | EM         | new
Prime + Time Invalidation              | Ad ⇝ Vu ⇝ Vd^inv (fast)                      | IM         | new
                                       | Aa ⇝ Vu ⇝ Va^inv (fast)                      | IM         | new
Flush + Time Invalidation              | Vu ⇝ Aa^inv ⇝ Vu^inv (fast)                  | EM         | new
                                       | Vu ⇝ Va^inv ⇝ Vu^inv (fast)                  | IM         | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim or the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, the L2 cache, and the Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions, for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security, and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition, if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions, if the data is already in the cache.

³ Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. They use the percentage of cache misses for L instructions that was reduced (increased) when L's partition size was increased (reduced) by one cache way to adjust the number of ways of the cache assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals the traditional set associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated per cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit if accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to the process ID), where CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, random data in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, they implement security features for the L1 cache, the TLB, and the LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and the TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and the TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, because of the small performance influence given how rare such speculation is. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the checking of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (the non-secure parts may map to 3 CoS, while the secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. And cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one is called policy_fillmap, for masking fills and selecting the victim to replace; the other is called policy_hitmap, for masking hit ways. Only if both the tag and the domain id are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim can only be chosen within the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
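The hit and fill masking described above can be sketched as follows; the Line structure and the function signatures are illustrative assumptions, not DAWG's actual RTL:

```python
from dataclasses import dataclass

# Sketch of DAWG-style way masking. The hitmap/fillmap bitmasks mirror the
# policy_hitmap and policy_fillmap registers described in the text.
@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    dom: int = 0  # domain id of the line's owner

def lookup(ways, tag, dom_id, hitmap):
    """A hit requires the tag AND the domain id to match, within hitmap ways."""
    for i, w in enumerate(ways):
        if (hitmap >> i) & 1 and w.valid and w.tag == tag and w.dom == dom_id:
            return i
    return None  # miss: the line may exist, but outside this domain's ways

def victim_way(fillmap, lru_order):
    """Fills may only evict within the domain's own fillmap ways."""
    return next(i for i in lru_order if (fillmap >> i) & 1)
```

Because hits are also masked by domain, the same read-only line can legitimately exist in two different domains' ways without creating a cross-domain channel.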
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R in the LLC will cause the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data will have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases that the RIC bits are modified, or for thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In this case, the new data will be either loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S: if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S′ will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
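The miss-handling decision just described can be sketched as pseudologic; the Blk metadata layout (pid, prot) and the returned action encoding are illustrative assumptions, not the actual hardware interface:

```python
import random
from collections import namedtuple

# Per-block metadata of RP cache: process ID and protection bit P.
Blk = namedtuple("Blk", "pid prot")

def on_miss(evicted, incoming, num_sets):
    """Decide how RP cache handles a miss that would evict `evicted`."""
    if evicted.pid == incoming.pid and evicted.prot != incoming.prot:
        # same process, different protection bits: evict arbitrary data of a
        # random set S' and access the new data D without caching it
        return ("evict_random_line", random.randrange(num_sets), "no_caching")
    if evicted.pid != incoming.pid:
        # different processes: store D in an evicted block of a random set S'
        # and swap the permuted mappings of S and S'
        return ("store_in_set", random.randrange(num_sets), "swap_mappings")
    return ("normal_replacement", None, None)
```

The randomized choice of S′ is what breaks the attacker's ability to infer which set the victim's access mapped to.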
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate if it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching, if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the security critical data accesses of the victim, a Nofill request is executed, and the requested data access will be performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request will be used to achieve the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique is used for this).
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to, or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
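A toy sketch of this keyed index remapping: CEASER uses the two-cycle LLBC keyed permutation, while `blake2b` here is only a stand-in to illustrate that changing the epoch key changes the address-to-set mapping:

```python
import hashlib

NUM_SETS = 1024  # assumed LLC set count for illustration

def cache_set(line_addr, epoch_key):
    """Map a physical line address to a set under the current epoch key."""
    h = hashlib.blake2b(line_addr.to_bytes(8, "little"),
                        key=epoch_key, digest_size=8).digest()
    return int.from_bytes(h, "little") % NUM_SETS
```

After a re-key, an eviction set the attacker carefully constructed under the old key no longer maps to a single set, which is what bounds the useful lifetime of a conflict-based attack.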
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function also calculates a different index for each cache way, in a similar way to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with the user space for security domain identification.
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, or between secure and non-secure data. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line and therefore de-correlates any cache
J Hardw Syst Secur (2019) 3:397–425
access time from the address being accessed. However, the performance degradation is tremendous.
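The per-block decay counter described above might be sketched as below. The class name, method names, and the THRESHOLD value are illustrative choices, not parameters from the design.

```python
import random

class NonDeterministicLine:
    """One cache line with a randomized decay counter.

    The line is invalidated once its counter, randomly initialized below
    THRESHOLD, reaches THRESHOLD ticks without the line being touched,
    de-correlating hit/miss timing from the access pattern.
    """

    THRESHOLD = 16

    def __init__(self, rng: random.Random):
        self.valid = True
        self.counter = rng.randrange(self.THRESHOLD)  # random initial value

    def tick(self):
        """One global-counter clock tick with the line untouched."""
        if self.valid:
            self.counter += 1
            if self.counter >= self.THRESHOLD:
                self.valid = False  # randomized invalidation point

    def touch(self, rng: random.Random):
        """An access restarts the decay interval at a new random point."""
        self.valid = True
        self.counter = rng.randrange(self.THRESHOLD)
```

Because the initial counter value is random, two identical access sequences can produce different hit/miss outcomes, which is the source of both the security benefit and the performance cost.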
5 Analysis of the Secure Caches
In this section we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the result of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu Ad Vu (slow) (one type of Evict + Time [34]) vulnerability. For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) (one type of Flush + Reload [55]) vulnerability will allow the attacker to know that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] will make the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and loses the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad Vu Ad^inv (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, will allow the attacker to know that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as Ad Vu Vd^inv (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded, locked security critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
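The first case above (a constant-timing last step) can be illustrated with a toy one-line cache shared by a victim (PID 1) and an attacker (PID 2). With cross-process hits allowed, the attacker's Step-3 reload is fast and leaks the victim's access; with an RP-like policy that refuses cross-process hits, the last step is always slow and carries no information. The single-line geometry and all names are simplifications for illustration only.

```python
class ToyCache:
    """A single-line cache shared by two processes; tracks the owning PID."""

    def __init__(self, cross_process_hits: bool):
        self.line = None  # (pid, addr) currently cached, or None
        self.cross_process_hits = cross_process_hits

    def access(self, pid: int, addr: int) -> str:
        """Return 'fast' on an observable hit, 'slow' otherwise (then fill)."""
        if self.line is not None and self.line[1] == addr:
            owner = self.line[0]
            if owner == pid or self.cross_process_hits:
                return "fast"
        self.line = (pid, addr)
        return "slow"


def flush_reload(cache: ToyCache, shared_addr: int = 0xABC) -> str:
    """Three-step Flush + Reload: flush, victim access, attacker reload."""
    cache.line = None                              # Step 1: flush
    cache.access(pid=1, addr=shared_addr)          # Step 2: victim's secret-dependent access
    return cache.access(pid=2, addr=shared_addr)   # Step 3: attacker reloads and times it
```

On the baseline the attacker observes "fast" and learns the victim touched the shared line; under the RP-like policy the observation is constant, which is exactly what case 1 requires.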
From the security perspective, the entries of the secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability; an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As is expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, has a tremendous performance penalty. Similarly, the second-to-last column shows the Nondeterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again will have a tremendous cost to performance when the application is complex.
For each of the entries that shows the effectiveness of a secure cache against a vulnerability, two results are listed. The left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or the management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data and whether this identification process is reliable are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so Vu Va^inv Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu Va Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu Ad Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to be able to only access a limited set of cache blocks (columns for SP* cache to column for PL cache in Tables 4 and 5). For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and some to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of speculation or an out-of-order load or store, which is able to be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partitioning size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set initial states (Step 0) of the victim partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
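A minimal sketch of static way-partitioning follows, assuming an 8-way set split evenly between a "victim" and an "attacker" domain (an illustrative policy, not any specific design). Insertions evict only within the owned ways, so the attacker can never set the initial state of the victim's partition.

```python
class WayPartitionedSet:
    """One cache set whose ways are statically split between two domains.

    The 50/50 split and domain names are illustrative assumptions.
    """

    PARTITION = {"victim": range(0, 4), "attacker": range(4, 8)}

    def __init__(self):
        self.ways = [None] * 8  # each entry: (domain, addr) or None

    def insert(self, domain: str, addr: int):
        """Fill an empty owned way, else evict only within the owned ways."""
        owned = list(self.PARTITION[domain])
        for w in owned:
            if self.ways[w] is None:
                self.ways[w] = (domain, addr)
                return
        self.ways[owned[0]] = (domain, addr)  # naive eviction inside the partition

    def present(self, addr: int) -> bool:
        return any(e is not None and e[1] == addr for e in self.ways)
```

However many lines the attacker inserts, the victim's line survives, which is why external miss-based interference fails against such partitioning while the victim's own (internal) interference remains possible.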
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd Vu Va (fast) (one type of Cache Internal Collision [5]) vulnerability is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv Vu Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd Vu Ad^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu Aa^inv Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as Ad Vu Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities, but it only works for the LLC and comes with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd Vu Ad (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad Vu Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security critical data and the timing observed from cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address to cache set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are not able to prevent hit-based internal-interference vulnerabilities, such as Aa^inv Vu Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security critical data to prevent most of the internal and external interference. However, when security critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security critical data, and vulnerabilities such as the Vu Va^inv Vu (slow) (one type of Flush + Time) vulnerability still exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va Vu^inv Va^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
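The random-fill idea discussed above can be sketched as follows: on a miss, the demanded line is forwarded to the processor but not cached, and a line chosen at random from a window around it is cached instead. The capacity, window size, and FIFO replacement here are illustrative choices, not parameters of the actual design.

```python
import random

class RandomFillCache:
    """Toy fully associative cache with Random Fill-style miss handling.

    Caching a random neighbor instead of the demanded line de-correlates
    the cached address from the accessed address.
    """

    def __init__(self, capacity: int = 8, window: int = 4, seed: int = 0):
        self.lines = []  # FIFO of cached line addresses
        self.capacity = capacity
        self.window = window
        self.rng = random.Random(seed)

    def access(self, addr: int) -> str:
        if addr in self.lines:
            return "hit"
        # Random fill: cache a neighbor of addr, not addr itself.
        fill = addr + self.rng.randint(-self.window, self.window)
        if fill not in self.lines:
            self.lines.append(fill)
            if len(self.lines) > self.capacity:
                self.lines.pop(0)
        return "miss"
```

An attacker observing later hits and misses learns only that some line within the window was touched, not which one, which is the de-correlation the paragraph above describes.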
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in effect, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad Vu Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu Ad Vu (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive effective vulnerability types listed, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Evict + Time the attacker evicts a cache set, learning only the secret data's cache index, instead of invalidating data at a known address to find the full address of the secret data as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
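The three Prime + Probe steps can be walked through on a toy set-associative cache. The geometry, the LRU policy, and the address arithmetic below are illustrative simplifications.

```python
class SetAssocCache:
    """Toy set-associative cache: num_sets sets, assoc ways, LRU per set."""

    def __init__(self, num_sets: int = 8, assoc: int = 2):
        self.num_sets, self.assoc = num_sets, assoc
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; update LRU order, evict the LRU line on a miss."""
        s = self.sets[addr % self.num_sets]
        if addr in s:
            s.remove(addr)
            s.append(addr)
            return True
        s.append(addr)
        if len(s) > self.assoc:
            s.pop(0)
        return False


def prime_probe(cache: SetAssocCache, victim_addr: int) -> int:
    """Return the set index the attacker infers for the victim's access."""
    # Step 1: prime -- fill every set with attacker-known lines.
    attacker_lines = [s + w * cache.num_sets + 1000 * cache.num_sets
                      for s in range(cache.num_sets)
                      for w in range(cache.assoc)]
    for a in attacker_lines:
        cache.access(a)
    # Step 2: the victim's secret-dependent access evicts one attacker line.
    cache.access(victim_addr)
    # Step 3: probe -- the set where the attacker misses is the victim's set.
    for a in attacker_lines:
        if not cache.access(a):
            return a % cache.num_sets
    return -1  # no eviction observed
```

The slow (miss) observation in Step 3 pinpoints the victim's cache set, which is exactly the index leakage the strategy description above captures.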
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper). In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper). The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper). Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access operations and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
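The operation classes just defined can be sketched with a small classifier; the tuple encoding (actor, address, invalidation flag) and the function names below are illustrative assumptions, not the paper's notation:

```python
# Illustrative encoding of a step as (actor, address, is_invalidation),
# e.g. ("A", "a", False) = attacker accesses known address a,
#      ("V", "u", True)  = victim invalidates the unknown address u.

def op_class(step):
    actor, addr, is_inv = step
    if addr == "u":
        return "u_operation"                 # involves the unknown address u
    return "known_inv_operation" if is_inv else "known_access_operation"

# known_access and known_inv operations together form the not_u operations.
def is_not_u(step):
    return op_class(step) != "u_operation"

print(op_class(("A", "a", False)))   # known_access_operation
print(is_not_u(("V", "u", False)))   # False
```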
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern with a ⋆ in the middle, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
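A sketch of this splitting step, using "*" for ⋆ (the encoding and function name are illustrative assumptions):

```python
# Subdivision Rule 1 sketch: split a long pattern at every interior star,
# making the star Step 1 of the next sub-pattern; a trailing star is dropped.

def subdivide_at_star(pattern):
    parts, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                parts.append(current)
            current = ["*"]           # star becomes Step 1 of the next pattern
        else:
            current.append(step)
    if current and current != ["*"]:  # a lone trailing star is deleted
        parts.append(current)
    return parts

print(subdivide_at_star(["A_a", "V_u", "*", "V_u", "A_a", "*"]))
# → [['A_a', 'V_u'], ['*', 'V_u', 'A_a']]
```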
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with an Ainv/Vinv in the middle, the longer pattern can be divided into two separate patterns, where the Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step will be deleted).
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., M before N and N before M give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of two steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some adjacent step pairs can only be commuted if the second of the two steps is not the last step in the whole memory sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting Combined Step.

If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has a "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: For two adjacent known memory access–related operations (known_access operations), they always result in a deterministic state of the cache block for the second memory access, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
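The three Reduction Rules can be sketched on a pair of adjacent steps; the tuple encoding (address, invalidation flag) is an illustrative assumption:

```python
# Reduction Rules sketch: a step is (address, is_invalidation), with "u" as
# the unknown address. Returns the single replacement step, or None.

def reduce_adjacent(s1, s2):
    addr1, inv1 = s1
    addr2, inv2 = s2
    # Rule 1: two u operations target the same unknown address u,
    # so only the second access needs to be kept.
    if addr1 == "u" and addr2 == "u" and not inv1 and not inv2:
        return s2
    # Rule 2: two known accesses leave a deterministic state for the
    # second access's cache block, so the pair collapses to one step.
    if addr1 != "u" and addr2 != "u" and not inv1 and not inv2:
        return s2
    # Rule 3: a known access and a known invalidation of the same
    # address, in either order, reduce to the second step.
    if addr1 != "u" and addr1 == addr2 and inv1 != inv2:
        return s2
    return None

print(reduce_adjacent(("a", False), ("d", False)))  # ('d', False)
print(reduce_adjacent(("a", False), ("u", False)))  # None
```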
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., M followed by N, and Union(M, N), give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can apply Union Rule 1: the two known_inv operations both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied continuously, to union all adjacent invalidation steps that invalidate known, different memory addresses.
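A sketch of Union Rule 1, reusing the illustrative (address, invalidation flag) encoding (the joint-step representation is our own assumption):

```python
# Union Rule 1 sketch: two invalidations of different known addresses are
# combined into one joint step Union(M, N).

def union_invalidations(s1, s2):
    (addr1, inv1), (addr2, inv2) = s1, s2
    if inv1 and inv2 and "u" not in (addr1, addr2) and addr1 != addr2:
        return ("union_inv", tuple(sorted((addr1, addr2))))
    return None

print(union_invalidations(("a", True), ("d", True)))  # ('union_inv', ('a', 'd'))
print(union_invalidations(("a", True), ("a", True)))  # None (same address)
```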
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules are applied first, to put known_access and known_inv operations that target the same address as near as possible, and to put u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

The recursion at each application of these three categories of rules should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps, in one of the following patterns or fewer:

– u operation, not_u operation, u operation
– not_u operation, u operation, not_u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known_access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or Vu^inv, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ad^inv/Vd^inv ⇝ u operation, where Vu ⇝ Ad^inv/Vd^inv can further apply Commute Rule 2, to reduce and be within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied, which requires the Rest Checking.

The only two adjacent steps to which none of the three categories of rules can be applied are the following:

– {Aa, Va, Aaalias, Vaalias, Ad, Vd, Aa^inv, Va^inv, Aaalias^inv, Vaalias^inv} ⇝ Vu
– {Aa, Va, Aaalias, Vaalias} ⇝ Vu^inv
– Vu ⇝ {Aa, Va, Aaalias, Vaalias, Ad, Vd, Aa^inv, Va^inv, Aaalias^inv, Vaalias^inv}
– Vu^inv ⇝ {Aa, Va, Aaalias, Vaalias}

We manually checked all of the two adjacent step patterns
above, and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern;
pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern, in order; pattern[1], pattern[2], ... are empty initially.
Output: reduced effective vulnerability pattern(s) array vul_set. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vul_set = Ø
2: while contain(pattern, ⋆) and the ⋆ index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, Ainv) and the Ainv index is not 0) or (contain(pattern, Vinv) and the Vinv index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is_ineffective or pattern has_interval_effective_three_steps) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if (pattern is_ineffective or pattern has_interval_effective_three_steps) then
13:    vul_set += effective_three_steps(pattern)
14:  end if
15: end while
16: return vul_set
two steps can either generate two adjacent step patterns that can be processed by the three rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. A key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
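The shape of the reduction loop can be sketched as below; the helper names and the pairwise-scan strategy are our own simplifying assumptions (the full algorithm also interleaves the Subdivision, Commute, and Union Rules):

```python
# Minimal driver sketch: repeatedly scan adjacent pairs and shrink the
# pattern until no rule applies or at most three steps remain.

def reduce_pattern(pattern, reduce_adjacent):
    changed = True
    while changed and len(pattern) > 3:
        changed = False
        for i in range(len(pattern) - 1):
            merged = reduce_adjacent(pattern[i], pattern[i + 1])
            if merged is not None:           # a rule fired: replace the pair
                pattern = pattern[:i] + [merged] + pattern[i + 2:]
                changed = True
                break
    return pattern

# e.g. with Reduction Rule 2 only (two known accesses keep the second):
rule2 = lambda s, t: t if s != "u" and t != "u" else None
print(reduce_pattern(["a", "d", "a", "u", "a"], rule2))  # → ['a', 'u', 'a']
```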
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 on Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost, high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fig. 3 Examples of relations between victim's behavior (u) and attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step. (Each panel plots the victim's behavior, u = a, aalias, or NIB, against the attacker's observation, fast or slow, for one example pattern: Vd ⇝ Vu ⇝ Aa; Vu ⇝ Ad ⇝ Vu^inv; Vu ⇝ Aa^inv; Aaalias^inv ⇝ Vu^inv ⇝ Va; Vd ⇝ Vu^inv ⇝ Vd; Aa ⇝ Vu ⇝ Ad.)
to ⋆ ⇝ Aa ⇝ Vu, which is equivalent to Aa ⇝ Vu. Therefore, Ad ⇝ Aa ⇝ Vu is a repeat type of Aa ⇝ Vu, and can be eliminated.

2. Three-step patterns with a step involving a known address a and an alias to that address, aalias, give the same information. Thus, three-step combinations which only differ in the use of a or aalias cannot represent different attacks, and only one combination needs to be considered. For example, Vu ⇝ Aaalias ⇝ Vu is a repeat type of Vu ⇝ Aa ⇝ Vu, and we eliminate the first pattern.
3. Three-step patterns with steps Vu and Vu^inv in adjacent consecutive steps will only keep the latter step and eliminate the former step. For example, Aa ⇝ Vu ⇝ Vu^inv can be reduced to Aa ⇝ Vu^inv, which is further equivalent to ⋆ ⇝ Aa ⇝ Vu^inv. So Aa ⇝ Vu ⇝ Vu^inv can be eliminated.
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong Vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing–based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that involve only the victim's behavior V in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.2 We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
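This categorization is mechanical, and can be sketched as a small classifier; the actor letters and observation labels below are an illustrative encoding, not the paper's:

```python
# Macro-type sketch: derive I/E from the actors of Steps 2-3 and M/H from
# the final timing observation.

def macro_type(step2_actor, step3_actor, observation):
    # internal (I) iff only the victim V acts in Steps 2 and 3
    interference = "I" if (step2_actor, step3_actor) == ("V", "V") else "E"
    # miss-based (M): slow due to a miss, or fast invalidation of absent data
    timing = "M" if observation in ("miss_slow", "inv_fast") else "H"
    return interference + timing

print(macro_type("V", "V", "hit_fast"))   # IH (e.g. Cache Internal Collision)
print(macro_type("V", "A", "miss_slow"))  # EM (e.g. Prime + Probe)
```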
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check whether the secure caches can defend against some or all of the vulnerabilities in our model.
2Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
J Hardw Syst Secur (2019) 3:397-425
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability; for Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows whether a type of vulnerability has been previously presented in literature
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Ainv Vu Va (fast) IH (2)
V inv Vu Va (fast) IH (2)
Ad Vu Va (fast) IH (2)
Vd Vu Va (fast) IH (2)
Aaalias Vu Va (fast) IH (2)
Vaalias Vu Va (fast) IH (2)
Ainva Vu Va (fast) IH (2)
V inva Vu Va (fast) IH (2)
Flush + Reload Ainva Vu Aa (fast) EH (5)
V inva Vu Aa (fast) EH (5)
Ainv Vu Aa (fast) EH (5)
V inv Vu Aa (fast) EH (5)
Ad Vu Aa (fast) EH (5)
Vd Vu Aa (fast) EH (5)
Aaalias Vu Aa (fast) EH (5)
Vaalias Vu Aa (fast) EH (5)
Reload + Time V invu Aa Vu (fast) EH new
V invu Va Vu (fast) IH new
Flush + Probe Aa V invu Aa (slow) EM (6)
Aa V invu Va (slow) IM new
Va V invu Aa (slow) EM new
Va V invu Va (slow) IM new
Evict + Time Vu Ad Vu (slow) EM (1)
Vu Aa Vu (slow) EM (1)
Prime + Probe Ad Vu Ad (slow) EM (4)
Aa Vu Aa (slow) EM (4)
Bernsteinrsquos Attack Vu Va Vu (slow) IM (3)
Vu Vd Vu (slow) IM (3)
Vd Vu Vd (slow) IM (3)
Va Vu Va (slow) IM (3)
Evict + Probe Vd Vu Ad (slow) EM new
Va Vu Aa (slow) EM new
Prime + Time Ad Vu Vd (slow) IM new
Aa Vu Va (slow) IM new
Flush + Time Vu Ainva Vu (slow) EM new
Vu V inva Vu (slow) IM new
(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Invalidation Ainv Vu V inva (slow) IH new
V inv Vu V inva (slow) IH new
Ad Vu V inva (slow) IH new
Vd Vu V inva (slow) IH new
Aaalias Vu V inva (slow) IH new
Vaalias Vu V inva (slow) IH new
Flush + Flush Ainva Vu V inva (slow) IH (1)
V inva Vu V inva (slow) IH (1)
Ainva Vu Ainva (slow) EH (1)
V inva Vu Ainva (slow) EH (1)
Flush + Reload Invalidation Ainv Vu Ainva (slow) EH new
V inv Vu Ainva (slow) EH new
Ad Vu Ainva (slow) EH new
Vd Vu Ainva (slow) EH new
Aaalias Vu Ainva (slow) EH new
Vaalias Vu Ainva (slow) EH new
Reload + Time Invalidation V invu Aa V invu (slow) EH new
V invu Va V invu (slow) IH new
Flush + Probe Invalidation Aa V invu Ainva (fast) EM new
Aa V invu V inva (fast) IM new
Va V invu Ainva (fast) EM new
Va V invu V inva (fast) IM new
Evict + Time Invalidation Vu Ad V invu (fast) EM new
Vu Aa V invu (fast) EM new
Prime + Probe Invalidation Ad Vu Ainvd (fast) EM new
Aa Vu Ainva (fast) EM new
Bernsteinrsquos Invalidation Attack Vu Va V invu (fast) IM new
Vu Vd V invu (fast) IM new
Vd Vu V invd (fast) IM new
Va Vu V inva (fast) IM new
Evict + Probe Invalidation Vd Vu Ainvd (fast) EM new
Va Vu Ainva (fast) EM new
Prime + Time Invalidation Ad Vu V invd (fast) IM new
Aa Vu V inva (fast) IM new
Flush + Time Invalidation Vu Ainva V invu (fast) EM new
Vu V inva V invu (fast) IM new
(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]; some have been realized in FPGA, e.g., [7]; and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. Moreover, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, the L2 cache, and the Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the descriptions of each secure cache design below, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
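To make the partitioning rule concrete, the following minimal sketch models one 4-way SP*-style cache set; the class name, the fixed way split, and the simplified replacement choice are our assumptions for illustration, not details from the cited designs.

```python
# A minimal sketch of SP*-style static way partitioning: ways 0-1 form the
# High (victim) partition, ways 2-3 the Low (attacker) partition. Hits may
# occur anywhere; fills (and thus evictions) stay inside the owner's partition.

HIGH_WAYS, LOW_WAYS = (0, 1), (2, 3)

class SPStarSet:
    def __init__(self):
        self.ways = [None] * 4          # each entry: (owner, tag) or None

    def access(self, owner, tag):
        if (owner, tag) in self.ways:   # hits are allowed in either partition
            return 'hit'
        # Misses may only fill, and therefore evict from, the owner's partition.
        allowed = HIGH_WAYS if owner == 'victim' else LOW_WAYS
        empty = [w for w in allowed if self.ways[w] is None]
        way = empty[0] if empty else allowed[0]
        self.ways[way] = (owner, tag)
        return 'miss'
```

The attacker can thus never evict a victim line, closing eviction-based Step 1 interference, while cross-partition hits remain possible, which is the source of the hit-based vulnerabilities that the analysis in Section 5 flags for this design.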
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High based on the code; this timing label can act similarly to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partition if the data is already in the cache.

3Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20]
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
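The shrink/grow asymmetry described above can be sketched as follows; the list-based modeling and function name are ours, not from [48].

```python
# A hedged sketch of SecDCP's way-adjustment policy: shrinking L flushes L's
# blocks in the reallocated way, while growing L leaves stale H blocks in
# place (the noted source of potential timing leakage). Each entry of
# way_lines is the security label of one cache line: 'L', 'H', or None.

def secdcp_adjust_way(way_lines, grow_low):
    if grow_low:
        # Way moves from H to L: H blocks stay and are lazily evicted later.
        return list(way_lines)
    # Way moves from L to H: all L blocks are flushed before reallocation.
    return [None if label == 'L' else label for label in way_lines]
```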
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first; if there is no such data, data belonging to the same process will be evicted; and if there is no existing data in the cache that is in the same process, a random line in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using cache coherence protocol operations is still possible.
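The three-level eviction priority can be sketched as below; the data-structure modeling (CVB as a set of core IDs, dict-based lines) is our assumption for illustration.

```python
import random

# Hedged sketch of SHARP-style eviction candidate selection. Each line's
# 'cvb' is the set of core IDs whose private caches hold the line;
# live_cores is the set of cores running current processes.

def sharp_pick_victim(lines, requesting_core, live_cores):
    """Return (way_index, alarm) for the line to evict."""
    # 1. Prefer a line not present in any current core's private cache.
    for i, line in enumerate(lines):
        if not (line['cvb'] & live_cores):
            return i, False
    # 2. Otherwise a line belonging only to the requesting core itself.
    for i, line in enumerate(lines):
        if line['cvb'] == {requesting_core}:
            return i, False
    # 3. Else evict at random and raise an alarm (interrupt to the OS).
    return random.randrange(len(lines)), True
```

Only the third case, which would let one process evict another's line, is randomized and reported, which is what prevents an attacker from reliably evicting the victim's data.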
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, the TLB, and the LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and the TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1 The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2 The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and the TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has a small performance impact given how rarely such speculation occurs. This cache listed extra software assumptions as follows:

Assumption 1 The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2 The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3 When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core received possible invalidations from the OS before the check of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB in order to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition the cache into secure and non-secure parts (the non-secure parts may map to 3 CoS, while the secure part maps to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1 Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2 Security critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3 The victim and the attacker process cannot share the same memory space.
Assumption 4 A software pseudo-locking mechanism is used to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5 Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy fillmap, for masking fills and selecting the victim way to replace; another, called policy hitmap, for masking hit ways. Only when both the tag and the domain id are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy fillmap. Consistently, the replacement policy is updated with the victim way selection, and the metadata derived from the policy fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R from the LLC will cause the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; therefore, RIC does not protect writable shared critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases where the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
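The relaxed-inclusion check on an LLC eviction can be sketched as follows; the dict/set modeling is our own, purely for illustration.

```python
# Illustrative sketch of RIC's relaxed-inclusion check: on an LLC eviction,
# the usual inclusive back-invalidation of the L1 copy is skipped when the
# line's RIC bit is set.

def llc_evict(line, l1_tags):
    """line: dict with 'tag' and 'ric'; l1_tags: set of tags cached in L1."""
    if not line['ric']:
        l1_tags.discard(line['tag'])    # normal inclusive back-invalidation
    # With the RIC bit set, the L1 copy survives, so an attacker cannot
    # evict the victim's L1 line by creating LLC conflicts.
```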
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1 Security critical data is always preloaded into the cache at the beginning of the whole program execution.
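The miss-handling rule above can be sketched as below; the dict-based modeling and function name are hypothetical, not from [49].

```python
# Hypothetical sketch of PL-cache victim selection on a miss: locked lines
# can only be evicted by locked data of the same process; if no line may be
# evicted, the access is served without caching.

def pl_pick_victim(cache_set, new_pid, new_locked):
    """cache_set: list of dicts {'pid', 'locked'}. Returns a way index,
    or None to signal an uncached load/store."""
    for i, line in enumerate(cache_set):
        if not line['locked']:
            return i                 # unlocked lines follow normal policy
        if new_locked and line['pid'] == new_pid:
            return i                 # a process may evict its own locked line
    return None
```

Combined with the preloading assumption, this is what blocks Step 1 of Prime-style vulnerabilities: the attacker's data can never displace the victim's locked lines.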
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S' will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S', and the mappings of S and S' will be swapped as well. Otherwise, the normal replacement policy is executed.
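The cross-process miss case can be sketched as follows; the list-based permutation table is our own simplification of the PT described above.

```python
import random

# A loose sketch of RP-cache behavior on a cross-process miss: the incoming
# data D is placed in a randomly chosen set S', and the permutation-table
# entries for S and S' are swapped, so the attacker's learned set mapping
# is disturbed by the very access that would have leaked information.

def rp_cross_process_miss(perm_table, s):
    s_prime = random.randrange(len(perm_table))
    perm_table[s], perm_table[s_prime] = perm_table[s_prime], perm_table[s]
    return s_prime      # the set whose block is evicted to receive D
```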
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the security critical data accesses of the victim, a Nofill request is executed, and the requested data access will be performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request will be used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since such a flush will not influence the prevention of timing-based side-channel attacks (the random filling technique ensures this).
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
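The effect of encrypted indexing and re-keying can be illustrated with the sketch below; note the real design uses a Low-Latency BlockCipher, while the keyed hash here is purely a stand-in chosen by us for illustration (it would be far too slow in hardware).

```python
import hashlib

# Sketch of CEASER-style encrypted set indexing: the set index is a keyed
# function of the address, and changing the key (a new epoch) remaps every
# address, so eviction sets the attacker has learned become stale.

def ceaser_set_index(addr, key, epoch, num_sets):
    digest = hashlib.sha256(f'{key}:{epoch}:{addr}'.encode()).digest()
    return int.from_bytes(digest[:4], 'big') % num_sets
```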
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate security domain identification with user space.
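The per-way keyed index derivation can be sketched as follows; the hash choice and parameter names are our assumptions, not the paper's construction.

```python
import hashlib

# Illustrative sketch of SCATTER-style indexing: the set index depends on the
# security domain's key, the address, and the way number, so the same address
# occupies a different row in each way and maps differently in each domain.

def scatter_index(addr, way, domain_key, num_sets):
    h = hashlib.sha256(f'{domain_key}:{addr}:{way}'.encode()).digest()
    return int.from_bytes(h[:4], 'big') % num_sets
```

Because eviction candidates for an address are spread over many rows, building a minimal eviction set becomes far harder than in a set-associative cache with a fixed index function.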
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, or based on whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness, and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets the local counters' initial values to be less than the maximum value of the global counter; in this way, the cache delay is randomized. The cache delay intervals controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because invalidation is determined by the randomized counter of each cache line, and therefore any cache access time is de-correlated from the address being accessed. However, the performance degradation is tremendous.
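The decay-counter mechanism can be sketched as below; the constants and structure are illustrative only, not taken from [23].

```python
import random

# Rough sketch of the per-line decay counters described above: each line is
# invalidated after a randomized number of untouched global clock ticks.

MAX_TICKS = 16

class DecayLine:
    def __init__(self):
        # A randomized initial value de-correlates when invalidation fires.
        self.counter = random.randrange(MAX_TICKS)
        self.valid = True

    def tick(self):                     # one global counter clock tick
        if self.valid:
            self.counter += 1
            if self.counter >= MAX_TICKS:
                self.valid = False      # line is invalidated

    def touch(self):                    # an access marks the data as active
        self.counter = 0
```

Since whether a later access hits or misses now depends on a random counter rather than only on the access pattern, the attacker's timing observations lose their correlation with the victim's addresses, at the cost of many spurious misses.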
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table; for example, SP* cache can defend against the Vu Ad Vu (slow) (one type of Evict + Time [34]) vulnerability. For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color; for example, SecDCP cache cannot defend against the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast vs. slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) (one type of Flush + Reload [55]) vulnerability will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49], for example, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits between each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and can no longer preserve the original corresponding relation between the victim's behavior and the attacker's observation. For instance, the Ad Vu Ainvd (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, will allow the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as Ad Vu V invd (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded and locked security critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be placed in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities where the last step is a memory access related operation. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (The table body with the per-cache results is not recoverable from the extracted text.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and the TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, the TLB, and the last-level cache, which make up the whole cache hierarchy, where the L1 cache and the TLB require software flush protection and the last-level cache can be protected by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared caches
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can make the probability of the vulnerability succeeding extremely small
l The technique works in shared read-only memory but not in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success to a limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not preserved in this extraction; the table footnotes follow.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of the attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so V_u → V_a^inv → V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to be able to access only a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, either static or dynamic partitioning of caches allocates some blocks to the High victim and some to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of speculation or an out-of-order load or store, which is able to be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with the partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partitioning size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
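As an illustration of how partitioning denies the attacker the initial-state step, the sketch below models one set of a statically way-partitioned cache (the domain names, sizes, and FIFO policy are hypothetical, not any specific design): fills from the attacker's domain can only evict attacker lines, so no amount of attacker activity displaces the victim's line.

```python
class PartitionedSet:
    """One cache set whose ways are statically split between domains."""
    def __init__(self, ways_per_domain=2):
        self.ways = {'victim': [None] * ways_per_domain,
                     'attacker': [None] * ways_per_domain}

    def access(self, domain, tag):
        part = self.ways[domain]     # a domain can only touch its own ways
        if tag in part:
            return 'hit'
        part.pop(0)                  # FIFO eviction inside the partition only
        part.append(tag)
        return 'miss'

s = PartitionedSet()
s.access('victim', 'secret')
for t in range(8):                   # attacker floods the set (attempted Step 1)
    s.access('attacker', t)
```

Because evictions never cross the partition boundary, the victim's line is still cached after the flood, and the attacker has learned nothing about the victim's use of the set.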
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow the data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u → A_a^inv → V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and the attacker will be allowed to evict it. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities, but it only works for the LLC and comes with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d → V_u → A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set the initial state (Step 1) of the victim's partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the A_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
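The address-to-set randomization idea can be sketched as a keyed index function (the hash function and epoch keys below are placeholders, not the actual cipher of any of the designs): whether two addresses are congruent depends on a secret key, and re-keying changes the whole mapping, invalidating any eviction sets the attacker has learned.

```python
import hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Keyed pseudorandom mapping from a line address to a cache set."""
    digest = hashlib.blake2b(addr.to_bytes(8, 'little'), key=key).digest()
    return int.from_bytes(digest[:2], 'little') % NUM_SETS

# Re-keying (dynamic remapping, in CEASER's terms) moves lines to new
# sets, so congruence relations learned under the old key become useless.
old = [set_index(a, b'epoch-0') for a in range(16)]
new = [set_index(a, b'epoch-1') for a in range(16)]
```

The mapping is deterministic within an epoch (so the cache still functions), but without the key the attacker cannot compute which addresses collide.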
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the V_u → V_a^inv → V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
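A quick numeric sketch of the Non Deterministic Cache idea (all constants below are illustrative, not the design's actual parameters): once a random delay wider than the hit/miss gap is added to every access, a single timing sample no longer separates hits from misses, which is why the defense is so broad and so expensive.

```python
import random

HIT, MISS, NOISE = 1, 100, 500      # cycles; NOISE wider than MISS - HIT

def observed_latency(is_hit, rng):
    base = HIT if is_hit else MISS
    return base + rng.randrange(NOISE)   # uniformly random added delay

rng = random.Random(1234)                # fixed seed for reproducibility
hits = [observed_latency(True, rng) for _ in range(1000)]
misses = [observed_latency(False, rng) for _ in range(1000)]
# The two latency distributions now overlap heavily, so no single
# measurement tells the attacker whether the access hit or missed.
```

Averaging many samples can still recover a statistical difference, which is why the delay distribution (and the number of observations the attacker can gather) matters in practice.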
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary. [Table body not preserved in this extraction.]
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be carefully used; e.g., Random Fill cache is vulnerable to V_u → A_d → V_u (fast) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state for the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.

It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there was an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
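The three steps of Flush + Reload can be re-enacted with a toy cache model (a Python set stands in for the cache, and the latency constants are illustrative, not measured values):

```python
HIT, MISS = 1, 100                     # illustrative cycle counts

def timed_access(cache, addr):
    """Access addr, returning a latency; the access also caches the line."""
    latency = HIT if addr in cache else MISS
    cache.add(addr)
    return latency

def flush(cache, addr):
    cache.discard(addr)

def attacker_guess(secret_bit):
    cache, shared = set(), 'shared_line'
    flush(cache, shared)                        # Step 1: invalidate the line
    if secret_bit:                              # Step 2: victim touches the
        timed_access(cache, shared)             #   line only if secret says so
    return timed_access(cache, shared) == HIT   # Step 3: time the reload
```

A fast reload in Step 3 tells the attacker the victim touched the shared line, recovering the secret bit; this is exactly the hit-based external interference that page-sharing restrictions (e.g., in Sanctum and CATalyst) are designed to break.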
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, Flush + Time has the attacker invalidate some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in this Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
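A toy version of Prime + Probe over a single 2-way set (the associativity, LRU policy, and tag names are illustrative):

```python
WAYS = 2

def lru_access(set_lines, tag):
    """Access one set with LRU replacement; return True on a hit."""
    hit = tag in set_lines
    if hit:
        set_lines.remove(tag)          # move to most-recently-used position
    elif len(set_lines) >= WAYS:
        set_lines.pop(0)               # evict the least recently used line
    set_lines.append(tag)
    return hit

def prime_probe(victim_touches_set):
    lines = []
    for t in ('A0', 'A1'):             # Step 1: attacker primes the set
        lru_access(lines, t)
    if victim_touches_set:             # Step 2: secret-dependent access
        lru_access(lines, 'V')         #   evicts one attacker line
    # Step 3: probe; any miss reveals the victim used this set
    return not all(lru_access(lines, t) for t in ('A0', 'A1'))
```

Unlike Flush + Reload, no shared memory is needed; the attacker only needs addresses congruent to the victim's set, which is precisely what randomized set-index mappings take away.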
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory-access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three
steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access and known inv operations
together make up the not-u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
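The exhaustive-enumeration idea behind the simulator can be sketched as follows (the operation alphabet below is a simplified stand-in for the paper's full candidate list, so the resulting count is illustrative and does not reproduce the 72/64 classification, which requires the cache-state simulation itself):

```python
from itertools import product

OPS = ['*',                                          # unknown/any cache state
       'V_u', 'V_a', 'V_d', 'A_a', 'A_d',            # memory accesses
       'V_u_inv', 'V_a_inv', 'A_a_inv', 'A_d_inv']   # invalidations

# Form every ordered triple of candidate step operations; a '*' in
# Step 3 gives the attacker nothing to time, so such triples can be
# skipped before simulation.
candidates = [p for p in product(OPS, repeat=3) if p[2] != '*']
num_candidates = len(candidates)                     # 10 * 10 * 9 = 900
```

Each surviving triple would then be fed to the cache simulator, which checks whether the attacker's final timing observation unambiguously reveals information about u.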
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ⋆, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as A^inv/V^inv, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv → V_u → A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with A^inv/V^inv in the middle or the last step (A^inv/V^inv in the last step will be deleted).
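Subdivision Rule 1 can be sketched as a small function over patterns represented as lists of step labels (the label strings are illustrative; Rule 2 would treat whole-cache A^inv/V^inv labels the same way):

```python
def subdivide(pattern):
    """Split a long pattern at every interior '*', making each '*'
    Step 1 of the next sub-pattern; a lone trailing '*' is deleted.
    pattern: list of step labels, e.g. ['A_a', '*', 'V_u', 'A_a']."""
    out, current = [], []
    for step in pattern:
        if step == '*' and current:      # '*' in the middle: cut here and
            out.append(current)          # start the next sub-pattern at '*'
            current = ['*']
        else:
            current.append(step)
    if current and current != ['*']:     # drop a sub-pattern that is only '*'
        out.append(current)
    return out
```

Applying the rule recursively (which the loop above effects in one pass) leaves only sub-patterns whose interior steps are real operations, ready for the simplification rules of the next subsection.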
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. Table 7 shows all the possible conditions under which the rules apply for each pair of adjacent steps, regardless of whether each step is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., M → N and N → M yield the same timing observation in the final step for the attacker, we can freely exchange the places of M and N in the pattern. This gives more chances to reduce and union steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, the two steps can be commuted regardless of their position within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted when the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second step is not the last step of the sequence; these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
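As a sketch, deciding whether a pair commutes reduces to a Table 7 lookup plus the last-step check of Commute Rule 2. The entries below are a few illustrative examples in our own encoding, not the full table:

```python
# (first, second) -> (Commute Rule 1 applies, Commute Rule 2 applies).
# Illustrative entries only; the authoritative mapping is Table 7.
COMMUTE_TABLE = {
    ('Aa', 'Ad_inv'): (True, True),    # access and inv of different addresses
    ('Aa', 'Aa_inv'): (False, False),  # same address: cannot commute
    ('Vu', 'Ad_inv'): (False, True),   # commutable only if not the last step
}

def can_commute(first, second, second_is_last_step):
    rule1, rule2 = COMMUTE_TABLE.get((first, second), (False, False))
    if rule1:
        return True                               # Rule 1: commute anywhere
    return rule2 and not second_is_last_step      # Rule 2: not as final step
```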
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps that are both related to known addresses or both related to an unknown address (including repeating states), the two adjacent steps can be reduced to a single step following the reduction rules below.
Table 7 Rules for combining two adjacent steps. For each ordered pair of adjacent steps (First, Second), regardless of whether each step is the attacker's access (A) or the victim's access (V), the table indicates whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply ("yes"/"no"), and gives the reduced or unioned result in the Combined Step column where a rule applies.
Table 7 marks these reducible pairs with "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column.
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the cache block touched by the second operation, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps where one step is a known access operation and the other is a known inv operation, if the addresses they refer to are the same, the two steps can be reduced to one step, which is the second step, regardless of their order.
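The three reduction rules can be sketched over a pair of adjacent steps; the (party, address, is_invalidation) tuple encoding is our own assumption, not the paper's implementation:

```python
def try_reduce(first, second):
    """Return the single step that replaces the pair, or None if no
    reduction rule applies. Steps are (party, addr, inv) tuples, with
    addr == 'u' for the unknown (secret-dependent) address."""
    _, a1, inv1 = first
    _, a2, inv2 = second
    # Rule 1: two accesses to the same unknown u -> keep only the second
    if a1 == 'u' and a2 == 'u' and not inv1 and not inv2:
        return second
    # Rule 2: two known access operations -> the second access leaves the
    # touched cache block in a deterministic state, so one step suffices
    if a1 != 'u' and a2 != 'u' and not inv1 and not inv2:
        return second
    # Rule 3: known access and known inv of the same address, either
    # order -> reduce to the second step
    if a1 == a2 != 'u' and inv1 != inv2:
        return second
    return None
```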
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., M → N and Union(M, N) yield the same timing observation in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined: both known inv operations invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
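Union Rule 1 can be sketched in the same tuple encoding; representing the joint step as a tagged tuple is our own choice:

```python
def try_union(first, second):
    """Combine two invalidations of known, different addresses into one
    joint step Union(first, second); return None otherwise. Steps are
    (party, addr, inv) tuples, with addr == 'u' for the unknown address."""
    _, a1, inv1 = first
    _, a2, inv2 = second
    if inv1 and inv2 and 'u' not in (a1, a2) and a1 != a2:
        return ('Union', first, second)   # joint step invalidating both
    return None
```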
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations that target the same address as near as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, and the recursion continues until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns:

– u operation → non-u operation → u operation
– non-u operation → u operation → non-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known access operation, the A^inv/V^inv can be removed, because the access puts an actual state into the cache block.
– If it is followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If it is followed by V_u, the worst case will be A^inv/V^inv → V_u → non-u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A^inv_d/V^inv_d → u operation, where V_u → A^inv_d/V^inv_d can further be commuted using Commute Rule 2 and reduced to be within three steps.
In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied, requiring the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias → V_u
– A_a/V_a/A_aalias/V_aalias → V^inv_u
– V_u → A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias
– V^inv_u → A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: results: an array of reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = Ø
2: while contain(pattern, ⋆) and its index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if has_interval_effective_three_steps(pattern) then
13:    results += effective_three_steps(pattern)
14:  end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
In Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
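The overall flow can be sketched end-to-end. This simplified, self-contained version only splits at ⋆ and whole-cache flushes and collapses adjacent identical steps (a special case of the Reduction Rules); it leaves out the Commute and Union Rules and the effectiveness check against Tables 2 and 3:

```python
def reduce_long_pattern(pattern):
    """Simplified Algorithm 2 sketch: subdivide, then shrink each
    sub-pattern by dropping adjacent repeated steps. The full algorithm
    also commutes, unions, and checks effectiveness of the result."""
    # Subdivision: '*' and whole-cache flushes start a new sub-pattern
    subs, current = [], []
    for step in pattern:
        if step in ('*', 'Ainv', 'Vinv') and current:
            subs.append(current)
            current = [step]
        else:
            current.append(step)
    if current:
        subs.append(current)
    # Reduction (special case): adjacent identical steps collapse to one
    reduced = []
    for sub in subs:
        out = []
        for step in sub:
            if not out or out[-1] != step:
                out.append(step)
        reduced.append(out)
    return reduced
```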
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access–related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature.

Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision | A^inv | V_u | V_a (fast) | IH | (2)
 | V^inv | V_u | V_a (fast) | IH | (2)
 | A_d | V_u | V_a (fast) | IH | (2)
 | V_d | V_u | V_a (fast) | IH | (2)
 | A_aalias | V_u | V_a (fast) | IH | (2)
 | V_aalias | V_u | V_a (fast) | IH | (2)
 | A^inv_a | V_u | V_a (fast) | IH | (2)
 | V^inv_a | V_u | V_a (fast) | IH | (2)
Flush + Reload | A^inv_a | V_u | A_a (fast) | EH | (5)
 | V^inv_a | V_u | A_a (fast) | EH | (5)
 | A^inv | V_u | A_a (fast) | EH | (5)
 | V^inv | V_u | A_a (fast) | EH | (5)
 | A_d | V_u | A_a (fast) | EH | (5)
 | V_d | V_u | A_a (fast) | EH | (5)
 | A_aalias | V_u | A_a (fast) | EH | (5)
 | V_aalias | V_u | A_a (fast) | EH | (5)
Reload + Time | V^inv_u | A_a | V_u (fast) | EH | new
 | V^inv_u | V_a | V_u (fast) | IH | new
Flush + Probe | A_a | V^inv_u | A_a (slow) | EM | (6)
 | A_a | V^inv_u | V_a (slow) | IM | new
 | V_a | V^inv_u | A_a (slow) | EM | new
 | V_a | V^inv_u | V_a (slow) | IM | new
Evict + Time | V_u | A_d | V_u (slow) | EM | (1)
 | V_u | A_a | V_u (slow) | EM | (1)
Prime + Probe | A_d | V_u | A_d (slow) | EM | (4)
 | A_a | V_u | A_a (slow) | EM | (4)
Bernstein's Attack | V_u | V_a | V_u (slow) | IM | (3)
 | V_u | V_d | V_u (slow) | IM | (3)
 | V_d | V_u | V_d (slow) | IM | (3)
 | V_a | V_u | V_a (slow) | IM | (3)
Evict + Probe | V_d | V_u | A_d (slow) | EM | new
 | V_a | V_u | A_a (slow) | EM | new
Prime + Time | A_d | V_u | V_d (slow) | IM | new
 | A_a | V_u | V_a (slow) | IM | new
Flush + Time | V_u | A^inv_a | V_u (slow) | EM | new
 | V_u | V^inv_a | V_u (slow) | IM | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attacks [46]
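Programmatically, Table 2 is a lookup from a three-step pattern to its classification; the dictionary below encodes a small subset of the table's rows (the string encoding of the steps is our own):

```python
# Subset of Table 2: (Step 1, Step 2, Step 3) -> (macro type, attack strategy)
TABLE_2 = {
    ('Ainv', 'Vu', 'Aa-fast'): ('EH', 'Flush + Reload'),
    ('Ad',   'Vu', 'Ad-slow'): ('EM', 'Prime + Probe'),
    ('Vu',   'Ad', 'Vu-slow'): ('EM', 'Evict + Time'),
    ('Vu',   'Va', 'Vu-slow'): ('IM', "Bernstein's Attack"),
}

def classify(steps):
    """Return (macro type, strategy) for an effective pattern, else None."""
    return TABLE_2.get(tuple(steps))
```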
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time.

Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv | V_u | V^inv_a (slow) | IH | new
 | V^inv | V_u | V^inv_a (slow) | IH | new
 | A_d | V_u | V^inv_a (slow) | IH | new
 | V_d | V_u | V^inv_a (slow) | IH | new
 | A_aalias | V_u | V^inv_a (slow) | IH | new
 | V_aalias | V_u | V^inv_a (slow) | IH | new
Flush + Flush | A^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | V^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | A^inv_a | V_u | A^inv_a (slow) | EH | (1)
 | V^inv_a | V_u | A^inv_a (slow) | EH | (1)
Flush + Reload Invalidation | A^inv | V_u | A^inv_a (slow) | EH | new
 | V^inv | V_u | A^inv_a (slow) | EH | new
 | A_d | V_u | A^inv_a (slow) | EH | new
 | V_d | V_u | A^inv_a (slow) | EH | new
 | A_aalias | V_u | A^inv_a (slow) | EH | new
 | V_aalias | V_u | A^inv_a (slow) | EH | new
Reload + Time Invalidation | V^inv_u | A_a | V^inv_u (slow) | EH | new
 | V^inv_u | V_a | V^inv_u (slow) | IH | new
Flush + Probe Invalidation | A_a | V^inv_u | A^inv_a (fast) | EM | new
 | A_a | V^inv_u | V^inv_a (fast) | IM | new
 | V_a | V^inv_u | A^inv_a (fast) | EM | new
 | V_a | V^inv_u | V^inv_a (fast) | IM | new
Evict + Time Invalidation | V_u | A_d | V^inv_u (fast) | EM | new
 | V_u | A_a | V^inv_u (fast) | EM | new
Prime + Probe Invalidation | A_d | V_u | A^inv_d (fast) | EM | new
 | A_a | V_u | A^inv_a (fast) | EM | new
Bernstein's Invalidation Attack | V_u | V_a | V^inv_u (fast) | IM | new
 | V_u | V_d | V^inv_u (fast) | IM | new
 | V_d | V_u | V^inv_d (fast) | IM | new
 | V_a | V_u | V^inv_a (fast) | IM | new
Evict + Probe Invalidation | V_d | V_u | A^inv_d (fast) | EM | new
 | V_a | V_u | A^inv_a (fast) | EM | new
Prime + Time Invalidation | A_d | V_u | V^inv_d (fast) | IM | new
 | A_a | V_u | V^inv_a (fast) | IM | new
Flush + Time Invalidation | V_u | A^inv_a | V^inv_u (fast) | EM | new
 | V_u | V^inv_a | V^inv_u (fast) | IM | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim or the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What is more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the descriptions of each secure cache design below, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
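A minimal sketch of the SP* fill/hit policy, assuming an 8-way cache split evenly into two 4-way partitions (the concrete way assignment is our assumption):

```python
HIGH_WAYS = {0, 1, 2, 3}  # victim (High) partition (assumed split)
LOW_WAYS = {4, 5, 6, 7}   # attacker (Low) partition

def sp_star_may_fill(way, security):
    """On a miss, a process may only fill (modify) ways of its own
    partition; cache hits, in contrast, are allowed in either partition."""
    return way in (HIGH_WAYS if security == 'high' else LOW_WAYS)
```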
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
3Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
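The SecVerilog hit/miss behavior can be sketched with the cache modeled as a map from address to the partition currently holding it (our simplification of the design):

```python
def secverilog_access(cache, addr, label):
    """Access with timing label 'L' or 'H'. A Low access to data held in
    the High partition is timed as a miss and moves the block H -> L, so
    High activity never changes Low-visible timing; High accesses may hit
    in either partition."""
    part = cache.get(addr)
    if part is None:
        cache[addr] = label
        return 'miss'
    if label == 'L' and part == 'H':
        cache[addr] = 'L'   # move data to the Low partition for consistency
        return 'miss'       # observed as a miss by the Low instruction
    return 'hit'
```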
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
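The asymmetric flush-on-reallocation behavior can be sketched as follows, modeling each cache set as a dict from way index to an owner-tagged block (our encoding):

```python
def secdcp_reallocate_way(cache_sets, way, new_owner):
    """Reallocate one way: L -> H requires flushing L's blocks first;
    H -> L leaves H blocks in place to be lazily evicted by L (the
    behavior the paper notes may leak timing information)."""
    for cset in cache_sets:
        block = cset.get(way)
        if block is not None and new_owner == 'H' and block['owner'] == 'L':
            del cset[way]   # flush Low data before High takes the way
    return cache_sets
```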
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread exclusively reserves Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The value of Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache-hit policy holds to guarantee cache coherence.
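NoMo's rule that a thread may evict only outside other threads' reservations can be sketched as follows (the data layout is our own illustration):

```python
def nomo_eviction_candidates(n_ways, reservations, tid):
    """Sketch of NoMo way selection: thread tid may evict only from its
    own reserved ways plus any unreserved ways; ways reserved by other
    threads are off limits. 'reservations' maps tid -> set of way
    indices reserved for that thread in this cache set."""
    reserved_by_others = set()
    for t, ways in reservations.items():
        if t != tid:
            reserved_by_others |= ways
    return [w for w in range(n_ways) if w not in reserved_by_others]
```

With a full NoMo-N/M reservation the candidate list degenerates to the thread's own ways, which is the fully partitioned configuration assumed in the text.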
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
J Hardw Syst Secur (2019) 3:397–425
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap, and the i-th bit in the bitmap is set if the line is present in the i-th core's private cache. Cache hits are allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache that belongs to the same process, a random line in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
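SHARP's staged victim selection can be sketched as follows; the flat dict layout, function name, and the core-centric view of "current processes" are our simplifications, not the paper's RTL:

```python
import random

def sharp_select_victim(cache_set, req_core, active_cores):
    """Sketch of SHARP's three-stage eviction for an inclusive LLC.
    Each line is a dict with 'cvb': the set of core IDs whose private
    caches currently hold the line (the core valid bits).
    Returns (way, alarm): alarm=True models the OS interrupt."""
    # Stage 1: prefer a line not cached in any active core's private cache.
    for way, line in enumerate(cache_set):
        if not (line['cvb'] & active_cores):
            return way, False
    # Stage 2: prefer a line held only by the requesting core itself.
    for way, line in enumerate(cache_set):
        if line['cvb'] == {req_core}:
            return way, False
    # Stage 3: evict at random and signal the OS (suspicious activity).
    return random.randrange(len(cache_set)), True
```

The alarm in stage 3 is what lets the OS notice an attacker who keeps forcing cross-process evictions.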
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be performed within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has only a small performance impact because such speculation is rare. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before the visibility point, which shows up when all of a load's prior control-flow instructions resolve, unsafe speculative loads (USLs) are placed into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before checking the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB in order to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one called policy_fillmap, for masking fills and selecting the victim to replace, and another called policy_hitmap, for masking hit ways. Only when both the tag and the domain ID match will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim can only be chosen within the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
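The hitmap/fillmap masking just described can be sketched as follows; the tuple line layout and the trivial replacement choice are our simplifications of DAWG's per-domain policies:

```python
def dawg_hit_way(cache_set, domain, tag, policy_hitmap):
    """A hit requires BOTH tag and domain ID to match, and the way must
    be visible in the domain's policy_hitmap. Lines are (tag, domain)
    tuples or None."""
    for way in policy_hitmap[domain]:
        line = cache_set[way]
        if line is not None and line == (tag, domain):
            return way
    return None

def dawg_fill_way(cache_set, domain, policy_fillmap):
    """On a miss, the replacement victim may only be chosen among the
    ways in the domain's policy_fillmap (real DAWG runs its replacement
    policy within this mask; here: first empty way, else first masked)."""
    for way in policy_fillmap[domain]:
        if cache_set[way] is None:
            return way
    return policy_fillmap[domain][0]
```

Because a hit also checks the domain ID, the same read-only tag can legitimately live in two ways, one replica per domain, as the text notes.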
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache design to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed-inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or for thread migration events, to avoid timing leakage during the transition time.
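The relaxed-inclusion rule can be sketched in a few lines; the dict/set data layout is our own illustration of the single added bit:

```python
def ric_llc_evict(llc, l1, addr):
    """Sketch of RIC eviction: evicting an LLC line back-invalidates the
    L1 copy only when the line's relaxed-inclusion bit is clear.
    'llc' maps addr -> {'relaxed': bool}; 'l1' is a set of addresses."""
    line = llc.pop(addr)
    if not line['relaxed']:
        l1.discard(addr)   # strict-inclusion behavior
    # Relaxed lines (read-only or thread-private) keep their L1 copy,
    # so LLC conflicts can no longer evict the victim's L1 working set.
```

This is why an attacker priming the LLC cannot reach the victim's L1-resident critical lines once their relaxed bits are set.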
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache performs the normal cache-hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
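The miss-handling rule above (locked lines are immune to unlocked data and to other processes' locked data) can be sketched as follows, with an illustrative flat data layout:

```python
def pl_select_victim(cache_set, new_pid, new_locked):
    """Sketch of PL cache miss handling: a line may be victimized only
    if it is unlocked, or if it is locked and the incoming access is a
    locked access from the same process. Returning None means the
    access is served without caching."""
    for way, line in enumerate(cache_set):
        if not line['locked']:
            return way                      # unlocked lines are fair game
        if new_locked and line['pid'] == new_pid:
            return way                      # same-process locked data
    return None                             # serve uncached
```

A set full of another process's locked lines therefore always returns None, which is exactly why the attacker's primes in Step 0 cannot displace preloaded, locked victim data.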
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens for data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
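The three miss cases can be sketched as follows; the function signature, the action labels, and the way S′ is chosen are our simplifications of RP cache's logic:

```python
import random

def rp_on_miss(pt, s, victim_pid, victim_prot, new_pid, new_prot):
    """Sketch of RP cache miss handling for data D mapping to logical
    set s. 'pt' is the process's permutation table (logical set ->
    physical set); victim_* describe the line the baseline replacement
    policy would evict. Returns (action, set_index, note)."""
    if victim_pid == new_pid and victim_prot != new_prot:
        # Internal interference: evict a random line of a random set S'
        # and serve D without caching it.
        s_prime = random.randrange(len(pt))
        return ('evict_random_line_of', s_prime, 'D_not_cached')
    if victim_pid != new_pid:
        # External interference: place D into an evicted block of a
        # random S' and swap the S <-> S' mappings to re-randomize.
        s_prime = random.randrange(len(pt))
        pt[s], pt[s_prime] = pt[s_prime], pt[s]
        return ('store_in', s_prime, 'mappings_swapped')
    return ('normal_replacement', pt[s], None)
```

The swap in the second case is what breaks the attacker's learned set-to-address mapping after every cross-process conflict.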
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens for data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
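The two-level lookup through the RMT can be sketched as follows; the dict-based layout and field names are illustrative, not the hardware structures:

```python
def newcache_lookup(rmt, lines, pid, addr, index_bits=4):
    """Sketch of Newcache lookup: the address index selects an RMT entry
    per (pid, index), as in a direct-mapped cache; the entry points into
    the fully associative line array, where the stored address and pid
    must ALSO match for a hit. Returns the line number on a hit."""
    index = addr & ((1 << index_bits) - 1)
    line_no = rmt.get((pid, index))
    if line_no is not None:
        line = lines[line_no]
        if line is not None and line['addr'] == addr and line['pid'] == pid:
            return line_no      # hit
    return None                 # miss
```

Because the RMT entry, not the address, names the physical line, re-pointing an RMT entry is all it takes to remap a line anywhere in the array.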
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such a flush will not influence timing-based side-channel attack prevention (the random filling technique ensures this).
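The three request types can be sketched as follows; the set-of-addresses cache model and the exact neighborhood window are our simplifications of the paper's design:

```python
import random

def random_fill_access(cache, addr, request, window=8):
    """Sketch of Random Fill cache request handling. 'cache' is a set of
    cached line addresses; 'window' bounds the neighborhood for random
    fills. Returns True on a hit."""
    if addr in cache:
        return True                     # hits behave as in a normal cache
    if request == 'normal':
        cache.add(addr)                 # ordinary demand fill
    elif request == 'random_fill':
        # The demand line itself is NOT cached; an arbitrary nearby line
        # is brought in instead, de-correlating fills from accesses.
        cache.add(addr + random.randint(-window, window))
    # 'nofill': serve the data with no cache-state change at all.
    return False
```

An attacker observing which line got filled thus learns only a random neighbor, not the secret-dependent address itself.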
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. This periodic re-keying causes the address remapping to change dynamically.
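The keyed set-index computation can be illustrated with a toy stand-in for the LLBC; this Feistel construction and its constants are our own, not the actual cipher, and only show why re-keying remaps every line:

```python
def ceaser_set_index(line_addr, key, n_sets, rounds=4):
    """Toy keyed 32-bit Feistel network over the line address, standing
    in for CEASER's LLBC. The mapping is deterministic per key, so
    drawing a new key (periodic re-keying) remaps all lines at once."""
    left, right = (line_addr >> 16) & 0xFFFF, line_addr & 0xFFFF
    for r in range(rounds):
        f = ((right * 0x9E37) ^ key ^ r) & 0xFFFF   # toy round function
        left, right = right, left ^ f
    return ((left << 16) | right) % n_sets
```

An attacker's carefully built eviction set is only valid until the next re-key, which bounds the useful lifetime of any learned conflicts.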
SCATTER Cache [51] uses cache-set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with user space for security-domain identification.
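The per-way, per-domain index derivation can be sketched as follows; SHA-256 stands in for whatever keyed hash or permutation the hardware would use:

```python
import hashlib

def scatter_way_indices(line_addr, domain_key, n_sets, n_ways):
    """Sketch of SCATTER cache index derivation: a keyed hash yields a
    DIFFERENT set index for every way (skewed-associative style), keyed
    per security domain so mappings differ across domains."""
    indices = []
    for way in range(n_ways):
        digest = hashlib.sha256(
            f"{domain_key}|{way}|{line_addr}".encode()).digest()
        indices.append(int.from_bytes(digest[:8], 'big') % n_sets)
    return indices
```

Since an address occupies one set per way and the combination is key-dependent, two domains almost never share a full conflict set, which is what frustrates eviction-set construction.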
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure or not. A per-cache-block counter records the interval of the data's activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
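The per-line decay counter can be sketched as follows; the class layout and method names are illustrative:

```python
import random

class DecayLine:
    """Sketch of Non Deterministic Cache's per-line counter: initialized
    to a random value below the global maximum, ticked while the line is
    untouched, and the line self-invalidates at the threshold."""
    def __init__(self, max_ticks):
        self.max_ticks = max_ticks
        self.valid = True
        self.counter = random.randrange(max_ticks)  # random head start
    def touch(self):                 # an access restarts a random decay
        self.counter = random.randrange(self.max_ticks)
    def tick(self):                  # one global counter clock tick
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_ticks:
                self.valid = False   # invalidation at a randomized time
```

Because each line dies after a different, randomly offset idle interval, the hit/miss pattern no longer reliably encodes which addresses were touched.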
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu ⇝ Ad ⇝ Vu (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, which is indicated by × and red color. For example, SecDCP cache cannot defend against the Vu ⇝ Va ⇝ Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd ⇝ Vu ⇝ Aa (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to know that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, RP cache [49], for example, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.

2. A cache can prevent a timing attack if the timing of the last step is randomized and can no longer carry the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad ⇝ Vu ⇝ Ad^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to know that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad ⇝ Vu ⇝ Vd^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution, and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a: Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b: Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c: Flush is disabled, but cache coherence might be used to do the data removal.
d: For the L1 cache and TLB, flushing is done during context switch.
e: The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f: The technique is now only implemented in the last-level cache.
g: The technique now only targets shared caches.
h: The technique only targets inclusive last-level caches.
i: The technique targets the data cache hierarchy.
j: For the last-level cache, the cache is partitioned between the victim and the attacker.
k: The technique can control the probability of the vulnerability being successful to be extremely small.
l: The technique can work in shared read-only memory while not working in shared writable memory.
m: Random delay, but not random mapping, can only decrease the probability of attack in some limited degree.
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a: Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b: Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c: Flush is disabled, but cache coherence might be used to do the data removal.
d: For the L1 cache and TLB, flushing is done during context switch.
e: The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f: The technique is now only implemented in the last-level cache.
g: The technique now only targets shared caches.
h: The technique only targets inclusive last-level caches.
i: The technique targets the data cache hierarchy.
j: For the last-level cache, the cache is partitioned between the victim and the attacker.
k: The technique can control the probability of the vulnerability being successful to be extremely small.
l: The technique can work in shared read-only memory while not working in shared writable memory.
m: Random delay, but not random mapping, can only decrease the probability of attack in some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label the range of the victim's data that they consider sensitive. The victim process or the management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and thereby prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so the Vu ⇝ Va^inv ⇝ Vu (slow) vulnerability (one type of Flush + Time) is prevented. Newcache is able to prevent Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu ⇝ Ad ⇝ Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to only being able to access a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, there can be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of a speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size of each part is adjusted. It also does not help with internal interference prevention.

In terms of the three-step model, the partitioning-based caches excel at using partitioning techniques to disallow the attacker from setting the initial state (Step 0) of the victim's partition through flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd ⇝ Vu ⇝ Va (fast) vulnerability (one type of Cache Internal Collision [5]) is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting cache hits due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv ⇝ Vu ⇝ Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the Vu ⇝ Va ⇝ Vu (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd ⇝ Vu ⇝ Ad^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu ⇝ Aa^inv ⇝ Vu (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities. However, it only works for the LLC, comes with high software implementation complexity, and makes some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling of speculation when interacting with the outside world. Therefore, in normal execution, it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as V_d → V_u → A_d (slow) (one type of Evict + Probe) are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set initial states (Step 0) for the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d → V_u → V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (the columns for SHARP cache and the columns from RP cache to Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data accessed during speculative execution or an out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is also recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most randomization is cache-line-based, and it can be combined with differentiation of sensitive data to be more efficient.
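As a sketch of the address-to-set randomization idea described above, the following hypothetical toy model passes the conventional set index through a random permutation; the constants and names are ours for illustration, not any specific design's mechanism:

```python
import random

SETS = 64  # hypothetical number of cache sets in this toy model

def make_mapping(seed):
    # Build a random permutation of set indices, standing in for the keyed
    # index remapping used by randomization-based caches (e.g., remapped
    # periodically or per process).
    rng = random.Random(seed)
    perm = list(range(SETS))
    rng.shuffle(perm)
    return perm

def set_index(addr, perm):
    # Conventional cache: index = addr mod SETS.
    # Randomized cache: pass the conventional index through the permutation,
    # so the attacker cannot tell which set a given address maps to.
    return perm[addr % SETS]

# After a remap (a new permutation), an eviction set the attacker built
# under the old mapping no longer targets the victim's set.
epoch1 = make_mapping(seed=1)
epoch2 = make_mapping(seed=2)
```

In this toy model, mutual information between address and observed set is removed only to the extent that the permutation is unknown and changed often enough, which mirrors the trade-off discussed above.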
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are not able to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, which allows vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with its random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access from timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks.
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the real sensitive data from the other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail at this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and it leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
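The three steps of Flush + Reload can be sketched with a toy cache model; this is a hypothetical illustration (the "cache" is a set of cached addresses, and membership stands in for fast/slow timing):

```python
# Toy model of Flush + Reload: hit = fast, miss = slow.
def flush_reload(victim_secret_addr, attacker_addr):
    cache = set()
    cache.discard(attacker_addr)     # Step 1: flush/invalidate a known address
    cache.add(victim_secret_addr)    # Step 2: victim accesses secret data u
    hit = attacker_addr in cache     # Step 3: attacker reloads and times it
    return 'fast' if hit else 'slow'

# A fast reload reveals that the secret address equals the probed address.
```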
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates the secret data. In Step 3, reloading Step 1's data and observing a cache miss helps the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, in Flush + Time the attacker invalidates data at a known address, allowing the attacker to find the full address of the secret data, instead of evicting a cache set to learn only the secret data's cache index as in Evict + Time.
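A toy sketch of the Evict + Time steps, tracked at cache-set granularity; this is a hypothetical model (the set count and addresses are illustrative, not from the paper):

```python
SETS = 8  # hypothetical tiny cache, tracked at set granularity

def evict_time(secret_addr, evicted_set):
    cached_sets = {secret_addr % SETS}         # Step 1: victim's secret is cached
    cached_sets.discard(evicted_set)           # Step 2: attacker evicts one set
    hit = (secret_addr % SETS) in cached_sets  # Step 3: victim reloads the secret
    return 'fast' if hit else 'slow'           # slow => secret maps to evicted_set
```

By timing Step 3 for each candidate set, the attacker recovers the secret data's cache index.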
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
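The Prime + Probe steps can likewise be sketched with a toy set-granularity model; one line per set is assumed for simplicity, and the encoding is ours for illustration:

```python
SETS = 8  # hypothetical number of cache sets, one line per set

def prime_probe(secret_addr):
    owner = {s: 'attacker' for s in range(SETS)}  # Step 1: prime every set
    owner[secret_addr % SETS] = 'victim'          # Step 2: secret access evicts
    # Step 3: probe each set; a slow (missing) probe marks the victim's set.
    return [s for s in range(SETS) if owner[s] != 'attacker']
```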
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data via known data accesses. If a cache miss is observed in Step 3, that will tell the attacker that the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set via an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access-related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and patterns with β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ⋯ → ⋆ → ⋯, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ⋯ → A^inv/V^inv → ⋯, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv → V_u → A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be applied recursively until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step is deleted).
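Subdivision Rule 1 can be sketched as a small splitting routine; this is a minimal illustration in Python, where steps are plain strings and '*' stands for the ⋆ step (the encoding is ours, for illustration only):

```python
STAR = '*'  # stands for the ⋆ (no-information) step

def subdivision_rule_1(pattern):
    # Split a long pattern at every '*', making each '*' the Step 1 of a
    # new sub-pattern; a trailing '*' is deleted, a leading '*' is kept.
    parts, cur = [], []
    for step in pattern:
        if step == STAR:
            if cur:
                parts.append(cur)
            cur = [STAR]
        else:
            cur.append(step)
    if cur and cur != [STAR]:
        parts.append(cur)
    return parts
```

For example, the four-step pattern A_d → ⋆ → V_u → A_a splits into A_d and ⋆ → V_u → A_a, matching the rule's recursive behavior.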
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ⋯ → M → N → ⋯. If commuting M and N leads to the same observation result, i.e., ⋯ → M → N → ⋯ and ⋯ → N → M → ⋯ give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting the pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:

– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, the two steps can be commuted no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.

– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted under this condition; Table 7 shows a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column for these patterns.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting combined step.
Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.

– Reduction Rule 2: For two adjacent known memory access-related operations (known_access operations), they always result in a deterministic state of the second accessed cache block, so these two steps can be reduced to only one step.

– Reduction Rule 3: For two adjacent steps where one step is a known_access operation and the other is a known_inv operation, no matter their order, if the address they refer to is the same, the two can be reduced to one step, namely the second step.
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ⋯ → M → N → ⋯. If combining M and N leads to the same timing observation result, i.e., ⋯ → M → N → ⋯ and ⋯ → Union(M, N) → ⋯ give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following case:

– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied; two known_inv operations both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: first the Commute Rules, to put known_access operations and known_inv operations that target the same address as near to each other as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

At each application of these three categories of rules, the recursion should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

– u operation → not_u operation → u operation

– not_u operation → u operation → not_u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.

– If followed by a known_inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If followed by V_u, the worst case will be A^inv/V^inv → V_u → not_u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A_d^inv/V_d^inv → u operation, where V_u → A_d^inv/V_d^inv can further have Commute Rule 2 applied, to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, and which require the rest checking.

The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A_a^inv, V_a^inv, A_aalias^inv, V_aalias^inv} → V_u

– {A_a, V_a, A_aalias, V_aalias} → V_u^inv

– V_u → {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A_a^inv, V_a^inv, A_aalias^inv, V_aalias^inv}

– V_u^inv → {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; P: a two-dimensional dynamic-size array; P[0] contains the states of each step of the original pattern in order; P[1], P[2], ... are empty initially

Output: reduced effective vulnerability pattern(s) array V; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: V = Ø
2: while contain(P, ⋆) and its index is not 0 do
3:   P = Subdivision_Rule_1(P)
4: end while
5: while (contain(P, A^inv) and its index is not 0) or (contain(P, V^inv) and its index is not 0) do
6:   P = Subdivision_Rule_2(P)
7: end while
8: while not (is_ineffective(P) or has_interval_effective_three_steps(P)) do
9:   P = Commute_Rules(P)
10:  P = Reduction_Rules(P)
11:  P = Union_Rules(P)
12:  if is_ineffective(P) or has_interval_effective_three_steps(P) then
13:    V += effective_patterns(P)
14:  end if
15: end while
16: return V
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β gt 3)pattern to a three-step pattern thus demonstrating that thecorresponding β gt 3 step pattern actually is equivalent tothe output three-step pattern and represents a vulnerabilitythat is captured by an existing three-step pattern or (ii)demonstrate that the β-step pattern can be mapped to one ormore three-step vulnerabilities It is not possible for a β-stepvulnerability pattern to not be either (i) or (ii) after doing theRule applications Key outcome of our analysis is that anyβ-step pattern is not a vulnerability or if it is a vulnerabilityit maps to either outputs (i) or (ii) of the algorithm
Inside Algorithm 2, contain() is a function that checks if a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; has_interval_effective_three_steps() is a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
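The overall reduce-then-check flow can be illustrated with a small Python sketch. This is a toy model, not the paper's actual Rules: the step names, the duplicate-collapsing reduction, and the two-entry vulnerability table are all illustrative assumptions.

```python
# Toy model of the beta-step reduction idea (illustrative only):
# collapse steps that cannot change the observed cache state, then
# check every remaining consecutive triple against a (toy) table of
# effective three-step vulnerabilities.

EFFECTIVE_THREE_STEPS = {
    ("A_d", "V_u", "A_a_inv"),    # e.g., a Flush + Reload Invalidation triple
    ("A_inv", "V_u", "V_a_inv"),  # e.g., a Cache Internal Collision triple
}

def reduce_pattern(pattern):
    """Collapse immediately repeated steps (a stand-in for the paper's
    reduction Rules) until the pattern stops shrinking."""
    reduced = list(pattern)
    changed = True
    while changed:
        changed = False
        for i in range(len(reduced) - 1):
            if reduced[i] == reduced[i + 1]:  # adjacent duplicate step
                del reduced[i]
                changed = True
                break
    return reduced

def find_vulnerabilities(pattern):
    """Return every effective three-step vulnerability embedded in the
    reduced pattern (sliding window over consecutive steps)."""
    reduced = reduce_pattern(pattern)
    hits = []
    for i in range(len(reduced) - 2):
        triple = tuple(reduced[i:i + 3])
        if triple in EFFECTIVE_THREE_STEPS:
            hits.append(triple)
    return hits
```

A β-step pattern that reduces to a known triple is reported; one that reduces to no known triple is ineffective, mirroring the algorithm's empty-list output.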
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security. Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems. Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems. Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security. Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture. ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security. ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy. ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation. ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment. Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP). IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP). IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture. ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP). IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33. IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45 nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th. IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA). IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40. ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference. Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41. ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security. Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990. Proceedings. IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium. USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35. ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41). IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture. ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA). IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43. ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference. ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 3  The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, "fast" indicates no corresponding address of the data is invalidated, while "slow" indicates the invalidation operation makes some data invalid, causing a longer processing time

Attack Strategy                          Step 1      Step 2     Step 3            Macro Type   Attack

Cache Internal Collision Invalidation    A^inv       V_u        V_a^inv (slow)    IH           new
                                         V^inv       V_u        V_a^inv (slow)    IH           new
                                         A_d         V_u        V_a^inv (slow)    IH           new
                                         V_d         V_u        V_a^inv (slow)    IH           new
                                         A_a^alias   V_u        V_a^inv (slow)    IH           new
                                         V_a^alias   V_u        V_a^inv (slow)    IH           new
Flush + Flush                            A_a^inv     V_u        V_a^inv (slow)    IH           (1)
                                         V_a^inv     V_u        V_a^inv (slow)    IH           (1)
                                         A_a^inv     V_u        A_a^inv (slow)    EH           (1)
                                         V_a^inv     V_u        A_a^inv (slow)    EH           (1)
Flush + Reload Invalidation              A^inv       V_u        A_a^inv (slow)    EH           new
                                         V^inv       V_u        A_a^inv (slow)    EH           new
                                         A_d         V_u        A_a^inv (slow)    EH           new
                                         V_d         V_u        A_a^inv (slow)    EH           new
                                         A_a^alias   V_u        A_a^inv (slow)    EH           new
                                         V_a^alias   V_u        A_a^inv (slow)    EH           new
Reload + Time Invalidation               V_u^inv     A_a        V_u^inv (slow)    EH           new
                                         V_u^inv     V_a        V_u^inv (slow)    IH           new
Flush + Probe Invalidation               A_a         V_u^inv    A_a^inv (fast)    EM           new
                                         A_a         V_u^inv    V_a^inv (fast)    IM           new
                                         V_a         V_u^inv    A_a^inv (fast)    EM           new
                                         V_a         V_u^inv    V_a^inv (fast)    IM           new
Evict + Time Invalidation                V_u         A_d        V_u^inv (fast)    EM           new
                                         V_u         A_a        V_u^inv (fast)    EM           new
Prime + Probe Invalidation               A_d         V_u        A_d^inv (fast)    EM           new
                                         A_a         V_u        A_a^inv (fast)    EM           new
Bernstein's Invalidation Attack          V_u         V_a        V_u^inv (fast)    IM           new
                                         V_u         V_d        V_u^inv (fast)    IM           new
                                         V_d         V_u        V_d^inv (fast)    IM           new
                                         V_a         V_u        V_a^inv (fast)    IM           new
Evict + Probe Invalidation               V_d         V_u        A_d^inv (fast)    EM           new
                                         V_a         V_u        A_a^inv (fast)    EM           new
Prime + Time Invalidation                A_d         V_u        V_d^inv (fast)    IM           new
                                         A_a         V_u        V_a^inv (fast)    IM           new
Flush + Time Invalidation                V_u         A_a^inv    V_u^inv (fast)    EM           new
                                         V_u         V_a^inv    V_u^inv (fast)    IM           new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature over the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. Moreover, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
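The SP* policy of cross-partition hits but partition-restricted fills can be sketched as follows (an illustrative Python model of a single cache set; the class name and the simple fill policy are our assumptions, not part of the SP* design):

```python
# Sketch of SP*-style static way partitioning (illustrative):
# hits may occur in any way, but a miss may only fill/evict a way
# inside the requesting process's own partition (High = victim,
# Low = attacker).
class SPStarSet:
    def __init__(self, n_ways, high_ways):
        # ways [0, high_ways) belong to High; the rest belong to Low
        self.tags = [None] * n_ways
        self.high_ways = high_ways
        self.n_ways = n_ways

    def access(self, tag, is_high):
        if tag in self.tags:        # a hit is allowed in either partition
            return "hit"
        lo, hi = (0, self.high_ways) if is_high else (self.high_ways, self.n_ways)
        victim = lo                 # simple fill policy: first way of own partition
        for w in range(lo, hi):
            if self.tags[w] is None:
                victim = w
                break
        self.tags[victim] = tag     # fill only within the own partition
        return "miss"
```

In this model, attacker (Low) fills can never displace a victim (High) line, matching the partitioning rule described above.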
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of a program using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High based on the code; this timing label can play a role similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction when the data is in the High partition, the access will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions if the data is already in the cache.

³ Two existing papers give slightly different definitions for an "SP" cache; thus we define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
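The SecVerilog partition rule can be captured in a minimal sketch that models the two partitions as sets of tags (illustrative; the real design is a hardware cache, and the function name is ours):

```python
# Sketch of the SecVerilog partition rule (illustrative): a Low miss
# that finds the line in the High partition still behaves (and is timed)
# as a miss, and the line moves High -> Low to preserve consistency;
# High accesses may hit in either partition.
def secverilog_access(low_part, high_part, tag, label):
    if label == 'H':
        if tag in low_part or tag in high_part:
            return "hit"            # High may hit in either partition
        high_part.add(tag)
        return "miss"
    # label == 'L'
    if tag in low_part:
        return "hit"
    if tag in high_part:
        high_part.discard(tag)      # move the line High -> Low
    low_part.add(tag)
    return "miss"                   # timed as a miss either way
```

The key point the sketch illustrates is that Low timing never depends on High state: from Low's perspective, a line cached only in High is indistinguishable from an uncached line.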
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High-partition data during way adjustment may leak timing information to the attacker.
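The asymmetric way-reallocation rule above can be sketched as follows (an illustrative Python model; the data encoding and function name are our assumptions):

```python
# Sketch of SecDCP-style dynamic way adjustment (illustrative): when a
# way is taken away from L (given to H), remaining L blocks are flushed
# first; when a way is given to L, H blocks are left in place (for
# performance) and are evicted over time by L's accesses.
def reallocate_way(way_blocks, new_owner):
    """way_blocks: list of (security_class, tag) entries or None for one
    way across all sets; new_owner: 'H' or 'L'."""
    if new_owner == 'H':
        # L -> H: flush the remaining L blocks to avoid leaking L state
        return [None if (b is not None and b[0] == 'L') else b
                for b in way_blocks]
    # H -> L: keep H blocks; L will eventually evict them
    return list(way_blocks)
```

The sketch also makes the weakness visible: after an H-to-L reallocation, the surviving H blocks are observable to L until they age out, which is exactly the timing leak noted above.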
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread exclusively reserves Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in the cache sets to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid evictions caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap whose i-th bit is set if the line is present in the i-th core's private cache. Cache hits are allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache set that is in the same process, a random block in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
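SHARP's three-tier victim selection can be sketched as follows (an illustrative Python model; the CVB encoding as an integer bitmap and the function name are our assumptions):

```python
import random

# Sketch of SHARP's three-tier victim selection (illustrative): prefer a
# block cached in no core's private cache, then one belonging to the
# requesting core itself; otherwise pick a random block and raise an
# alarm (modeling the interrupt to the OS).
def sharp_select_victim(cvbs, requester, rng=random.Random(0)):
    """cvbs: one core-valid bitmap per block in the set; requester: core id.
    Returns (way index, alarm_raised)."""
    for i, cvb in enumerate(cvbs):
        if cvb == 0:                       # tier 1: in no private cache
            return i, False
    for i, cvb in enumerate(cvbs):
        if cvb & (1 << requester):         # tier 2: requester's own block
            return i, False
    # tier 3: random eviction, flagged as suspicious to the OS
    return rng.randrange(len(cvbs)), True
```

The alarm flag in tier 3 mirrors the interrupt that lets the OS detect repeated cross-process eviction attempts.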
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world; this has a small performance impact because such speculation is rare. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before the visibility point shows up, i.e., before all of its prior control-flow instructions resolve, an unsafe speculative load (USL) is put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core received a possible invalidation from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for handling the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache remains vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after the flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only when both the tag and the domain ID are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
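The hitmap/fillmap masking can be sketched as follows (an illustrative Python model of one cache set; the class, bitmap encoding, and fill policy are our assumptions, not the DAWG hardware):

```python
# Sketch of DAWG-style way masking (illustrative): a domain's
# policy_hitmap masks which ways may report a hit; its policy_fillmap
# masks which ways may be filled/evicted. A tag match outside the
# domain's hitmap is treated as a miss, isolating the domains.
class DawgSet:
    def __init__(self, n_ways):
        self.tags = [None] * n_ways

    def lookup(self, tag, hitmap):
        return any((hitmap >> w) & 1 and self.tags[w] == tag
                   for w in range(len(self.tags)))

    def fill(self, tag, fillmap):
        allowed = [w for w in range(len(self.tags)) if (fillmap >> w) & 1]
        victim = allowed[0]          # simple fill policy within the domain
        for w in allowed:
            if self.tags[w] is None:
                victim = w
                break
        self.tags[victim] = tag
```

Because a hit requires the way to be inside the domain's hitmap, the same read-only line can safely be replicated in different ways for different domains, as described above.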
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed-inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable, non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In these cases, the new data will be loaded or stored without caching. In all other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
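The PL-cache miss-handling rule can be sketched as follows (an illustrative Python helper; the block encoding and function name are our assumptions):

```python
# Sketch of PL-cache victim selection on a miss (illustrative): locked
# blocks are never evicted by unlocked data or by another process's
# locked data; if no legal victim exists, the access is handled without
# caching (returned as None).
def pl_select_victim(blocks, pid, new_locked):
    """blocks: list of dicts with 'pid' and 'locked'; returns a way index,
    or None meaning the new data is used without being cached."""
    for w, b in enumerate(blocks):
        if not b['locked']:
            return w                     # unlocked data may always be evicted
        if new_locked and b['pid'] == pid:
            return w                     # locked data may replace own locked data
    return None                          # no legal victim: bypass the cache
```

Returning None models the "loaded or stored without caching" case, which is what prevents an attacker from evicting the victim's locked lines.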
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S' will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S', and the mappings of S and S' will be swapped as well. Otherwise, the normal replacement policy is executed.
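The three-way miss-handling decision can be sketched as follows (an illustrative Python helper; the action names and encoding are our assumptions, not the RP cache hardware interface):

```python
import random

# Sketch of RP-cache miss handling (illustrative): on a miss, decide
# whether to evict a random set and bypass the cache, store into a
# random set and swap the set mapping, or fall back to the normal
# replacement policy.
def rp_miss_action(victim_pid, victim_prot, new_pid, new_prot,
                   n_sets, rng=random.Random(0)):
    if victim_pid == new_pid and victim_prot != new_prot:
        s_prime = rng.randrange(n_sets)      # random set S' to disturb
        return ("evict_random_set_and_bypass", s_prime)
    if victim_pid != new_pid:
        s_prime = rng.randrange(n_sets)      # store in S' and swap S <-> S'
        return ("store_in_random_set_and_swap", s_prime)
    return ("normal_replacement", None)      # same process, same protection
```

Both randomized actions break the attacker's assumed mapping between addresses and cache sets, which is the core of RP cache's defense.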
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, if the evicted data and D belong to the same process but either one of their protection bits is set, arbitrary data will be evicted and D will be accessed without caching. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions in Random Fill cache let applications control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a RandomFill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are still able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
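The three request types can be sketched as follows (a toy model: the cache is just a set of cached line addresses, the random-fill neighborhood window is passed explicitly, and the function name is ours):

```python
import random

def random_fill_access(cache, addr, kind, window, rng=random):
    """Toy Random Fill cache request. kind is one of:
    'normal'     - ordinary access, fills the cache on a miss
    'nofill'     - victim's security-critical access, never cached
    'randomfill' - brings in an arbitrary nearby line instead of addr
    """
    if addr in cache:
        return "hit"                      # hits behave as in a normal cache
    if kind == "normal":
        cache.add(addr)                   # normal replacement
    elif kind == "randomfill":
        lo, hi = window
        cache.add(rng.randrange(lo, hi))  # fill de-correlated from the demand
    # 'nofill': data is returned to the core but never cached
    return "miss"
```

Because the demanded security-critical line itself is never cached, a later probe of that line tells the attacker nothing about whether the victim accessed it.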
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the original, possibly ordered and location-intensive, addresses across different cache sets, decreasing the probability of conflict misses. Encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. This periodic re-keying causes the address remapping to change dynamically.
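The set-index computation with periodic re-keying can be sketched as below (a keyed hash stands in for the paper's Low-Latency BlockCipher; the class and method names are illustrative):

```python
import hashlib

class CeaserIndex:
    """Toy CEASER-style encrypted set index with periodic re-keying."""

    def __init__(self, num_sets, key=0):
        self.num_sets = num_sets
        self.key = key
        self.accesses = 0

    def set_index(self, addr):
        # the real design encrypts addr with an LLBC; a keyed hash stands in
        h = hashlib.sha256(f"{self.key}:{addr}".encode()).digest()
        return int.from_bytes(h[:4], "big") % self.num_sets

    def maybe_rekey(self, period):
        # periodic re-keying dynamically remaps every address to a new set,
        # limiting how long an eviction set stays valid
        self.accesses += 1
        if self.accesses % period == 0:
            self.key += 1
```

An attacker's carefully built eviction set is keyed to one mapping; after a re-key, the same addresses scatter to different sets.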
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a similar way to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function: a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and to assign different keys for the mapping. As hardware extensions, a cryptographic primitive such as hashing and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate security domain identification with user space.
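The per-way index derivation can be sketched as follows (a keyed hash models the keyed hash/permutation derivation function; the function name and key format are ours):

```python
import hashlib

def scatter_indices(addr, domain_key, num_ways, num_sets):
    """Toy SCATTER-cache-style index derivation: every way gets its own
    keyed set index, so a given line occupies a per-domain, per-way
    pseudo-random set, as in skewed-associative caches."""
    indices = []
    for way in range(num_ways):
        h = hashlib.sha256(f"{domain_key}:{way}:{addr}".encode()).digest()
        indices.append(int.from_bytes(h[:4], "big") % num_sets)
    return indices
```

Two security domains with different keys see unrelated index vectors for the same physical address, so an eviction set built in one domain is not meaningful in another.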
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID or by whether the data is secure. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because invalidation is determined by the randomized counter of each cache line, and therefore it de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
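The counter-driven random invalidation can be sketched as a toy model (our names; lifetimes count down from a randomized initial value rather than up to a threshold, which is equivalent):

```python
import random

class NonDetCacheModel:
    """Toy non-deterministic cache: each filled block gets a randomized
    lifetime and is invalidated when it expires."""

    def __init__(self, max_interval, seed=0):
        self.rng = random.Random(seed)
        self.max_interval = max_interval
        self.blocks = {}                     # addr -> remaining lifetime ticks

    def tick(self):
        # one global-counter tick: age every block, invalidate expired ones
        for addr in list(self.blocks):
            self.blocks[addr] -= 1
            if self.blocks[addr] <= 0:
                del self.blocks[addr]

    def access(self, addr):
        if addr in self.blocks:
            return "hit"
        # on fill, the block's lifetime starts at a random value, so the
        # window in which it can produce hits is non-deterministic
        self.blocks[addr] = self.rng.randrange(1, self.max_interval + 1)
        return "miss"
```

Whether a re-access hits depends on a random lifetime rather than only on the access pattern, which is exactly what destroys the attacker's timing correlation.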
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For other combinations of cache and vulnerability, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]) vulnerability allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized, removing the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, allows the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing does not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
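The first judgment (constant last-step timing regardless of the secret) can be phrased as a tiny check. All names below are illustrative; the timing functions are toy stand-ins for a real cache simulator, and they depend only on the secret for clarity:

```python
def leaks(cache_timing, vuln, secrets):
    """A vulnerability leaks if the attacker's Step-3 timing can
    distinguish different secrets (judgment case 1 above)."""
    timings = {cache_timing(vuln, s) for s in secrets}
    return len(timings) > 1

def set_assoc_timing(vuln, secret):
    # Flush + Reload on a set-associative cache: the attacker's reload of
    # address 'a' is fast exactly when the victim's secret access touched 'a'
    return "fast" if secret == "a" else "slow"

def rp_like_timing(vuln, secret):
    # RP-cache-like behavior: no cross-process hits, so the reload is
    # always slow, independent of the secret
    return "slow"

flush_reload = ("A_a^inv", "V_u", "A_a")   # a three-step vulnerability
print(leaks(set_assoc_timing, flush_reload, ["a", "b"]))  # True: attack works
print(leaks(rp_like_timing, flush_reload, ["a", "b"]))    # False: prevented
```

The second judgment replaces the fixed timing function with a randomized one, and the third corresponds to a step being rejected before any timing is produced.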
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution, and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can use special instructions to give more privileges to sensitive data. However, care is required when identifying the actual sensitive data and when implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so that V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to accessing only a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, there is static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether a memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of a speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) away from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference within the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space, and it inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 1) of the victim's partition by flushing or eviction, and therefore they bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which enables hit-based vulnerabilities; e.g., the V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting cache hits on the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and its process ID match. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data is placed in the unreserved ways and can be evicted by the attacker. SecDCP has no unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads and stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but it is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information between the observed timing and whether the data is in the cache could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache does not randomly load the security-critical data, and vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability still exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% performance degradation, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be used carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and it inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to capture all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and those added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and the value of u is leaked.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper): In Step 1, secret data is invalidated by the victim. Then, the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper): In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time: In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates data at a known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in the Evict + Time attack.
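The set-granularity nature of Evict + Time can be sketched with a toy model in which the attacker evicts one whole set per trial; all names and parameters here are illustrative, not from the paper:

```python
WAYS = 2        # assumed associativity of the toy cache set

def evict_time(secret_set, target_set):
    # Step 1: the victim loads its secret data into its cache set.
    cache = {secret_set: ['secret']}
    # Step 2: the attacker fills target_set, evicting whatever was there.
    cache[target_set] = ['attacker'] * WAYS
    # Step 3: the victim reloads the secret; slow (miss) iff its set was evicted.
    return 'secret' not in cache.get(secret_set, [])

# Trying every set, only the secret's set shows a slow reload.
slow_sets = [s for s in range(8) if evict_time(secret_set=3, target_set=s)]
print(slow_sets)  # [3]
```

Only the set index leaks, which is why Evict + Time recovers less than a full address.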
Prime + Probe: In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
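A minimal sketch of the Prime + Probe steps on a toy direct-mapped cache (the set count and names are our own assumptions):

```python
NUM_SETS = 8                       # toy direct-mapped cache size (illustrative)

def set_index(addr):
    return addr % NUM_SETS

def prime_probe(secret_addr):
    # Step 1: the attacker primes every set with its own data (Aa).
    cache = {s: 'attacker' for s in range(NUM_SETS)}
    # Step 2: the victim's secret access evicts the attacker's line in one set (Vu).
    cache[set_index(secret_addr)] = 'victim'
    # Step 3: the attacker probes each set; a miss marks the secret's set.
    return [s for s in range(NUM_SETS) if cache[s] != 'attacker']

print(prime_probe(21))  # [5], since 21 % 8 == 5
```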
Bernstein's Attack: This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
J Hardw Syst Secur (2019) 3:397–425
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper): In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper): In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper): The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper): Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as {..., ⋆, ...}, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as {..., Ainv/Vinv, ...}, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or the last step (an Ainv/Vinv in the last step will be deleted).
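A minimal sketch of how Subdivision Rule 1 could be mechanized, assuming a pattern is a list of step labels and the ASCII '*' stands in for the ⋆ step (this encoding is ours, not the paper's):

```python
def subdivide_rule_1(pattern):
    """Split a long pattern at '*' steps: each '*' becomes Step 1 of the
    next sub-pattern, and a lone trailing '*' is deleted, per the rule."""
    parts, current = [], []
    for step in pattern:
        if step == '*':
            if current:
                parts.append(current)
            current = ['*']        # '*' starts the next sub-pattern
        else:
            current.append(step)
    if current and current != ['*']:
        parts.append(current)      # drop a pattern that is only a trailing '*'
    return parts

print(subdivide_rule_1(['Aa', '*', 'Vu', 'Aa']))  # [['Aa'], ['*', 'Vu', 'Aa']]
```

Subdivision Rule 2 would follow the same shape, splitting at Ainv/Vinv steps instead of '*'.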
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or the unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of two steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second of them is not the last step in the whole memory sequence. These show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern with two adjacent steps both related to known addresses, or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps: for each possible pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting combined step (e.g., Union(Ainv, Aaliasinv))
Reduce Rule" and has no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so the pair can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access-related operations (known_access operations) always result in a deterministic state of the second memory access-related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other one is a known_inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
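The three Reduction Rules can be sketched for one adjacent pair as follows; steps are encoded here as (kind, address) tuples with address 'u' for the unknown address, a shorthand of our own rather than the paper's notation:

```python
def reduce_pair(first, second):
    """Return the single step that replaces the pair,
    or None if the pair cannot be reduced by Rules 1-3."""
    k1, a1 = first
    k2, a2 = second
    # Rule 1: two u operations target the same u; keep the second one.
    if a1 == 'u' and a2 == 'u':
        return second
    # Rule 2: two known accesses leave a deterministic state; keep the second.
    if k1 == 'access' and k2 == 'access' and 'u' not in (a1, a2):
        return second
    # Rule 3: a known access and a known invalidation of the same address.
    if {k1, k2} == {'access', 'inv'} and a1 == a2 and a1 != 'u':
        return second
    return None

print(reduce_pair(('access', 'a'), ('inv', 'a')))  # ('inv', 'a')
print(reduce_pair(('access', 'a'), ('inv', 'd')))  # None: different addresses
```

The last pair is not reducible; per the Commute Rules it would instead be a candidate for commuting.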
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
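Union Rule 1, applied repeatedly along a pattern, can be sketched like this; the (kind, address) tuple encoding and the '+' joint-step notation are our own, not the paper's:

```python
def union_invalidations(pattern):
    """Merge runs of adjacent invalidations of distinct known
    addresses into one joint invalidation step."""
    out = []
    for kind, addr in pattern:
        prev = out[-1] if out else None
        if (prev is not None and kind == 'inv' and prev[0] == 'inv'
                and addr != 'u' and prev[1] != 'u' and addr not in prev[1]):
            out[-1] = ('inv', prev[1] + '+' + addr)   # joint Union(...) step
        else:
            out.append((kind, addr))
    return out

print(union_invalidations([('inv', 'a'), ('inv', 'd'), ('access', 'u')]))
# [('inv', 'a+d'), ('access', 'u')]
```

Invalidations of the same address are deliberately left alone here; those fall under the Reduction Rules instead.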
B.4.2.4 Final Check Rules
Each long memory sequence will recursively apply these three categories of rules in the following order: the Commute Rules first, to put known_access and known_inv operations that target the same address as near as possible, and to put u operations and not_u operations together as much as possible; the Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps; then, the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps, in one of the following patterns or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known_access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, and which require the rest checking.
The only remaining two adjacent step patterns to which none of the three categories of rules can be applied are the following:

– {Aa, Va, Aaalias, Vaalias, Ad, Vd, Ainv_a, Vinv_a, Ainv_aalias, Vinv_aalias} ⇝ Vu
– {Aa, Va, Aaalias, Vaalias} ⇝ Vinv_u
– Vu ⇝ {Aa, Va, Aaalias, Vaalias, Ad, Vd, Ainv_a, Vinv_a, Ainv_aalias, Vinv_aalias}
– Vinv_u ⇝ {Aa, Va, Aaalias, Vaalias}

We manually checked all of the two adjacent step patterns
above, and found that adding an extra step before or after these
Algorithm 2: β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; patterns: a two-dimensional dynamic-size array; patterns[0] contains the states of each step of the original pattern in order; the remaining entries are empty initially
Output: reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = ∅
2: while patterns contain(⋆) and its index is not 0 do
3:   patterns = Subdivision_Rule_1(patterns)
4: end while
5: while (patterns contain(Ainv) and its index is not 0) or (patterns contain(Vinv) and its index is not 0) do
6:   patterns = Subdivision_Rule_2(patterns)
7: end while
8: while (patterns is_ineffective or has_interval_effective_three_steps) do
9:   patterns = Commute_Rules(patterns)
10:  patterns = Reduction_Rules(patterns)
11:  patterns = Union_Rules(patterns)
12:  if (patterns is_ineffective or has_interval_effective_three_steps) then
13:    results += effective_patterns(patterns)
14:  end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's CAT (Cache Allocation Technology), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
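As a rough illustration of this policy, the following Python sketch shows hits being allowed in any way while replacement is confined to the requester's partition. The way assignment, data layout, and function names here are our own assumptions for illustration, not details from [20, 27].

```python
# Hypothetical sketch of SP*-style static way partitioning.
HIGH_WAYS = (0, 1)  # ways reserved for the High (victim) partition
LOW_WAYS = (2, 3)   # ways reserved for the Low (attacker) partition

def lookup(cache_set, tag):
    """A hit may occur in any way, regardless of partition."""
    for way, line_tag in enumerate(cache_set):
        if line_tag == tag:
            return way
    return None

def victim_way(is_high_access):
    """On a miss, the replaced way is confined to the requester's own
    partition, so one domain's misses never evict the other's lines."""
    allowed = HIGH_WAYS if is_high_access else LOW_WAYS
    return allowed[0]  # placeholder for LRU/random within the partition
```

The key point the sketch captures is the asymmetry the text describes: the hit path ignores partitions, while the replacement path never crosses them.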
³Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].

SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
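The access rule above can be sketched as follows; the two partitions are modeled as plain sets of addresses, which is a simplification we assume for illustration (the real design partitions physical cache blocks):

```python
# Minimal sketch of the SecVerilog cache rule described above.
def secverilog_access(label, addr, low_part, high_part):
    if label == "L":
        if addr in low_part:
            return "hit"
        if addr in high_part:
            # Data present only in High: behaves as a miss for Low timing,
            # and the line is moved High -> Low to preserve consistency.
            high_part.remove(addr)
            low_part.add(addr)
            return "miss"
        low_part.add(addr)
        return "miss"
    # High instructions may hit in either partition.
    if addr in low_part or addr in high_part:
        return "hit"
    high_part.add(addr)
    return "miss"
```

Note how Low timing never depends on High contents: a Low access to High-resident data still takes the miss path, so the High partition stays invisible to Low observers.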
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. It uses the percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], with N the number of ways and M the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid evictions caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, a random data in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
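The three-level eviction priority can be sketched as below. The CVB bitmap encoding follows the description above; the rest of the structure is assumed for illustration, with notify_os standing in for the interrupt raised on a random eviction.

```python
import random

# Sketch of SHARP's eviction priority, simplified from the description above.
def sharp_choose_victim(cache_set, req_core, notify_os):
    for way, line in enumerate(cache_set):
        if line["cvb"] == 0:                 # present in no core's private cache
            return way
    for way, line in enumerate(cache_set):
        if (line["cvb"] >> req_core) & 1:    # belongs to the requesting core
            return way
    notify_os("suspicious random eviction")  # alert the OS of possible attack
    return random.randrange(len(cache_set))
```

The ordering is what defeats external eviction attacks: an attacker's miss preferentially displaces its own or unowned lines, and only resorts to a (reported) random eviction when no such line exists.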
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, with small performance impact given how rare such speculation is. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up (i.e., before all of its prior control flow instructions resolve), an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only when both the tag and the domain id are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen within the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea to dynamically partition the cache ways following the system's workload changes, but does not actually implement it.
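The masked hit and fill paths can be sketched as below. The field names policy_hitmap and policy_fillmap follow the paper's terminology, but the data layout and bitmask encoding here are our own assumptions.

```python
# Illustrative DAWG-style lookup and fill masking.
def dawg_hit(cache_set, tag, domain_id, policy_hitmap):
    for way, line in enumerate(cache_set):
        if line is None:
            continue
        line_tag, line_domain = line
        # A hit requires the tag AND the domain id to match, and the way
        # to be visible to this domain through its hitmap.
        if line_tag == tag and line_domain == domain_id and (policy_hitmap >> way) & 1:
            return way
    return None

def dawg_fill_way(policy_fillmap, num_ways):
    # Victim selection on a miss is masked to the domain's fillmap ways,
    # so fills never displace another domain's lines.
    candidates = [w for w in range(num_ways) if (policy_fillmap >> w) & 1]
    return candidates[0]  # placeholder for the real replacement policy
```

Because the domain id participates in the hit check, the same tag can legitimately live in two ways under two domains, which is how DAWG supports replicated read-only lines.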
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases where the RIC bits are modified, or for thread migration events, to avoid timing leakage during the transition time.
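The relaxed-inclusion rule reduces to a single guard on the back-invalidation path, sketched below; the line structure is an assumption for illustration.

```python
# Sketch of RIC's relaxed-inclusion check: on an LLC eviction, the
# back-invalidation to the higher-level cache is suppressed when the
# evicted line's relaxed-inclusion bit is set.
def on_llc_eviction(line, l1_contents):
    if line["relaxed"]:  # read-only or thread-private data
        # The L1 copy survives, so LLC-eviction attacks on this line fail.
        return
    l1_contents.discard(line["addr"])  # normal inclusive back-invalidation
```

The low complexity claimed by the paper is visible here: the defense is one extra bit per line plus one condition on an existing path.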
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot be evicted by each other. In this case, the new data will be either loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
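The miss-handling rule can be sketched as follows; this reflects our reading of the description above, and the data structures are illustrative.

```python
# Sketch of the PL cache miss-handling rule.
def pl_can_evict(line, incoming_pid, incoming_locked):
    if not line["locked"]:
        return True                          # unlocked lines are replaceable
    # A locked line may only be replaced by a locked line of the same process.
    return incoming_locked and line["pid"] == incoming_pid

def pl_handle_miss(cache_set, incoming):
    for way, line in enumerate(cache_set):
        if line is None or pl_can_evict(line, incoming["pid"], incoming["locked"]):
            cache_set[way] = incoming
            return way
    return None  # no evictable way: the access is serviced without caching
```

Returning None models the "loaded or stored without caching" case: the locked working set stays pinned, which is why preloading it (Assumption 1) makes later evictions by the attacker impossible.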
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S′ will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
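The three miss cases can be sketched as below. The data layout (lists of tags per set, a flat permutation table) and the interference checks passed in as flags are simplified assumptions, not the exact hardware from [49].

```python
import random

# Toy sketch of the RP cache miss cases described above.
def rp_miss(new_tag, S, perm, sets, same_process, same_protection):
    S_prime = random.randrange(len(perm))    # random set chosen for eviction
    if same_process and not same_protection:
        sets[perm[S_prime]].pop()            # evict arbitrary data of set S'
        return "uncached"                    # new data accessed without caching
    if not same_process:
        sets[perm[S_prime]].pop()            # evict a block of S'
        sets[perm[S_prime]].append(new_tag)  # store the new data there
        perm[S], perm[S_prime] = perm[S_prime], perm[S]  # swap S and S' mappings
        return "swapped"
    return "normal"                          # fall back to normal replacement
```

In both non-normal cases the evicted set is chosen at random, so an attacker observing an eviction learns nothing about which set the victim's address maps to.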
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security-critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a no-fill request is executed, and the requested data access will be performed without caching. Meanwhile, on a random fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, normal requests are used with the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique is used for this).
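The core of the random fill idea can be sketched as below; the window parameter and callback interface are our own assumptions for illustration.

```python
import random

# Sketch of a random fill request: the demand line is returned to the core
# without caching, and instead a random neighbor within an assumed window
# [addr - W, addr + W] is brought into the cache.
WINDOW = 8

def random_fill(addr, fetch_into_cache, return_to_core):
    return_to_core(addr)                       # no-fill for the demand access
    neighbor = addr + random.randint(-WINDOW, WINDOW)
    fetch_into_cache(neighbor)                 # de-correlated, random fill
    return neighbor
```

Since the cached line is only statistically related to the accessed address, observing which line got filled no longer pins down the victim's access, which is the de-correlation the design relies on.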
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security-critical. When a memory access tries to modify the cache state, the address will first be encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to dynamically change.
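The effect of encrypted set indexing can be illustrated with a toy keyed cipher. CEASER uses the LLBC [6]; the small Feistel network below is an assumption for illustration only, chosen because its XOR-based rounds keep the address mapping an invertible permutation, and it shows how re-keying remaps addresses to new sets.

```python
# Toy stand-in for CEASER-style encrypted set indexing.
def feistel_encrypt(addr, key, rounds=4, half_bits=8):
    mask = (1 << half_bits) - 1
    left, right = (addr >> half_bits) & mask, addr & mask
    for r in range(rounds):
        # XOR-based rounds keep the mapping a permutation (invertible).
        left, right = right, left ^ ((right * key + r) & mask)
    return (left << half_bits) | right

def cache_set(addr, key, num_sets=64):
    # Changing the key changes which set an address maps to, so eviction
    # sets learned under the old key become useless after re-keying.
    return feistel_encrypt(addr, key) % num_sets
```

An attacker who has profiled a conflicting eviction set under one key must restart profiling after every re-key, which is what bounds the attack time.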
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function also calculates a different index for each cache way, in a similar way to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with user space for security domain identification.
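The keyed per-way index derivation can be sketched as below. The choice of BLAKE2 and the parameter values are our assumptions; the design only requires a keyed hash or keyed permutation derivation function.

```python
import hashlib

# Sketch of keyed, per-way set index derivation as described above.
def scatter_index(addr, way, domain_key, num_sets=1024):
    # Different security domains get different keys, so the same address
    # maps to different sets per domain; including the way number skews
    # the mapping across ways, as in a skewed associative cache.
    h = hashlib.blake2b(f"{addr}:{way}".encode(), key=domain_key, digest_size=4)
    return int.from_bytes(h.digest(), "big") % num_sets
```

Because every way of every domain sees its own mapping, building a conventional eviction set requires the attacker to find addresses colliding in all ways under the victim's key, which is far harder than in a set-indexed cache.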
Non-Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID or by whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line will be invalidated. Non-Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and it therefore de-correlates cache access time from the address being accessed. However, the performance degradation is tremendous.
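The decay-counter mechanism can be sketched as follows; this reflects our reading of [23], with the threshold and tick granularity as assumed values.

```python
import random

# Sketch of per-line decay counters with randomized starting values.
THRESHOLD = 16

def new_counter():
    # Randomized starting value de-correlates invalidation time from access.
    return random.randrange(THRESHOLD)

def tick(lines):
    for line in lines:
        if line["valid"] and not line["touched"]:
            line["counter"] += 1
            if line["counter"] >= THRESHOLD:
                line["valid"] = False  # invalidated at a random-looking time
        line["touched"] = False        # reset the per-tick touch flag
```

Because each line dies at a randomized, per-line time, hit/miss statistics stop being a reliable function of the addresses accessed, at the heavy performance cost the text notes.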
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, the SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerabilities, and this is indicated by a × and red color. For example, the SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to know that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49], for example, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive a cache hit between each other.

2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer carries the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to know that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in the Random Fill cache u would be accessed without caching and another random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when the PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As is expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non-Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed. The left one is for normal execution and the right one is for speculative execution. Some secure caches, such as the InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution and normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of attacker success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of attacker success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, this requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, where V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to only a limited set of cache blocks. For example, either static or dynamic partitioning of caches allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of the speculation, or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, is not possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference within the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size is adjusted for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
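The partitioning idea above can be sketched with a toy model (the class name, way assignment, and LRU policy here are our own illustrative choices, not any specific published design):

```python
# Minimal sketch of static way-partitioning in one cache set (illustrative only).
# Ways 0..1 are reserved for the victim (High), ways 2..3 for the attacker (Low),
# so an access from one domain can never evict the other domain's lines.

class PartitionedSet:
    def __init__(self, ways_per_domain):
        # ways_per_domain: {"High": [0, 1], "Low": [2, 3]}
        self.ways_per_domain = ways_per_domain
        self.lines = {}          # way index -> cached address
        self.lru = {d: list(w) for d, w in ways_per_domain.items()}

    def access(self, domain, addr):
        """Return True on hit; on miss, fill a way from the caller's own partition."""
        for way in self.ways_per_domain[domain]:
            if self.lines.get(way) == addr:
                self.lru[domain].remove(way)
                self.lru[domain].append(way)   # mark most recently used
                return True
        victim_way = self.lru[domain].pop(0)   # evict only within own partition
        self.lines[victim_way] = addr
        self.lru[domain].append(victim_way)
        return False

s = PartitionedSet({"High": [0, 1], "Low": [2, 3]})
s.access("High", "secret")                 # victim brings in its secret line
for a in ("x", "y", "z", "w"):
    s.access("Low", a)                     # attacker thrashes its own ways
assert s.access("High", "secret")          # still a hit: no external eviction
```

Because replacement decisions never cross the partition boundary, the attacker's accesses cannot evict the victim's lines, which is exactly what removes the attacker's ability to set a known initial state in the victim's partition.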
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., Vd → Vu → Va (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv → Vu → Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd → Vu → Ad^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the Vu → Aa^inv → Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as Ad → Vu → Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system it so far supports preventing all of the vulnerabilities, but it only works for the LLC, comes with high software implementation complexity, and relies on assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd → Vu → Ad (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the Ad → Vu → Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing of cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution, or an out-of-order load or store, and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
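As a rough sketch of the randomized address-to-set mapping idea (the hash, key, and epoch names here are illustrative assumptions, not the actual CEASER or SCATTER cache schemes, which use low-latency hardware ciphers):

```python
# Sketch of randomized address-to-set mapping in the spirit of keyed-index
# designs: the set index is derived from the address under a secret key,
# so an eviction set built under one mapping stops working after remapping.
import hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Keyed mapping from a memory address to a cache set index."""
    digest = hashlib.sha256(key + addr.to_bytes(8, "little")).digest()
    return digest[0] % NUM_SETS

# With different keys (e.g., after a periodic remapping epoch), the same
# address lands in unrelated sets.
idx_before = set_index(0x1234, b"epoch-0")
idx_after = set_index(0x1234, b"epoch-1")
print(idx_before, idx_after)   # typically different set indices
```

The mapping is deterministic for a fixed key (so the cache still functions), but from the attacker's view the address-to-set relation is unpredictable, which is the de-correlation described above.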
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal interference vulnerabilities such as Aa^inv → Vu → Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the Vu → Va^inv → Vu (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va → Vu^inv → Va^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
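The random-delay idea of Non Deterministic Cache can be illustrated as follows (the latency numbers and delay range are made-up values for illustration, not the published design's parameters):

```python
# Sketch of non-deterministic access timing: a random delay on top of the
# true hit/miss latency hides the timing difference from the attacker,
# at a steep performance cost.  All numbers are illustrative.
import random

HIT_NS, MISS_NS = 4, 40        # assumed base latencies

def observed_latency(is_hit: bool) -> int:
    base = HIT_NS if is_hit else MISS_NS
    return base + random.randint(0, 100)   # random delay swamps the signal

samples_hit = [observed_latency(True) for _ in range(5)]
samples_miss = [observed_latency(False) for _ in range(5)]
print(samples_hit, samples_miss)   # the two distributions overlap heavily
```

A single timing observation no longer distinguishes a hit from a miss, which is why this approach defeats all the timing-based patterns, though averaging many samples still degrades more gracefully than the ideal, hence the performance caveat.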
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performanceresults of the secure caches as listed by the designersin the different papers At the extreme end there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad → Vu → Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu → Ad → Vu (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision, and it leaks the value of u.
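This three-step pattern can be illustrated with a toy direct-mapped cache model (the addresses and sizes here are arbitrary illustrative values):

```python
# Toy demonstration of the Cache Internal Collision pattern on a
# direct-mapped cache.

NUM_SETS = 8
cache = {}                     # set index -> cached address

def access(addr):
    """Return 'hit' or 'miss' and update the cache."""
    idx = addr % NUM_SETS
    if cache.get(idx) == addr:
        return "hit"
    cache[idx] = addr
    return "miss"

def flush_all():
    cache.clear()

secret = 13                    # victim's secret address u (unknown to attacker)
known = 13                     # victim's known address a being tested

flush_all()                    # Step 1: invalidate by flush/eviction
access(secret)                 # Step 2: victim touches u
observation = access(known)    # Step 3: victim touches a; timing is observed
print(observation)             # prints "hit" here, since known == secret
```

A "hit" (fast) observation in Step 3 reveals that u and a collide in the cache, leaking information about the secret address; choosing `known` unequal to `secret` yields "miss" instead.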
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss helps the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Evict + Time the attacker evicts a cache set, only learning the secret data's cache set index, instead of invalidating data at some known address to find the full address of the secret data, as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
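A minimal sketch of these three steps on a single 2-way set (illustrative only; a real attack infers hits and misses from measured access latencies):

```python
# Toy Prime + Probe on one 2-way cache set with LRU replacement.
from collections import OrderedDict

WAYS = 2
cache_set = OrderedDict()      # address -> None, kept in LRU order

def access(addr):
    if addr in cache_set:
        cache_set.move_to_end(addr)         # refresh LRU position
        return "hit"
    if len(cache_set) == WAYS:
        cache_set.popitem(last=False)       # evict least recently used line
    cache_set[addr] = None
    return "miss"

# Step 1: attacker primes the set with its own lines.
access("attacker_0"); access("attacker_1")
# Step 2: victim's secret-dependent access evicts an attacker line.
access("victim_secret")
# Step 3: attacker probes; a miss reveals the victim touched this set.
probe = [access("attacker_0"), access("attacker_1")]
print(probe)
```

After the victim's access, at least one probe misses, telling the attacker that the victim's secret maps to this set; sets the victim did not touch would return all hits.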
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that tells the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If a vulnerability is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.

In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
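One possible encoding of these operations (our own, for illustration) is:

```python
# Encoding the memory-related operations of the three-step model: each step
# records the actor (attacker or victim), the address class, and whether it
# is an invalidation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    actor: str      # "A" (attacker) or "V" (victim)
    addr: str       # "a", "a_alias", "d" (known) or "u" (unknown)
    inv: bool = False

    def is_u(self) -> bool:
        """True for a u operation, False for a not_u operation."""
        return self.addr == "u"

# Vu -> Aa^inv -> Vu, i.e., one type of Flush + Time:
flush_time = (Step("V", "u"), Step("A", "a", inv=True), Step("V", "u"))
assert flush_time[0].is_u() and not flush_time[1].is_u()
```

This kind of explicit step encoding is what lets a simulator enumerate every three-step combination and test which ones yield distinguishable timing, as done in Section 3.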
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → Aa → Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3, using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a ⋆ sub-pattern, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as A^inv/V^inv, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv → Vu → Aa (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it would flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be applied recursively until there are no sub-patterns left with an A^inv/V^inv in the middle or the last step (an A^inv/V^inv in the last step will be deleted).
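Subdivision Rule 1 above can be sketched as follows (steps are encoded as plain strings, with "*" denoting the ⋆ step; the encoding is ours, for illustration):

```python
# Sketch of Subdivision Rule 1: a "*" step carries no timing information,
# so a longer pattern is split there; a "*" in the middle starts a new
# pattern, and a trailing lone "*" is deleted.

def subdivide(pattern):
    patterns, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                patterns.append(current)
            current = ["*"]        # "*" becomes Step 1 of the next pattern
        else:
            current.append(step)
    if current and current != ["*"]:
        patterns.append(current)   # a trailing lone "*" is dropped
    return patterns

print(subdivide(["Aa", "Vu", "*", "Vu", "Aa"]))
# -> [['Aa', 'Vu'], ['*', 'Vu', 'Aa']]
```

Applied recursively (here, in one pass over the sequence), this leaves only patterns in which ⋆ can appear solely as Step 1, matching the rule's stated post-condition.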
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases in which the rules apply for each pair of adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., the sequences containing M, N and N, M will have the same timing observation in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence. These patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern in which two adjacent steps are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps

(For each possible pair of adjacent steps, over the operations a, a_alias, d, u and their invalidation variants a^inv, a_alias^inv, d^inv, u^inv, the table indicates whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and gives the combined step where a reduction or union is possible; e.g., a followed by a reduces to a, and a^inv followed by d^inv unions to Union(a^inv, d^inv).)
Reduce Rule" and has no Union result in the "Combined Step" column in Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both of the accesses target the same u, so the pair can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access-related operations (known_access operations) always result in a deterministic state of the second accessed cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, the two can be reduced to one step, which is the second step.
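Under our reading of these rules, the pairwise reduction can be sketched as follows (the step encoding is ours, e.g., "Aa" for an attacker access to a and "Va_inv" for a victim invalidation of a; edge cases such as mixed access/invalidation pairs on u follow Table 7):

```python
# Sketch of the Reduction Rules on two adjacent steps, encoded as strings.

def base_addr(step):
    return step[1:].removesuffix("_inv")   # strip actor letter and inv marker

def is_inv(step):
    return step.endswith("_inv")

def reduce_adjacent(x, y):
    """Return the single step the pair reduces to, or None if irreducible."""
    if base_addr(x) == "u" and base_addr(y) == "u" and is_inv(x) == is_inv(y):
        return y                            # Rule 1: both target the same u
    if base_addr(x) != "u" and base_addr(y) != "u":
        if not is_inv(x) and not is_inv(y):
            return y                        # Rule 2: two known accesses
        if is_inv(x) != is_inv(y) and base_addr(x) == base_addr(y):
            return y                        # Rule 3: access + inv, same address
    return None

assert reduce_adjacent("Vu", "Au") == "Au"           # Rule 1
assert reduce_adjacent("Aa", "Vd") == "Vd"           # Rule 2
assert reduce_adjacent("Aa", "Aa_inv") == "Aa_inv"   # Rule 3
assert reduce_adjacent("Aa", "Vu") is None           # not reducible
```

Repeatedly applying such a pairwise reduction (after commuting) is what shrinks a β > 3 sequence toward the three-step forms discussed below; pairs that union rather than reduce (e.g., two invalidations of different known addresses) are handled by the Union Rules instead.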
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., the sequence containing M, N and the sequence containing Union(M, N) will have the same timing observation in the final step for the attacker, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined. Both known inv operations invalidate some known address, so they can be merged into a single step. This Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
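A sketch of Union Rule 1 under an assumed step encoding: each step is `(kind, addrs)` where `addrs` is a list of address labels and `"u"` marks an unknown address. Adjacent known invalidations of disjoint addresses are merged into one joint step.

```python
# Step = (kind, addrs): kind is "access" or "inv"; addrs is a list of
# address labels, with "u" marking an unknown address.

def apply_union_rule(seq):
    """Merge adjacent invalidations of known, different addresses into a
    single joint step Union(M, N); repeated merging collapses whole runs."""
    out = []
    for kind, addrs in seq:
        prev = out[-1] if out else None
        if (prev is not None and kind == "inv" and prev[0] == "inv"
                and "u" not in addrs and "u" not in prev[1]
                and not set(addrs) & set(prev[1])):
            out[-1] = ("inv", prev[1] + addrs)   # Union of the two steps
        else:
            out.append((kind, list(addrs)))
    return out
```

Invalidations of the same address, or of an unknown address, are deliberately left unmerged, since the rule only covers known, different addresses.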
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce steps. Then the Union Rules are applied to the processed memory sequence.

Each application of these three categories of rules should reduce at least one step, and the recursion continues until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns:
– u operation, non-u operation, u operation

– non-u operation, u operation, non-u operation
There might be a possible extra known access operation, or an extra A^inv/V^inv, before this three-step pattern, where:

• An extra known access operation as the first step will not influence the result and can be directly removed.

• For an extra A^inv/V^inv as the first step:

  – If it is followed by a known access operation, the A^inv/V^inv can be removed, because the actual state is subsequently put into the cache block.

  – If it is followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

  – If it is followed by V_u, the worst case is A^inv/V^inv ⇝ V_u ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further be reduced by Commute Rule 2 to come within three steps.
In this case, the sequence is finally within three steps and the checking is done.

• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.

The only two adjacent step patterns to which none of the three categories of Rules can be applied are the following:
– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias ⇝ V_u

– A_a/V_a/A_aalias/V_aalias ⇝ V^inv_u

– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias

– V^inv_u ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these two steps will either generate two adjacent step patterns that can be processed by the three categories of Rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2: β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1] and pattern[2] are empty initially
Output: result: reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1:  result ← Ø
2:  while pattern contains an adjacent (u operation, non-u operation) pair whose index is not 0 do
3:      pattern ← Commute_Rule_1(pattern)
4:  end while
5:  while (pattern contains an adjacent (known access, same-address known inv) pair whose index is not 0) or (pattern contains an adjacent (same-address known inv, known access) pair whose index is not 0) do
6:      pattern ← Commute_Rule_2(pattern)
7:  end while
8:  while not (pattern is ineffective or pattern has interval effective three steps) do
9:      pattern ← Commute_Rules(pattern)
10:     pattern ← Reduction_Rules(pattern)
11:     pattern ← Union_Rules(pattern)
12:     if pattern is ineffective or pattern has interval effective three steps then
13:         result += Extract_Effective_Three_Steps(pattern)
14:     end if
15: end while
16: return result
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to output (i) or (ii) of the algorithm.
In Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
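A compact sketch of the reduce-and-check loop that Algorithm 2 describes. The rule-application function and the two checking predicates are assumed to be supplied by the caller (hypothetical signatures); `apply_rules` performs one round of the Commute, Reduction, and Union rules.

```python
def reduce_and_check(pattern, apply_rules, is_ineffective,
                     has_interval_effective_three_steps):
    """Loop until the pattern is shown ineffective or maps to one or
    more three-step vulnerabilities (returned as a list)."""
    while True:
        if is_ineffective(pattern):
            return []                      # no effective three steps exist
        found = has_interval_effective_three_steps(pattern)
        if found:
            return found                   # maps to known three-step patterns
        reduced = apply_rules(pattern)     # one round of Commute/Reduce/Union
        if reduced == pattern:             # no rule applies any more:
            return None                    # left for the Rest Checking
        pattern = reduced
```

Each iteration either terminates with a verdict or strictly shrinks the pattern, mirroring the requirement that every rule application removes at least one step.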
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysü T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture (ISCA), IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept. of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID): CVB stores a bitmap whose ith bit is set if the line is present in the ith core's private cache. Cache hits are allowed on different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache from the same process, random data in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of suspicious activity. For pages that are read-only or executable, the SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks via the cache coherence protocol is still possible.
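The victim-selection priority described above can be sketched as follows; the CVB-bitmap representation and function name are illustrative, not taken from the SHARP implementation.

```python
import random

def sharp_pick_victim(cvb_list, requester_core):
    """cvb_list[i] is the CVB bitmap of way i; return (way, notify_os)."""
    # 1. Prefer lines present in no core's private cache.
    for way, cvb in enumerate(cvb_list):
        if cvb == 0:
            return way, False
    # 2. Otherwise prefer lines private to the requesting core itself.
    for way, cvb in enumerate(cvb_list):
        if cvb == (1 << requester_core):
            return way, False
    # 3. Else evict at random and raise an OS interrupt (suspicious activity).
    return random.randrange(len(cvb_list)), True
```

Only the third case, which could displace another process's line, triggers the OS notification.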
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache design lists the following extra software assumptions:
Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks in the LLC.

Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed from the L1 cache and the TLB by the security monitor when program execution switches away from the victim program.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling of speculation during the speculative execution of memory-related operations. During normal processor execution, the state of the L1 caches and TLB is flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has small performance impact because such cases of speculation are rare. This cache design lists the following extra software assumptions:

Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks in the LLC.

Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed from the L1 cache and the TLB by the security monitor when program execution switches away from the victim program.

Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of a load's prior control-flow instructions resolve, unsafe speculative loads (USLs) are put into a speculative buffer (SB) without modifying any cache state. When the visibility point is reached, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core received a possible invalidation from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for handling the final state transition of the speculative execution. InvisiSpec targets Spectre-like attacks and futuristic attacks. However, the InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) to separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. Cache coherence is handled by assigning secure pages to only one processor and not sharing pages among VMs. This cache design lists the following extra software assumptions:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.

Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.

Assumption 3: The victim and attacker processes cannot share the same memory space.

Assumption 4: A software pseudo-locking mechanism is used to make sure that the victim and attacker processes cannot share the same cache blocks.

Assumption 5: Secure pages are reloaded immediately after any flush performed by the virtual machine monitor (VMM), to make sure all the secure pages remain pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, masks fills and selects the victim way to replace; the other, called policy_hitmap, masks hit ways. A cache hit happens only when both the tag and the domain id match. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. On a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
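A toy sketch of the way-masking behavior: a hit must match both tag and domain id within the ways enabled by policy_hitmap, and a fill may only pick a victim among the ways enabled by policy_fillmap. Class structure and names are illustrative, not DAWG's actual RTL.

```python
import random

class DawgSet:
    """One cache set; each line stores (domain_id, tag) or None if empty."""

    def __init__(self, num_ways):
        self.lines = [None] * num_ways

    def lookup(self, domain_id, tag, policy_hitmap):
        for way, line in enumerate(self.lines):
            if (policy_hitmap >> way) & 1 and line == (domain_id, tag):
                return way                 # hit only within masked ways
        return None

    def fill(self, domain_id, tag, policy_fillmap):
        allowed = [w for w in range(len(self.lines))
                   if (policy_fillmap >> w) & 1]
        victim = random.choice(allowed)    # victim chosen inside own domain
        self.lines[victim] = (domain_id, tag)
        return victim
```

A lookup from another domain misses even on an identical tag, which is the isolation property the design relies on.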
RIC Cache [22] (Relaxed Inclusion Cache) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if data R is in the LLC, it is also in the higher-level caches, and eviction of R from the LLC causes the same data in a higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; RIC therefore does not protect writable shared critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and on thread migration events, to avoid timing leakage during the transition time.
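The back-invalidation rule can be sketched as follows, with the LLC modeled as a dict from address to the line's relaxed-inclusion bit and the L1 as a set of addresses (both hypothetical simplifications):

```python
def evict_llc_line(llc, l1, addr):
    """Evict addr from the LLC; back-invalidate the L1 copy only when the
    relaxed-inclusion (RIC) bit is not set. Returns True if the L1 copy
    survives the LLC eviction."""
    relaxed = llc.pop(addr)      # llc: dict addr -> relaxed-inclusion bit
    if not relaxed:
        l1.discard(addr)         # classic inclusive back-invalidation
    return addr in l1
```

With the RIC bit set, an attacker evicting the LLC line no longer evicts the victim's L1 copy, which is what defeats the LLC-driven eviction attack.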
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of cache replacement policy, on a cache hit the PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and lock status bits while the hit is processed. On a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data is loaded or stored without caching. In other cases, data eviction is possible. This cache design lists the following extra software assumption:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
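The miss-handling rules above can be sketched as a victim-selection function; the set layout and priority order among evictable ways are illustrative assumptions.

```python
def pl_miss_fill(cache_set, pid, lock_bit):
    """cache_set: list of (line_pid, line_locked) tuples or None for empty
    ways. Return the index of the victim way, or None if the access must
    bypass the cache (performed without caching)."""
    for way, line in enumerate(cache_set):
        if line is None:
            return way                      # free way available
    for way, (line_pid, line_locked) in enumerate(cache_set):
        if not line_locked:
            return way                      # unlocked data is evictable
        if line_pid == pid and lock_bit:
            return way                      # own locked line, locked fill
    return None                             # all lines locked by others: bypass
```

When every line in the set is locked by other processes, the incoming data is serviced without caching, so the attacker cannot displace the victim's locked blocks.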
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. Each block of the RP cache has a process ID and one protection bit P, set to indicate whether the cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits require both the process ID and the address to be the same. When a cache miss happens on data D of cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ is evicted and D is accessed without caching. If they belong to different processes, D is stored in an evicted cache block of S′, and the mappings of S and S′ are swapped as well. Otherwise, the normal replacement policy is executed.
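The three-way miss decision can be sketched as below; the permutation-table swap is modeled as exchanging two list entries, and the action labels are hypothetical names for the behaviors described.

```python
import random

def rp_on_miss(same_pid, same_protection_bit, pt, s, num_sets):
    """Decide the RP-cache miss action for data D mapping to set s.
    pt is the permutation table (list of permuted set numbers)."""
    if same_pid and not same_protection_bit:
        s_prime = random.randrange(num_sets)      # evict data of a random set S'
        return ("access_without_caching", s_prime)
    if not same_pid:
        s_prime = random.randrange(num_sets)
        pt[s], pt[s_prime] = pt[s_prime], pt[s]   # swap mappings of S and S'
        return ("store_in_evicted_block", s_prime)
    return ("normal_replacement", s)              # same pid, same protection
```

The randomized eviction and set swapping are what break the attacker's ability to infer which set the victim's access mapped to.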
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache thereby stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of the RP cache. A cache hit requires both the process ID and the address to be the same. When a cache miss happens on data D, arbitrary data is evicted and D is accessed without caching if the evicted and brought-in data belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D randomly replaces a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
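The two-level lookup can be sketched with a toy model in which the fully associative real cache is represented as a dict keyed by (RMT index, process ID); class layout and field names are illustrative assumptions.

```python
class Newcache:
    """Toy model: the address index bits select an RMT entry (as in a
    direct-mapped cache); the (RMT index, process ID) pair then selects a
    line in the fully associative real cache (modeled as a dict)."""

    def __init__(self, rmt_bits=4):
        self.rmt_bits = rmt_bits
        self.lines = {}          # (rmt_index, pid) -> (tag, protection_bit)

    def lookup(self, pid, addr):
        index = addr & ((1 << self.rmt_bits) - 1)  # index bits pick RMT entry
        tag = addr >> self.rmt_bits
        line = self.lines.get((index, pid))        # hit needs pid AND address
        return line is not None and line[0] == tag
```

Because the process ID is part of the match, an attacker replaying the victim's address from another process does not hit.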
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's accesses to security-critical data, a No-fill request is executed and the requested data access is performed without caching. Meanwhile, on a Random-fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush
instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
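The three request types can be sketched as follows; the cache is modeled as a set of cached line addresses, and the request-type labels are illustrative stand-ins for the new instructions.

```python
import random

def handle_request(cache, req_type, addr, window=None):
    """cache: set of cached line addresses; req_type selects the fill policy."""
    if addr in cache:
        return "hit"                             # hits handled as in a normal cache
    if req_type == "nofill":
        return "no_fill"                         # secure access: nothing cached
    if req_type == "random_fill":
        lo, hi = window
        cache.add(random.randrange(lo, hi))      # cache an arbitrary nearby line
        return "random_fill"
    cache.add(addr)                              # normal demand fill
    return "demand_fill"
```

The key property is visible in the sketch: after a `nofill` or `random_fill` miss, the cache state carries no information about the address that was actually accessed.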
CEASER Cache [38] is able to mitigate conflict-based LLC timing–based side-channel attacks using address encryption and dynamic remapping. The CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. This periodic re-keying causes the address remapping to change dynamically.
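A sketch of encrypted indexing with re-keying. A toy keyed permutation stands in for the real LLBC (which this is emphatically not); only the structure, i.e., encrypt the line address under the current key and then derive the set index, follows the design.

```python
def toy_encrypt(addr, key, width=16):
    """Toy keyed mixing over width-bit values; NOT the real LLBC cipher."""
    mask = (1 << width) - 1
    x = addr & mask
    for _ in range(4):                       # a few mixing rounds
        x = ((x * 0x9E37) ^ key) & mask      # multiply-xor diffusion
        x = ((x << 5) | (x >> (width - 5))) & mask  # rotate left by 5
    return x

def cache_set(addr, key, num_sets=64):
    """Set index of addr under the current epoch key; re-keying changes
    the whole mapping at once."""
    return toy_encrypt(addr, key) % num_sets
```

Changing `key` at the re-keying interval remaps essentially every address to a new set, which is what defeats slowly constructed eviction sets.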
ScatterCache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software entities and assign them different keys for the mapping. As hardware extensions, a cryptographic primitive such as a hash, and an index decoder for each scattered cache way, are added. ScatterCache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate with user space for security domain identification.
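A sketch of the per-way keyed index derivation; keyed BLAKE2 stands in for the keyed derivation function, and all parameters (digest size, set count, way count) are illustrative rather than those of the actual design.

```python
import hashlib

def scatter_index(key, domain_id, addr, way, num_sets=1024):
    """Keyed hash over (security domain, address, way) -> set index."""
    h = hashlib.blake2b(
        f"{domain_id}:{addr}:{way}".encode(), key=key, digest_size=4)
    return int.from_bytes(h.digest(), "little") % num_sets

def lookup_sets(key, domain_id, addr, num_ways=8):
    """One candidate set per way, as in skewed-associative caches."""
    return [scatter_index(key, domain_id, addr, w) for w in range(num_ways)]
```

Because each security domain holds a different key, the same address occupies unrelated sets for attacker and victim, so eviction sets built in one domain do not transfer to another.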
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID, or by whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter; in this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and it therefore de-correlates any cache access time from the address being accessed. However, the performance degradation is tremendous.

J Hardw Syst Secur (2019) 3:397–425
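The randomized-lifetime mechanism can be sketched as follows. This is an illustrative model with parameters of our choosing (not the design's): each line's counter starts at a random value below a maximum, ticks up while the line is untouched, and invalidates the line on reaching the maximum, so identical data survives in the cache for an unpredictable time.

```python
# Toy model of Non Deterministic Cache's counter-based invalidation.
# MAX_COUNT and the RNG seed are illustrative assumptions.
import random

MAX_COUNT = 16

class NDLine:
    def __init__(self, rng):
        self.valid = True
        self.count = rng.randrange(MAX_COUNT)   # random initial counter

    def tick(self):
        """One global counter tick with the line left untouched."""
        if self.valid:
            self.count += 1
            if self.count >= MAX_COUNT:
                self.valid = False              # randomized invalidation

rng = random.Random(7)
lifetimes = []
for _ in range(1000):
    line, ticks = NDLine(rng), 0
    while line.valid:
        line.tick()
        ticks += 1
    lifetimes.append(ticks)
# The same cached data survives a different number of ticks each time,
# so hit/miss timing no longer correlates with the accessed address.
print(min(lifetimes), max(lifetimes))
```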
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u → A_d → V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, which is indicated by × and red color. For example, SecDCP cache cannot defend against the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d → V_u → A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and can no longer preserve the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d → V_u → A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing does not imply that u and d have the same index, since in Random Fill cache u would be accessed without caching, and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d → V_u → V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security-critical data does not allow A_d in Step 1 to replace it. In this case A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
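Case 1 above can be made concrete with a toy cache model of our own (not from the paper): in a conventional cache, the attacker's reload in Step 3 of a Flush + Reload pattern is fast only when the victim touched the same line in Step 2, while an RP-like policy that forbids cross-process hits makes the Step 3 timing constant.

```python
# Toy direct-mapped cache illustrating case 1: with cross-process hits
# disabled (RP-like), the attacker's last-step timing is always 'slow'.
# All names and parameters here are illustrative.

class ToyCache:
    def __init__(self, num_sets, cross_process_hits=True):
        self.lines = [None] * num_sets       # each entry: (owner, addr)
        self.cross = cross_process_hits

    def access(self, owner, addr):
        """Return 'fast' on a hit, 'slow' on a miss; fill on miss."""
        s = addr % len(self.lines)
        line = self.lines[s]
        if (line is not None and line[1] == addr
                and (self.cross or line[0] == owner)):
            return 'fast'
        self.lines[s] = (owner, addr)
        return 'slow'

    def flush(self, addr):
        s = addr % len(self.lines)
        if self.lines[s] is not None and self.lines[s][1] == addr:
            self.lines[s] = None

def flush_reload_step3(cache, secret_addr):
    cache.flush(0x40)                        # Step 1: invalidate known a
    cache.access('victim', secret_addr)      # Step 2: victim touches u
    return cache.access('attacker', 0x40)    # Step 3: attacker reloads a

# Conventional cache: Step 3 timing depends on the secret address.
normal = [flush_reload_step3(ToyCache(8), s) for s in (0x40, 0x41)]
# RP-like cache: no cross-process hits, so Step 3 timing is constant.
rp_like = [flush_reload_step3(ToyCache(8, cross_process_hits=False), s)
           for s in (0x40, 0x41)]
print(normal, rp_like)
```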
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
[Table 4 body is not recoverable from this extraction; its caption and footnotes follow.]

Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory-access-related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success to some limited degree
[Table 5 body is not recoverable from this extraction; its caption follows.]

Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). Footnotes a–m are identical to those of Table 4.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns from CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim, the attacker software, or management software to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data, and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, in PL cache flushing can be disabled for the victim's sensitive data, so the V_u → V_a^inv → V_u (slow) vulnerability (one type of Flush + Time) is prevented. Newcache is able to prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns from SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to only access a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., a High partition determined by the data address). For speculative execution, the attacker's code can be the part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other, normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part; it also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 1) of the victim's partition through flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
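A minimal sketch of static way-partitioning, under our own illustrative parameters (not any specific design), shows the external-interference property: each security domain may only fill its own subset of ways, so the attacker can never evict the victim's lines, though the victim's own data can still conflict internally.

```python
# Toy static way-partitioning of one cache set. The domain-to-ways split
# and addresses are illustrative assumptions.

WAYS = 4
PARTITION = {'victim': range(0, 2), 'attacker': range(2, 4)}  # static split

class PartitionedSet:
    def __init__(self):
        self.ways = [None] * WAYS     # each entry: (owner, addr) or None

    def fill(self, owner, addr):
        """Fill only within the owner's own ways."""
        slots = list(PARTITION[owner])
        slot = slots[hash(addr) % len(slots)]
        self.ways[slot] = (owner, addr)

s = PartitionedSet()
s.fill('victim', 0x100)
before = [w for w in s.ways if w and w[0] == 'victim']
for a in range(0x200, 0x240):         # attacker tries hard to evict
    s.fill('attacker', a)
after = [w for w in s.ways if w and w[0] == 'victim']
print(before == after)                # victim lines untouched by attacker
```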
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d → V_u → V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit on the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic
partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u → A_a^inv → V_u (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d → V_u → A_d (slow) vulnerability (one type of Evict + Probe) are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallows the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore brings uncertainty to the final observation by the attacker; e.g., the A_d → V_u → V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information in the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal-interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
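The random-fill flavor of this de-correlation can be sketched as follows. This is an illustrative model of ours (window size and set count assumed, not taken from any design): on a miss to sensitive data, the requested line is forwarded to the CPU without being cached, and a line chosen at random from a window around it is cached instead, so the filled set index carries little information about the accessed address.

```python
# Illustrative random-fill sketch: the set that actually gets filled is
# chosen from a random neighborhood of the accessed address.
import random

NUM_SETS = 8
WINDOW = 4    # random-fill neighborhood half-width (assumed parameter)

def random_fill_miss(addr, rng):
    """Return the set index actually filled after a miss on `addr`."""
    filled_addr = addr + rng.randint(-WINDOW, WINDOW)  # random neighbor
    return filled_addr % NUM_SETS                      # set that is filled

rng = random.Random(1)
filled = [random_fill_miss(0x40, rng) for _ in range(1000)]
# The same accessed address fills many different sets over time, so an
# attacker probing sets learns little about which address was touched.
print(sorted(set(filled)))
```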
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks, and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
[Table 6 body is not recoverable from this extraction; its caption follows.]

Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive effective vulnerability types listed, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail to do so. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and those added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision, and the value of u is leaked.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3, using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has a similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache-set information. This attack has a similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time the attacker invalidates data at a known address, allowing the attacker to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
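The three Prime + Probe steps can be sketched on a toy set-associative cache of our own (illustrative parameters, not from the paper): after the attacker primes every set and the victim makes one secret-dependent access, exactly the set holding the secret shows a miss during the probe.

```python
# Toy 2-way set-associative cache with LRU replacement, used to walk
# through Prime + Probe. Addresses and sizes are illustrative.

class SetAssocCache:
    def __init__(self, num_sets, ways):
        self.sets = [[] for _ in range(num_sets)]   # LRU order, newest last
        self.num_sets, self.ways = num_sets, ways

    def access(self, addr):
        """Return True on a hit; fill with LRU replacement on a miss."""
        s = self.sets[addr % self.num_sets]
        if addr in s:
            s.remove(addr)
            s.append(addr)
            return True
        if len(s) == self.ways:
            s.pop(0)                                # evict LRU line
        s.append(addr)
        return False

NUM_SETS, WAYS = 4, 2
cache = SetAssocCache(NUM_SETS, WAYS)
prime_addrs = [s + w * NUM_SETS for s in range(NUM_SETS) for w in range(WAYS)]
for a in prime_addrs:                   # Step 1: attacker primes every set
    cache.access(a)
secret = 0x36                           # Step 2: victim's secret-dependent access
cache.access(secret)
missed = [s for s in range(NUM_SETS)    # Step 3: attacker probes each set
          if not all(cache.access(a) for a in prime_addrs
                     if a % NUM_SETS == s)]
print(missed, secret % NUM_SETS)        # the missed set reveals the index
```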
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim performs the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, the attacker learns that this cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that this cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has a similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that performs the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that performs the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access-related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If a vulnerability is represented using more than three steps, the steps can either be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to the addresses that interfere with the targeted data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. A memory-related operation on the unknown address (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ⋆, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with A^inv/V^inv, the longer pattern can be divided into two separate patterns, where the A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the victim's sensitive address information from the final timing. This rule should be applied recursively until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
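The two subdivision rules can be sketched under a toy encoding of our own ("*" for ⋆, "A_inv"/"V_inv" for the full-cache flushes): the splitting step becomes Step 1 of the next sub-pattern, and a bare trailing splitter is deleted.

```python
SPLIT_STEPS = {"*", "A_inv", "V_inv"}

def subdivide(pattern):
    """Split a long step pattern at every '*' or full-cache flush step."""
    subs, current = [], []
    for step in pattern:
        if step in SPLIT_STEPS:
            if current:
                subs.append(current)
            current = [step]     # the splitting step starts the next sub-pattern
        else:
            current.append(step)
    if current:
        subs.append(current)
    # a sub-pattern that is only the bare splitter (e.g. a trailing '*') is dropped
    return [s for s in subs if not (len(s) == 1 and s[0] in SPLIT_STEPS)]

print(subdivide(["A_a", "V_u", "*", "A_inv", "V_u", "A_a"]))
```

Each resulting sub-pattern is then simplified independently by the rules of the next section.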
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in the remaining patterns. In Table 7, we show all the possible cases in which the rules apply for each pair of adjacent steps, regardless of whether they are the attacker's accesses (A) or the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} will give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted in this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second of them is not the last step in the whole memory sequence. For these patterns, Table 7 shows a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" in the column "Union Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second) over the operations on a, a_alias, d, and u and their invalidations (regardless of attacker A or victim V), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduction Rule applies, and the resulting combined step where one exists (e.g., Union(a^inv, a_alias^inv), Union(a^inv, d^inv), Union(d^inv, a_alias^inv)); entries marked "–" indicate that no rule applies.
Reduce Rule" and no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1 For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access-related operations (known access operations) always result in a deterministic state of the cache block related to the second memory access, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the address they refer to is the same, then no matter their order, these two can be reduced to one step, namely the second step.
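A simplified sketch of a reduction pass, under a step encoding of our own (actor, operation, address): it collapses only adjacent steps that target the same cache line and keeps the second of the two, while the paper's Table 7 covers the full set of reducible pairs.

```python
def reduce_adjacent(pattern):
    """Collapse adjacent steps on the same target, keeping the second step."""
    out = []
    for step in pattern:
        if out and out[-1][2] == step[2]:   # same addressed line as previous step
            out[-1] = step                  # keep only the second of the two
        else:
            out.append(step)
    return out

seq = [("A", "access", "a"), ("V", "access", "a"),   # Rule 2: two known accesses
       ("V", "access", "u"), ("A", "access", "u"),   # Rule 1: same unknown u
       ("A", "inv", "d"), ("V", "access", "d")]      # Rule 3: inv/access pair
print(reduce_adjacent(seq))
```

Each pass removes at least one step, which is what the recursion in the Final Check Rules relies on.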
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} will give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two different known memory addresses can have Union Rule 1 applied: the two known inv operations both invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
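Union Rule 1 can be sketched under a hypothetical step encoding: adjacent invalidations of different known addresses merge into one combined invalidation step covering all of their addresses.

```python
def union_invalidations(pattern):
    """Steps are ('inv', set_of_known_addrs) or ('access', addr)."""
    out = []
    for kind, target in pattern:
        if kind == "inv" and out and out[-1][0] == "inv":
            out[-1] = ("inv", out[-1][1] | target)   # Union(M, N)
        else:
            out.append((kind, target))
    return out

merged = union_invalidations([("inv", {"a"}), ("inv", {"d"}), ("access", "u")])
print(merged == [("inv", {"a", "d"}), ("access", "u")])   # True
```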
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two following cases:
• The long (β > 3) memory sequence of u operations and not u operations is further reduced to a sequence of at most three steps, in one of the following patterns (or shorter):

– u operation, not u operation, u operation

– not u operation, u operation, not u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known access operation, the A^inv/V^inv can be removed, because the following access puts an actual state into the cache block.

– If it is followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where Commute Rule 2 can be applied to V_u ⇝ A^inv_d/V^inv_d so that the pattern can be reduced to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the Rest Checking.
The only pairs of adjacent steps to which none of the three categories of rules can be applied are the following:

– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias} ⇝ V_u

– {A_a, V_a, A_aalias, V_aalias} ⇝ V^inv_u

– V_u ⇝ {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}

– V^inv_u ⇝ {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two adjacent step patterns
above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially.

Output: reduced effective vulnerability pattern(s) array vul_list. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vul_list = Ø
2: while contain(pattern, ⋆) and its index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is_ineffective or pattern has_interval_effective_three_steps) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if pattern is_ineffective or pattern has_interval_effective_three_steps then
13:    vul_list += effective_patterns(pattern)
14:  end if
15: end while
16: return vul_list
two steps either generates a pair of adjacent steps that can be processed by the three categories of rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall under neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks if a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45 nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture (ISCA 1990), IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed the following extra software assumptions:
Assumption 1 Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2 Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3 The victim and the attacker process cannot share the same memory space.
Assumption 4 A pseudo-locking mechanism in software is used to make sure that the victim and attacker processes cannot share the same cache blocks.
Assumption 5 Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one called policy_fillmap, for masking fills and selecting the victim way to replace, and another called policy_hitmap, for masking hit ways. A cache hit happens only if both the tag and the domain id match. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. On a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways as the system's workload changes, but does not actually implement it.
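The way masking described above can be sketched as follows (the names policy_fillmap and policy_hitmap follow the text; the class and the toy replacement policy are our own simplification):

```python
NUM_WAYS = 8

class DawgSet:
    """One set of a way-partitioned cache with per-domain way masks."""
    def __init__(self):
        self.tags = [None] * NUM_WAYS
        self.doms = [None] * NUM_WAYS

    def lookup(self, domain_id, hitmap, tag):
        """Hit only if tag AND domain id match in a way allowed by policy_hitmap."""
        return any(hitmap >> w & 1 and self.tags[w] == tag
                   and self.doms[w] == domain_id for w in range(NUM_WAYS))

    def fill(self, domain_id, fillmap, tag):
        """On a miss, the victim way is chosen only among policy_fillmap ways."""
        victim = next(w for w in range(NUM_WAYS) if fillmap >> w & 1)  # toy policy
        self.tags[victim], self.doms[victim] = tag, domain_id

s = DawgSet()
s.fill(domain_id=1, fillmap=0b00001111, tag="x")          # victim domain's fill
print(s.lookup(domain_id=2, hitmap=0b11110000, tag="x"))  # attacker domain: False
```

Because the two domains' masks are disjoint, the attacker's probes can neither hit on nor evict the victim's lines.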
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if data R is in the LLC, it is also in the higher-level cache, and eviction of R from the LLC causes the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line does not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC does not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and on thread migration events, to avoid timing leakage during the transition time.
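The relaxed-inclusion behavior can be sketched with a two-level model (a simplification of ours, not RIC's implementation): back-invalidation of the L1 copy is skipped when the line's relaxed bit is set.

```python
class TwoLevel:
    """Toy inclusive L1/LLC pair with a per-line relaxed inclusion bit."""
    def __init__(self):
        self.l1 = set()
        self.llc = {}                     # addr -> relaxed inclusion bit

    def load(self, addr, relaxed=False):
        """Read-only or thread-private lines are loaded with the bit set."""
        self.l1.add(addr)
        self.llc[addr] = relaxed

    def evict_llc(self, addr):
        if not self.llc.pop(addr):        # strict inclusion: back-invalidate L1
            self.l1.discard(addr)

c = TwoLevel()
c.load("key_table", relaxed=True)
c.evict_llc("key_table")
print("key_table" in c.l1)                # True: the L1 copy survives
```

An attacker who evicts the LLC line therefore no longer gains the ability to evict the victim's L1 copy.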
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, on a cache hit PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. On a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data is loaded or stored without caching. In other cases, data eviction is possible. This cache lists an extra software assumption, as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program's execution.
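The lock-bit replacement check might be sketched as below. This is a single-line toy model; the `PLLine` class and the `lock` flag (standing in for the ld.lock/st.lock instructions) are assumptions of this sketch.

```python
# Sketch of PL cache's replacement rule on one cache line: a locked line
# is never evicted by unlocked data, nor by locked data of another process.

class PLLine:
    def __init__(self):
        self.tag = None; self.pid = None; self.locked = False

    def access(self, pid, tag, lock=False):
        if self.tag == tag and self.pid == pid:
            self.locked = lock or self.locked   # a hit may update the lock bit
            return "hit"
        # Miss: if the resident line is locked and the new access may not
        # evict it, serve the data without caching it.
        if self.locked and (not lock or pid != self.pid):
            return "miss-uncached"
        self.tag, self.pid, self.locked = tag, pid, lock
        return "miss-cached"

line = PLLine()
line.access(pid=0, tag="AES_table", lock=True)   # victim preloads and locks
r = line.access(pid=1, tag="probe", lock=False)  # attacker cannot evict it
print(r, line.tag)  # miss-uncached AES_table
```

Under Assumption 1, the locked victim lines are resident before the attacker runs, so the attacker's probes can neither evict them nor learn from their eviction.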
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache access. Each block of RP cache carries a process ID and one protection bit P that indicates whether the cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, a cache hit requires both the process ID and the address to match. When a cache miss happens for data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have
J Hardw Syst Secur (2019) 3:397–425 409
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
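The miss-handling decision just described can be summarized in a small decision function. This is a sketch of the decision logic only; the full permutation-table state is not modeled, and the function name is illustrative.

```python
# Sketch of RP cache's miss-handling decision. evicted/incoming are
# (pid, protection_bit) pairs for the conflicting lines.

def rp_miss_action(evicted, incoming):
    if evicted[0] == incoming[0] and evicted[1] != incoming[1]:
        # Same process, different protection bits: evict a random set S'
        # and access the incoming data without caching it.
        return "evict-random-set, access-without-caching"
    if evicted[0] != incoming[0]:
        # Different processes: place D in a random set S' and swap the
        # permutation-table mappings of S and S'.
        return "store-in-random-set, swap-mappings"
    return "normal-replacement"

print(rp_miss_action((7, 1), (7, 0)))  # evict-random-set, access-without-caching
print(rp_miss_action((7, 0), (9, 0)))  # store-in-random-set, swap-mappings
print(rp_miss_action((7, 0), (7, 0)))  # normal-replacement
```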
Newcache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up an entry in the RMT to find the cache block that should be accessed. The cache thus stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) that indicates whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit requires both the process ID and the address to match. When a cache miss happens for data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
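Newcache's analogous miss-handling decision, as described above, might look like the following. Again this captures only the decision logic, and the names are illustrative.

```python
# Sketch of Newcache's miss-handling decision. evicted/incoming are
# (pid, protection_bit) pairs for the line to evict and the data to bring in.

def newcache_miss_action(evicted, incoming):
    same_process = evicted[0] == incoming[0]
    if same_process and (evicted[1] or incoming[1]):
        # Either line is protected: evict arbitrary data and serve the
        # access without caching it.
        return "evict-random-line, access-without-caching"
    if not same_process:
        # The actual cache is fully associative, so any line can be chosen.
        return "replace-random-line"
    return "normal-direct-mapped-replacement"

print(newcache_miss_action((1, True), (1, False)))   # protected: uncached
print(newcache_miss_action((1, False), (2, False)))  # cross-process: random
```

The key contrast with RP cache is the first branch: whenever either the evicted or the incoming line is protected, the replacement is refused outright, which is why Newcache prevents the miss-based internal interference discussed in Section 6.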
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security-critical data, a Nofill request is executed and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
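The three request types can be sketched as follows. This is a toy model: the cache is a plain set of resident lines, and `window` stands for the random-fill neighborhood; the names are assumptions of this sketch.

```python
# Sketch of Random Fill cache's request handling: on a random-fill miss,
# the demanded (secret-dependent) line is NOT cached; a random neighbor is.
import random

cache = set()

def request(addr, kind, window=None):
    if addr in cache:
        return "hit"                      # hits are handled as usual
    if kind == "normal":
        cache.add(addr)                   # normal demand fill
    elif kind == "nofill":
        pass                              # data returned but never cached
    elif kind == "random":
        # The demand line stays uncached; a random line from the window
        # of nearby addresses is brought in instead.
        cache.add(random.choice(window))
    return "miss"

request(0x100, "random", window=[0x140, 0x180, 0x1C0])
print(0x100 in cache)  # False: the accessed line itself is not cached
```

Since what gets cached is independent of which secret-dependent address was demanded, the attacker's later probes carry no information about the victim's access pattern.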
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and spatially clustered addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
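A rough sketch of the encrypted indexing and re-keying follows. The toy keyed mixing function below merely stands in for the LLBC of the real design; the constants and names are illustrative.

```python
# Sketch of CEASER-style index derivation: the set index is computed from
# an encrypted address, and changing the key remaps every address.

def toy_encrypt(addr, key):
    # Placeholder keyed permutation of the address bits (LLBC in the paper).
    x = (addr ^ key) & 0xFFFF
    x = ((x * 0x9E37) ^ (x >> 7)) & 0xFFFF
    return x

def cache_set(addr, key, n_sets=64):
    return toy_encrypt(addr, key) % n_sets

key_epoch0, key_epoch1 = 0x1234, 0xBEEF   # periodic re-keying
a = cache_set(0x40, key_epoch0)
b = cache_set(0x40, key_epoch1)
print(a, b)  # the same address lands in different sets under the two keys
```

Because an attacker's carefully built eviction set is defined in terms of set indices, each re-keying invalidates whatever conflict sets the attacker has learned so far.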
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each of them. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software entities and assign them different keys for the mapping. As hardware extensions, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate with user space for security domain identification.
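The keyed per-way index derivation could be sketched as below, where SHA-256 stands in for the keyed hash or permutation-derivation function of the paper; the sizes and names are assumptions of this sketch.

```python
# Sketch of SCATTERCACHE-style indexing: each (key, address, way) triple
# yields its own set index, so eviction sets differ per way and per domain.
import hashlib

def way_index(addr, way, key, n_sets=64):
    msg = f"{key}|{addr}|{way}".encode()
    return int.from_bytes(hashlib.sha256(msg).digest()[:4], "big") % n_sets

key_victim, key_attacker = "domainA", "domainB"
victim_sets   = [way_index(0x40, w, key_victim)   for w in range(8)]
attacker_sets = [way_index(0x40, w, key_attacker) for w in range(8)]
print(victim_sets)
print(attacker_sets)  # same address, but a different set per way and per domain
```

An attacker who wants to contend with one victim line must therefore find addresses colliding in every way under a key it does not know, rather than simply matching index bits.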
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
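The counter-based invalidation can be sketched as below. This is a single-line toy model: `max_interval` plays the role of the predefined bound, and the class name is illustrative.

```python
# Sketch of Non Deterministic Cache's counter decay: each fill draws a
# random initial counter, so an untouched line's lifetime is randomized.
import random

class NDLine:
    def __init__(self, max_interval=8):
        self.max_interval = max_interval
        self.valid = False

    def fill(self):
        self.valid = True
        # Random initial value below the global maximum.
        self.counter = random.randrange(self.max_interval)

    def tick(self):  # one global-counter tick with the line untouched
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_interval:
                self.valid = False  # invalidate on reaching the bound

line = NDLine()
line.fill()
lifetime = 0
while line.valid:
    line.tick()
    lifetime += 1
print(0 < lifetime <= 8)  # True; the exact lifetime varies run to run
```

Since a line may vanish after anywhere from one to `max_interval` ticks, the attacker's hit/miss observations no longer determine whether the victim touched the line.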
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, where the cache is not able to prevent the vulnerability, this is indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer has the original corresponding relation between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as the secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and another random datum would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker can use the timing-based vulnerability that corresponds to the red cell to attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous cost to performance when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns from CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim software, the attacker software, or the management software to explicitly label the range of the victim's data that they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's flushing of its own labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so that V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns from SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of the speculation or the out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference among the victim's own data is hard to prevent with partitioning-based caches. What is more, partitioning is recognized to be wasteful in terms of cache space and to inherently degrade system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size is adjusted for each part, and it does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting the initial state (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting cache hits due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and its process ID match. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be placed in the unreserved ways and be open to eviction by the attacker. SecDCP has no unreserved ways; all the space in the cache belongs to either the High or the Low partition.

Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or out-of-order loads and stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 0) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or out-of-order loads or stores and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different pieces of sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into a cache block in Step 1, Random Fill cache does not randomly load the security-critical data, and allows vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into the cache index, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all the attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information whenever either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from the other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort a transaction and roll back its modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.

J Hardw Syst Secur (2019) 3:397-425
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and the names added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then, the victim accesses the secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then, the victim accesses the secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
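The three steps of Flush + Reload can be played out on a toy, single-set cache model. A minimal sketch follows; the `ToyCache` class, the "fast"/"slow" timing labels, and the concrete addresses are illustrative assumptions, not part of the paper's model:

```python
# Toy single-set cache model illustrating the three steps of
# Flush + Reload. All names here are illustrative.

HIT, MISS = "fast", "slow"

class ToyCache:
    def __init__(self):
        self.lines = set()          # addresses currently cached

    def flush(self, addr):          # invalidate a (shared) address
        self.lines.discard(addr)

    def access(self, addr):         # returns the observable timing
        timing = HIT if addr in self.lines else MISS
        self.lines.add(addr)        # the access caches the address
        return timing

cache = ToyCache()
shared = 0x40                       # address shared by attacker and victim
secret_addr = 0x40                  # victim's secret-dependent address

cache.flush(shared)                 # Step 1: attacker flushes
cache.access(secret_addr)           # Step 2: victim's secret access
t = cache.access(shared)            # Step 3: attacker reloads
print(t)                            # "fast" => secret_addr == shared
```

The attacker observes "fast" only because the victim's Step 2 access touched the same shared address that was flushed in Step 1; had the victim touched a different address, the reload would miss.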
Reload + Time (New Name Assigned in this Paper) In Step 1, the secret data is invalidated by the victim. Then, the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates the secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Evict + Time the attacker evicts a cache set, only learning the secret data's cache index, instead of invalidating data at a known address to find the full address of the secret data, as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
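As a sketch, these three steps can be simulated on a toy one-set, two-way cache with LRU replacement; the `ToySet` class and the concrete addresses below are illustrative assumptions:

```python
# Toy 1-set, 2-way LRU cache illustrating Prime + Probe.
from collections import deque

WAYS = 2

class ToySet:
    def __init__(self):
        self.ways = deque(maxlen=WAYS)   # LRU order; full deque evicts oldest

    def access(self, addr):
        hit = addr in self.ways
        if hit:
            self.ways.remove(addr)
        self.ways.append(addr)           # move to most-recently-used
        return "fast" if hit else "slow"

s = ToySet()
# Step 1: attacker primes the set with known addresses
for a in (0x10, 0x20):
    s.access(a)
# Step 2: victim's secret access evicts one attacker line
s.access(0x30)
# Step 3: attacker probes; any "slow" access reveals the victim
# touched this set
probe = [s.access(a) for a in (0x10, 0x20)]
print(probe)
```

Note that in this toy model the probe accesses themselves evict lines, so both probes miss once the victim has touched the set; without the victim's Step 2 access, both probes would hit.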
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. A memory-related operation on the unknown address (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → Aa → Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ⋆, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as Ainv/Vinv, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability Ainv → Vu → Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step will be deleted).
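A minimal sketch of how the two subdivision rules might be applied mechanically, assuming steps are encoded as strings and "*", "Ainv", and "Vinv" are the splitting steps (the encoding and function name are illustrative, not from the paper):

```python
# Hedged sketch of Subdivision Rules 1 and 2: split a long step
# sequence at "*" and at "Ainv"/"Vinv", making the splitting step
# Step 1 of the following sub-pattern; a trailing splitter is deleted.

SPLITTERS = ("*", "Ainv", "Vinv")

def subdivide(seq):
    patterns, current = [], []
    for step in seq:
        if step in SPLITTERS:
            if current:
                patterns.append(current)
            current = [step]          # splitter becomes Step 1
        else:
            current.append(step)
    if current:
        patterns.append(current)
    # a splitter as the very last step gives no information: delete it
    if patterns and len(patterns[-1]) == 1 and patterns[-1][0] in SPLITTERS:
        patterns.pop()
    return patterns

print(subdivide(["Aa", "Vu", "*", "Ainv", "Vu", "Aa", "*"]))
```

In this example the sequence is cut at the ⋆ and at the Ainv, and the trailing ⋆ is dropped, mirroring the recursive application of Rules 1 and 2 described above.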
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., the sequences containing M, N and N, M will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other step is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence; these patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
(Table 7: Rules for combining two adjacent steps. For each possible pair of adjacent steps, regardless of whether a step is the attacker's access (A) or the victim's access (V), the table lists whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and, where applicable, the resulting reduced or unioned combined step.)

If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step following the reduction rules (these are the two-step patterns with "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access-related operations (known access operations) always result in a deterministic state of the second accessed cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., the sequences containing M, N and containing Union(M, N) will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, denoted Union(M, N). Two adjacent steps that can be combined are discussed in the following case:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: both known inv operations invalidate some known address, so they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
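A hedged sketch of how Reduction Rules 1-3 and Union Rule 1 could be applied to a symbolic step sequence; the `(address, kind)` encoding and the function names are illustrative assumptions, and the commute step is omitted:

```python
# Steps are (addr, kind) pairs; kind is "acc" or "inv"; addr "u" is
# the unknown address. Encoding is illustrative, not from the paper.

KNOWN = {"a", "aalias", "d"}

def reduce_pair(m, n):
    """Return the single step replacing adjacent (m, n), or None."""
    (ma, mk), (na, nk) = m, n
    if ma == "u" and na == "u" and mk == nk == "acc":
        return n                      # Rule 1: same unknown address
    if ma in KNOWN and na in KNOWN and mk == nk == "acc":
        return n                      # Rule 2: deterministic final state
    if ma == na and ma in KNOWN and {mk, nk} == {"acc", "inv"}:
        return n                      # Rule 3: same address, acc vs inv
    return None

def union_pair(m, n):
    """Union Rule 1: two invalidations of different known addresses."""
    if m[1] == n[1] == "inv" and m[0] != n[0] and \
       m[0] in KNOWN and n[0] in KNOWN:
        return ("union(%s,%s)" % (m[0], n[0]), "inv")
    return None

def simplify(seq):
    """One left-to-right pass applying reduce, then union, on pairs."""
    out = list(seq)
    i = 0
    while i + 1 < len(out):
        step = reduce_pair(out[i], out[i + 1]) or \
               union_pair(out[i], out[i + 1])
        if step is not None:
            out[i:i + 2] = [step]     # re-check around the merged step
            i = max(i - 1, 0)
        else:
            i += 1
    return out

seq = [("a", "acc"), ("a", "acc"), ("u", "acc"), ("d", "inv"),
       ("aalias", "inv")]
print(simplify(seq))
```

Applied to the example sequence, one pass reduces the two accesses to a into one (Rule 2) and unions the two invalidations of the known addresses d and aalias (Union Rule 1), leaving three steps.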
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rule is applied to the processed memory sequence.
The recursion over these three categories of rules should always be applied, reducing at least one step at each application, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence with at most three steps, in the following patterns or fewer:

– u operation, not u operation, u operation
– not u operation, u operation, not u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv → Vu → not u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv → Vu → Ainv_d/Vinv_d → u operation, where Vu → Ainv_d/Vinv_d can further have Commute Rule 2 applied, reducing the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied, which requires the Rest Checking.

The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias → Vu
– Aa/Va/Aaalias/Vaalias → Vinv_u
– Vu → Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias
– Vinv_u → Aa/Va/Aaalias/Vaalias

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; patterns: a two-dimensional dynamic-size array; patterns[0] contains the states of each step of the original pattern in order; patterns[1], patterns[2], ... are empty initially.

Output: vul_patterns: the reduced effective vulnerability pattern(s) array. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vul_patterns = Ø
2: while patterns contain ⋆ and its index is not 0 do
3:   patterns = Subdivision_Rule_1(patterns)
4: end while
5: while (patterns contain Ainv and its index is not 0) or (patterns contain Vinv and its index is not 0) do
6:   patterns = Subdivision_Rule_2(patterns)
7: end while
8: while not (patterns is ineffective or has interval effective three steps) do
9:   patterns = Commute_Rules(patterns)
10:  patterns = Reduction_Rules(patterns)
11:  patterns = Union_Rules(patterns)
12:  if patterns is ineffective or has interval effective three steps then
13:    vul_patterns += effective_patterns(patterns)
14:  end if
15: end while
16: return vul_patterns
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112-121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29-45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1-7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201-215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE: a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208-225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42-56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601-608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857-874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406-421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217-233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279-299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897-912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38-55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games: bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490-505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341-353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1-6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221-230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338-359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974-987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1-19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2-13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199-202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406-418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203-215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8-16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190-200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865-880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1-20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050-1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6-20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775-787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475-486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279-299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169-178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
different protection bit, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mapping of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
Newcache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of a memory address are used to look up an entry in the RMT to find the cache block that should be accessed, so the cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit requires both the process ID and the address to match. When a cache miss happens for data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
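The lookup and miss-handling flow described above can be sketched as follows. This is a minimal, illustrative Python model, not the actual hardware: the class and method names are our own, and the random victim choice stands in for the full replacement logic of [50].

```python
import random

class NewcacheSketch:
    """Illustrative model: a per-process ReMapping Table (RMT) maps index
    bits to a line in a fully associative array; a hit requires both the
    process ID and the index (address) to match."""
    def __init__(self, n_lines=8, index_bits=4):
        self.lines = [None] * n_lines       # each entry: (pid, index, P bit)
        self.rmt = {}                       # (pid, index) -> physical line number
        self.mask = (1 << index_bits) - 1

    def access(self, pid, addr, protected=False):
        idx = addr & self.mask
        ln = self.rmt.get((pid, idx))
        if ln is not None and self.lines[ln] and self.lines[ln][:2] == (pid, idx):
            return "hit"
        victim = random.randrange(len(self.lines))   # random replacement
        old = self.lines[victim]
        if old and old[0] == pid and (protected or old[2]):
            # same process and a P bit set: evict arbitrary data, access uncached
            self.lines[victim] = None
            self.rmt.pop((old[0], old[1]), None)
            return "miss-no-fill"
        if old:
            self.rmt.pop((old[0], old[1]), None)
        self.lines[victim] = (pid, idx, protected)
        self.rmt[(pid, idx)] = victim
        return "miss-fill"
```

Note how an access by a different process to the same index never yields a hit, which is the property the later analysis relies on.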
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a NoFill request is executed and the requested data access is performed without caching. Meanwhile, on a RandomFill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request is used, which follows the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not influence the timing-based side-channel attack prevention (the random filling technique is used for this).
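The three request types can be sketched as follows (an illustrative Python model; the function name, the `window` parameter, and the set-of-addresses cache are our own stand-ins for the design's neighborhood-window fill):

```python
import random

def random_fill_access(cache, addr, req_type, window=4):
    """Illustrative handling of the three request types: 'normal',
    'nofill' (access without caching), and 'randomfill' (cache an
    arbitrary nearby line instead of the requested one)."""
    if addr in cache:
        return "hit"                        # hits are processed as usual
    if req_type == "normal":
        cache.add(addr)                     # ordinary demand fill
    elif req_type == "randomfill":
        # fill from a window of nearby addresses, de-correlating the
        # fill from the address actually accessed
        cache.add(addr + random.randint(-window, window))
    # 'nofill': the access is performed but nothing is cached
    return "miss"
```

The key point is that after a `nofill` or `randomfill` miss, the cache state carries little or no information about the address the victim actually touched.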
CEASER Cache [38] mitigates conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction; the periodic re-keying causes the address remapping to change dynamically.
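The indexing idea can be sketched as follows (illustrative Python only: a keyed hash stands in for the LLBC cipher, and the epoch length and class name are our own assumptions):

```python
import hashlib

class CeaserSketch:
    """Illustrative CEASER-style set indexing: the line address is
    'encrypted' under the current key before indexing, and the key is
    changed every `epoch_len` accesses, remapping every address."""
    def __init__(self, n_sets=64, epoch_len=1000):
        self.n_sets, self.epoch_len = n_sets, epoch_len
        self.key, self.accesses = 0, 0

    def set_index(self, addr):
        self.accesses += 1
        if self.accesses % self.epoch_len == 0:
            self.key += 1                   # periodic re-key -> dynamic remap
        digest = hashlib.sha256(f"{self.key}:{addr}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.n_sets
```

Within one key epoch the mapping is stable, so the cache still works; across epochs any eviction set the attacker has learned becomes stale.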
ScatterCache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As hardware extensions, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. ScatterCache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate with user space for security domain identification.
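The per-way index derivation can be sketched as follows (illustrative Python; SHA-256 stands in for the keyed hash or permutation derivation function, and all names are our own):

```python
import hashlib

def scatter_indices(addr, domain_key, n_ways=8, n_sets=64):
    """Illustrative index derivation: a keyed hash gives each cache way
    its own candidate set, and each security domain its own key."""
    indices = []
    for way in range(n_ways):
        h = hashlib.sha256(f"{domain_key}:{addr}:{way}".encode()).digest()
        indices.append(int.from_bytes(h[:4], "big") % n_sets)
    return indices
```

Because two domains use different keys, the same address lands in a different set for each way and for each domain, so an attacker cannot build a useful eviction set from its own address space.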
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID or by whether the data is secure. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter, so the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and this therefore de-correlates the cache access time from the address being accessed. However, the performance degradation is tremendous.
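The decay-counter mechanism can be sketched as follows (illustrative Python; the class name, the dictionary representation, and the `limit` parameter are our own assumptions):

```python
import random

class NonDetCacheSketch:
    """Illustrative decay model: each cached line carries a counter that
    starts at a random value and is bumped on every global tick the line
    goes untouched; at `limit` the line is invalidated."""
    def __init__(self, limit=8):
        self.limit = limit
        self.counters = {}                  # addr -> decay counter

    def access(self, addr):
        hit = addr in self.counters
        # (re)start the counter at a random value below the limit
        self.counters[addr] = random.randrange(self.limit)
        return "hit" if hit else "miss"

    def tick(self):                         # one global counter clock tick
        for addr in list(self.counters):
            self.counters[addr] += 1
            if self.counters[addr] >= self.limit:
                del self.counters[addr]     # invalidate the decayed line
```

Because each line's lifetime starts from a random value, whether a later access hits or misses no longer deterministically reflects the earlier access pattern.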
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu Ad Vu (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the Vu Va Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.

2. A cache can prevent a timing attack if the timing of the last step is randomized, breaking the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad Vu A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad Vu V^inv_d (slow) (one type of Prime + Time Invalidation) become impossible, since preloaded and locked security-critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
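Case 1 can be made concrete with a toy timing model (illustrative Python; all names are our own). Disallowing cross-process hits, as in an RP-cache-like policy, makes the attacker's last-step observation constant regardless of the victim's secret-dependent behavior:

```python
def probe_time(cached, owner, probing_pid, addr, cross_process_hits):
    """Toy timing model for the attacker's last-step probe: 'fast' on a
    cache hit, 'slow' on a miss. Disallowing cross-process hits (an
    RP-cache-like policy) makes the observation constant."""
    if addr in cached and (cross_process_hits or owner[addr] == probing_pid):
        return "fast"
    return "slow"

# The victim has left a secret-dependent line at address "a" in the cache;
# the attacker then probes "a".
cached, owner = {"a"}, {"a": "victim"}
leaky = probe_time(cached, owner, "attacker", "a", cross_process_hits=True)
safe = probe_time(cached, owner, "attacker", "a", cross_process_hits=False)
```

In the leaky configuration the attacker sees "fast" exactly when the victim touched the line; in the safe configuration the attacker sees "slow" either way, so the channel carries no information.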
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
J Hardw Syst Secur (2019) 3397ndash425 411
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body with per-cache, per-vulnerability entries not reproduced here.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of the attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body with per-cache, per-vulnerability entries not reproduced here.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of the attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can use special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be implemented to be disabled for the victim's sensitive data in PL cache, where Vu V^inv_a Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu Va Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu Ad Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there can be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculation or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size of each part is adjusted, and it does not help with internal interference prevention.
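Static way-partitioning can be sketched as follows (illustrative Python; the data layout and the trivial fill-way choice are our own stand-ins for a real replacement policy):

```python
def lookup_partitioned(sets, addr, domain, ways_of, n_sets=64):
    """Illustrative static way-partitioning: each security domain may only
    hit in, and fill, its own ways, so it can never evict the other
    domain's lines."""
    s = addr % n_sets
    my_ways = ways_of[domain]
    for w in my_ways:
        if sets[s][w] == addr:
            return "hit"
    # miss: fill only within the domain's own ways (trivial victim choice)
    sets[s][my_ways[addr % len(my_ways)]] = addr
    return "miss"
```

Because the attacker's fills land only in the attacker's ways, the victim's lines survive the attacker's accesses, which is exactly why external eviction-based (Step 0) interference fails, while the victim's own lines can still evict one another.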
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., Vd Vu Va (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but, for confidentiality, prevents the attacker from directly getting cache hits due to the victim's data, and therefore prevents vulnerabilities such as A^inv_a Vu Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd Vu A^inv_d (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu A^inv_a Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and the attacker will be allowed to evict it. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as Ad Vu Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation when the outside world is involved, external interference vulnerabilities such as the Vd Vu Ad (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad Vu Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or out-of-order loads or stores and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information carried by the timing observation, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A^inv_a Vu Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory accesses and cache accesses of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the Vu V^inv_a Vu (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va V^inv_u V^inv_a (fast) (one type of Flush + Probe Invalidation). ScatterCache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary. (Table body not reproduced here.)
the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% performance degradation, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance than the other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks.
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad Vu Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from the other data in the victim code. However, identification of sensitive information needs to be carefully used; e.g., Random Fill cache is vulnerable to Vu Ad Vu (fast) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.

• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
J Hardw Syst Secur (2019) 3:397–425 417
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, a cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
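The three steps of Flush + Reload can be illustrated with a toy single-line cache model (a sketch for intuition only; the CacheLine class and the HIT_TIME/MISS_TIME constants are our own illustrative assumptions, not the paper's simulator):

```python
# Toy single-line cache used to illustrate the three Flush + Reload steps.
HIT_TIME, MISS_TIME = 1, 100   # illustrative latencies, not real cycle counts

class CacheLine:
    def __init__(self):
        self.tag = None                    # address currently cached, if any

    def flush(self):
        self.tag = None                    # invalidate the line

    def access(self, addr):
        """Return the access latency and cache the address."""
        latency = HIT_TIME if self.tag == addr else MISS_TIME
        self.tag = addr
        return latency

def flush_reload(victim_secret_addr, attacker_probe_addr):
    line = CacheLine()
    line.flush()                           # Step 1: flush (A^inv or V^inv)
    line.access(victim_secret_addr)        # Step 2: victim's secret access (V_u)
    # Step 3: attacker reloads a known address (A_a) and times it.
    return line.access(attacker_probe_addr) == HIT_TIME

print(flush_reload("x", "x"))   # True:  fast reload, the addresses collide
print(flush_reload("x", "y"))   # False: slow reload, no collision
```

A fast (hit) observation in Step 3 is exactly the attacker learning that the probed known address matches the victim's secret address.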
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates data at some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
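The miss-based Prime + Probe strategy can similarly be sketched with a small W-way LRU set model (again an illustrative sketch with assumed class and address names, not the paper's simulator):

```python
from collections import OrderedDict

WAYS = 2

class CacheSet:
    """One W-way cache set with LRU replacement (illustrative only)."""
    def __init__(self):
        self.lines = OrderedDict()           # address -> None, kept in LRU order

    def access(self, addr):
        """Return True on hit; update LRU state and evict the LRU line if full."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)     # refresh as most-recently used
        else:
            if len(self.lines) >= WAYS:
                self.lines.popitem(last=False)   # evict least-recently used line
            self.lines[addr] = None
        return hit

def prime_probe(victim_maps_to_set):
    s = CacheSet()
    primed = ["p0", "p1"]
    for a in primed:                         # Step 1: attacker primes the whole set
        s.access(a)
    if victim_maps_to_set:                   # Step 2: victim's secret access (V_u)
        s.access("victim_line")              # evicts one of the primed lines
    misses = sum(not s.access(a) for a in primed)   # Step 3: attacker probes
    return misses > 0                        # any miss reveals the secret's cache set

print(prime_probe(True))    # True:  the victim evicted a primed line
print(prime_probe(False))   # False: the set is untouched, all probes hit
```

Observing even a single probe miss tells the attacker that the victim's secret address maps to the primed set.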
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, it tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set via an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, it tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access–related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to the addresses a, a_alias, and d, which interfere with one another in the cache; the unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step of the longer pattern (a ⋆ in the last step will be deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ A^inv/V^inv ⇝ ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
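A minimal sketch of these two subdivision rules, assuming steps are encoded as strings with "*" standing for ⋆ and "INV" standing for an A^inv/V^inv flush (the encoding and function names are our own, not the paper's):

```python
def split_at(pattern, sep):
    """Split one step pattern at every occurrence of sep; the separator
    becomes Step 1 of the following sub-pattern, and a separator in the
    last position is simply dropped (as the rules prescribe)."""
    parts, cur = [], []
    for step in pattern:
        if step == sep and cur:
            parts.append(cur)        # close the current sub-pattern
            cur = [step]             # sep starts the next sub-pattern
        else:
            cur.append(step)
    if cur and cur != [sep]:         # a trailing lone separator is deleted
        parts.append(cur)
    return parts

def subdivide(pattern):
    """Apply Subdivision Rule 1 (split at '*') fully, then Rule 2
    (split at 'INV') on each resulting sub-pattern."""
    out = []
    for p in split_at(pattern, "*"):
        out.extend(split_at(p, "INV"))
    return out

print(subdivide(["Aa", "*", "Vu", "INV", "Vu", "Aa"]))
# [['Aa'], ['*', 'Vu'], ['INV', 'Vu', 'Aa']]
```

Note how the ⋆ and the flush each end up as Step 1 of their own sub-pattern, matching the two rules above.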
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" in the column "Union Rule or
[Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (columns First and Second), the table indicates whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether the Union Rule or Reduce Rule applies, and gives the combined step where one exists (for example, two adjacent accesses to the same known address a reduce to a single A_a/V_a step, and two adjacent invalidations of different known addresses union into one joint invalidation step).]
Reduce Rule" and no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1 For two u operations: although u is unknown, both of the accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access–related operations (known_access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, in either order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
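The three reduction rules can be sketched as a check over adjacent step pairs, assuming each step is encoded as a (kind, address) tuple with "u" standing for the unknown address (an illustrative encoding of ours, not the paper's):

```python
def reducible(m, n):
    """Return True if adjacent steps m, n collapse into n alone.
    Steps are (kind, addr) tuples with kind in {"access", "inv"}."""
    (k1, a1), (k2, a2) = m, n
    if a1 == "u" and a2 == "u":
        return True                      # Rule 1: two u operations, same u
    if k1 == k2 == "access" and "u" not in (a1, a2):
        return True                      # Rule 2: two known accesses
    if {k1, k2} == {"access", "inv"} and a1 == a2 != "u":
        return True                      # Rule 3: access + inv of the same known address
    return False

def reduce_once(pattern):
    """Collapse the first reducible adjacent pair, keeping the second step."""
    for i in range(len(pattern) - 1):
        if reducible(pattern[i], pattern[i + 1]):
            return pattern[:i] + pattern[i + 1:]
    return pattern

p = [("access", "a"), ("access", "d"), ("inv", "d"), ("access", "u")]
print(reduce_once(reduce_once(p)))
# [('inv', 'd'), ('access', 'u')]
```

Here Rule 2 first collapses the two known accesses, then Rule 3 collapses the access/invalidation pair on d, leaving a two-step remainder.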
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two different known memory addresses can be combined by Union Rule 1: two known_inv operations both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access and known_inv operations that target the same address as near to each other as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
At each application of these three categories of rules, the recursion should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A_d^inv/V_d^inv ⇝ u operation, where Commute Rule 2 can further be applied to V_u ⇝ A_d^inv/V_d^inv to reduce the sequence to within three steps.

In this case, the steps finally fit within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv ⇝ V_u
– A_a/V_a/A_aalias/V_aalias ⇝ V_u^inv
– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv
– V_u^inv ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two-adjacent-step patterns
above, and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially
Output: reduced effective vulnerability pattern(s) array vul_list; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vul_list = Ø
2: while pattern contains ⋆ at an index that is not 0 do
3: pattern = subdivision_rule_1(pattern)
4: end while
5: while (pattern contains A^inv at an index that is not 0) or (pattern contains V^inv at an index that is not 0) do
6: pattern = subdivision_rule_2(pattern)
7: end while
8: while ¬(pattern is ineffective or has interval effective three steps) do
9: pattern = commute_rules(pattern)
10: pattern = reduction_rules(pattern)
11: pattern = union_rules(pattern)
12: if (pattern is ineffective or has interval effective three steps) then
13: vul_list += pattern
14: end if
15: end while
16: return vul_list
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to one of the outputs (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
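The overall loop of Algorithm 2 can be sketched as follows, with the subdivision, commute, reduce, and union helpers passed in as functions (all names here are placeholders of ours for the rule implementations described above, not the paper's code):

```python
def reduce_pattern(pattern, subdivide, commute, reduce_step, union_step):
    """Sketch of the beta > 3 reduction loop: first subdivide the long
    pattern into sub-patterns, then repeatedly commute, reduce, and
    union each sub-pattern until a fixed point is reached, and collect
    the resulting (at most three-step) patterns."""
    results = []
    for sub in subdivide(pattern):
        prev = None
        while sub != prev:           # keep applying rules until nothing changes
            prev = sub
            sub = union_step(reduce_step(commute(sub)))
        results.append(sub)
    return results

# Usage with identity placeholders standing in for the real rule functions:
identity = lambda p: p
out = reduce_pattern(["A_a", "V_u", "A_a"], lambda p: [p], identity, identity, identity)
print(out)   # [['A_a', 'V_u', 'A_a']]
```

The fixed-point loop mirrors the requirement that each pass of the rules must shrink the sequence until either a three-step pattern or an irreducible pair remains.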
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush + Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non-deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 – 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990. Proceedings. IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium. USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35. ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41). IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture. ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA). IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43. ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference. ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
access time from the address being accessed. However, the performance degradation is tremendous.
5 Analysis of the Secure Caches
In this section we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, which is indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast vs. slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] makes the timing of the last step always slow, because RP cache does not allow data of different processes to get cache hits on each other's data.
2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer has the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V^inv_d (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
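The first prevention criterion above (a constant-time last step) can be illustrated with a toy model. This is only a sketch under assumptions: a 4-set, 1-way cache with index = address mod 4; the `Cache` class, its `shared_hits` switch (modeling an RP-like "no cross-process hits" policy), and the `flush_reload` driver are illustrative names, not constructs from the paper.

```python
# Toy three-step check (illustrative model, not any real cache design).
class Cache:
    def __init__(self, sets=4, shared_hits=True):
        self.sets = sets
        self.lines = {}                  # set index -> (owner, addr)
        # shared_hits=False models an RP-like policy: no process may get a
        # cache hit on another process's line, so Step 3 timing is constant.
        self.shared_hits = shared_hits

    def access(self, owner, addr):
        idx = addr % self.sets
        line = self.lines.get(idx)
        hit = (line is not None and line[1] == addr
               and (self.shared_hits or line[0] == owner))
        self.lines[idx] = (owner, addr)  # fill/replace the line
        return 'fast' if hit else 'slow'

    def flush(self, addr):
        self.lines.pop(addr % self.sets, None)

def flush_reload(cache, secret_addr):
    guess = secret_addr                     # attacker guesses shared address a == u
    cache.flush(guess)                      # Step 1: invalidate the line
    cache.access('victim', secret_addr)     # Step 2: victim touches secret u
    return cache.access('attacker', guess)  # Step 3: attacker reloads and times

print(flush_reload(Cache(shared_hits=True), 5))   # 'fast': guess confirmed
print(flush_reload(Cache(shared_hits=False), 5))  # 'slow': constant-time last step
```

With shared hits allowed, the attacker's reload in Step 3 is fast exactly when the guess matches the victim's secret access; under the RP-like policy the last step is always slow, matching case 1 above.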
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability; an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]
^a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
^b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
^c Flush is disabled, but cache coherence might be used to do the data removal
^d For L1 cache and TLB, flushing is done during context switch
^e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
^f The technique is now only implemented in last-level cache
^g The technique now only targets shared cache
^h The technique only targets inclusion last-level cache
^i The technique targets data cache hierarchy
^j For the last-level cache, cache is partitioned between the victim and the attacker
^k The technique can control the probabilities of the vulnerability to be successful to be extremely small
^l The technique can work in shared read-only memory while not working in shared writable memory
^m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]
^a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
^b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
^c Flush is disabled, but cache coherence might be used to do the data removal
^d For L1 cache and TLB, flushing is done during context switch
^e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
^f The technique is now only implemented in last-level cache
^g The technique now only targets shared cache
^h The technique only targets inclusion last-level cache
^i The technique targets data cache hierarchy
^j For the last-level cache, cache is partitioned between the victim and the attacker
^k The technique can control the probabilities of the vulnerability to be successful to be extremely small
^l The technique can work in shared read-only memory while not working in shared writable memory
^m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data and whether this identification process is reliable are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to be able to access only a limited set of cache blocks. For example, there can be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., a High partition determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
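The way-partitioning idea can be sketched as follows. This is illustrative only: the `PartitionedSet` model and its FIFO replacement are assumptions, not any specific secure cache's policy. Each domain may fill only its own ways, so the attacker cannot displace the victim's line no matter how many accesses it makes (no external miss-based interference), while collisions among the victim's own data remain possible.

```python
# Sketch of static way partitioning within one cache set (illustrative).
class PartitionedSet:
    def __init__(self, ways_per_domain):
        # Each security domain gets its own private ways.
        self.ways = {d: [None] * n for d, n in ways_per_domain.items()}

    def access(self, domain, addr):
        ways = self.ways[domain]
        if addr in ways:
            return 'hit'
        ways.pop(0)                      # evict only within the domain's ways
        ways.append(addr)
        return 'miss'

s = PartitionedSet({'victim': 2, 'attacker': 2})
s.access('victim', 0xA)
for addr in range(16):                   # attacker tries to thrash the set
    s.access('attacker', addr)
print(s.access('victim', 0xA))           # 'hit': the victim's line survived
```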
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 1) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which enables hit-based vulnerabilities; e.g., the V_d ⇝ V_u ⇝ A_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit on the victim's data for confidentiality, and therefore prevents vulnerabilities such as A^inv_a ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A^inv_d (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation that involves the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallows the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore brings uncertainty to the final observation by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative or out-of-order loads or stores and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; they may still suffer from hit-based vulnerabilities, however, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
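A randomized address-to-set mapping in the spirit of CEASER or ScatterCache can be sketched with a keyed hash. This is illustrative only: the real designs use dedicated low-latency ciphers (not BLAKE2), and remap epochs are managed in hardware; the key names here are made up.

```python
# Keyed, remappable address-to-set mapping sketch (illustrative).
import hashlib

def set_index(key: bytes, addr: int, n_sets: int = 64) -> int:
    # Hash the line address under a secret key; the attacker cannot
    # predict which addresses collide in a set without the key.
    h = hashlib.blake2b(addr.to_bytes(8, 'little'), key=key, digest_size=2)
    return int.from_bytes(h.digest(), 'little') % n_sets

# Remapping with a fresh key moves lines to new sets, invalidating any
# eviction sets the attacker has built up under the old mapping.
print(set_index(b'epoch-0', 0x1000), set_index(b'epoch-1', 0x1000))
```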
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities, such as A^inv_a ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V^inv_u ⇝ V^inv_a (fast) (one type of Flush + Probe Invalidation). ScatterCache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
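The random fill behavior discussed above can be sketched as a toy model. This is an assumption-laden simplification of the actual Random Fill cache: here the demand line of a sensitive access is never cached and a different line from a fixed neighborhood window is filled instead, so the resulting cache state no longer encodes the accessed address.

```python
# Toy sketch of the random fill idea (illustrative, not the real design).
import random

cache = set()                            # cached addresses (line granularity)

def access(addr, sensitive=False, window=8):
    hit = addr in cache
    if not hit:
        if sensitive:
            fill = addr
            while fill == addr:          # fill a *different* nearby line
                fill = addr + random.randint(-window, window)
            cache.add(fill)              # the demand line itself is not cached
        else:
            cache.add(addr)
    return hit

access(100, sensitive=True)              # victim's secret access
print(100 in cache)                      # False: state no longer encodes addr
```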
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary. [Table body not recoverable from the extracted text.]
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
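The three steps can be traced on a toy model that tracks cached 64-byte lines. This is illustrative only: a "hit" here simply means the two addresses fall in the same line, as in the AES table-lookup scenario; the constants are made up.

```python
# Toy trace of the Cache Internal Collision steps (illustrative model).
LINE = 64
cache = set()                            # cached line numbers

def access(addr):
    line = addr // LINE
    hit = line in cache
    cache.add(line)
    return hit

known, secret = 0x1000, 0x1010           # same 64-byte line: a collision
cache.clear()                            # Step 1: the line is invalidated
access(secret)                           # Step 2: victim touches secret data
print(access(known))                     # Step 3: True (fast) leaks the collision
```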
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 to the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
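A minimal sketch of Evict + Time (illustrative only; a 1-way cache with 4 sets is assumed, the set index is address mod 4, and the function name is hypothetical):

```python
# Toy illustration of Evict + Time with a 1-way, 4-set cache.
# The attacker learns the secret's cache set index from whether the
# victim's reload in Step 3 hits or misses.

NUM_SETS = 4

def evict_time(secret_addr, evicted_set):
    cache = {secret_addr % NUM_SETS: secret_addr}  # Step 1: victim loads secret
    cache.pop(evicted_set, None)                   # Step 2: attacker evicts one set
    # Step 3: victim reloads the secret; True = hit (fast), False = miss (slow)
    return cache.get(secret_addr % NUM_SETS) == secret_addr
```

A miss in Step 3 tells the attacker that the secret address maps to the evicted set.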
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
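The three steps of Prime + Probe can be sketched in the same toy setting (1-way cache with 4 sets, hypothetical names; a probe miss is modeled as the attacker's line no longer being present):

```python
# Toy Prime + Probe with a 1-way, 4-set cache. Set index = addr % NUM_SETS.
# Returns the list of set indices where the attacker's probe misses.

NUM_SETS = 4

def prime_probe(secret_addr):
    cache = {}
    for s in range(NUM_SETS):                 # Step 1: attacker primes every set
        cache[s] = ("attacker", s)
    cache[secret_addr % NUM_SETS] = ("victim", secret_addr)  # Step 2: victim access
    misses = []
    for s in range(NUM_SETS):                 # Step 3: attacker probes each set
        if cache[s] != ("attacker", s):       # attacker's line was evicted -> miss
            misses.append(s)
    return misses
```

The set that misses in Step 3 is the one the secret address maps to.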
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.

J Hardw Syst Secur (2019) 3:397–425
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 to the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 to the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (⋆ in the last step will be deleted) in the longer pattern.
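The splitting in Subdivision Rule 1 can be sketched as follows (an illustrative Python rendering, not the paper's implementation; "*" stands for the ⋆ step, and other strings stand for memory operations):

```python
# Split a step pattern at every '*' in the middle, making the '*' the
# first step of the following sub-pattern; a lone trailing '*' is deleted.

def subdivision_rule_1(pattern):
    subpatterns, current = [], []
    for step in pattern:
        if step == "*" and current:      # a '*' after other steps: cut here
            subpatterns.append(current)
            current = [step]             # '*' becomes Step 1 of the next part
        else:
            current.append(step)
    if current == ["*"]:                 # '*' as the last step is deleted
        return subpatterns
    if current:
        subpatterns.append(current)
    return subpatterns
```

A single left-to-right pass already cuts at every interior ⋆, so no explicit recursion is needed in this rendering.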
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ A^inv/V^inv ⇝ ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or the last step (A^inv/V^inv in the last step will be deleted).
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or the unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following Rules. In the possible commuting process, we will try every possible combination to commute different pairs of two steps that are able to apply the Commute Rules, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and another step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position within the whole memory sequence the two steps are in. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns that can apply Commute Rule 1 can be commuted if the second step of the two adjacent steps is not the last step in the whole memory sequence. That is, there are some two adjacent steps that can only be commuted if the second of the two is not the last step in the whole memory sequence; in Table 7, these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column in Table 7):

Table 7 Rules for combining two adjacent steps. For each ordered pair of adjacent steps (First, Second), the table lists whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply ("yes"/"no"), and the resulting Combined Step where one exists (e.g., Union(a^inv, d^inv), Union(a^inv, aalias^inv), Union(d^inv, aalias^inv)). (The table body is not cleanly recoverable from the text extraction; only its caption and column structure are reproduced here.)
– Reduction Rule 1 For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two known adjacent memory access–related operations (known_access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other one is a known_inv operation, in either order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
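The three Reduction Rules applied to one pair of adjacent steps can be sketched as follows (an illustrative encoding, simplified relative to the full applicability conditions in Table 7; the function name and step encoding are assumptions, not from the paper):

```python
# A step is modeled as (op, addr): op is "access" or "inv"; addr is a
# concrete known address such as "a" or "d", or "u" for the unknown
# (secret) address. Returns the single remaining step when one of the
# Reduction Rules applies to the pair, or None otherwise.

def reduce_pair(first, second):
    (op1, addr1), (op2, addr2) = first, second
    # Rule 1: two u operations target the same unknown address u.
    if addr1 == addr2 == "u" and op1 == op2:
        return second
    # Rule 2: two known memory accesses leave a deterministic state
    # for the second accessed block.
    if op1 == op2 == "access" and "u" not in (addr1, addr2):
        return second
    # Rule 3: a known access and a known invalidation of the same
    # address, in either order, reduce to the second step.
    if {op1, op2} == {"access", "inv"} and addr1 == addr2 != "u":
        return second
    return None  # no Reduction Rule applies to this pair
```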
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two different known memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be continuously applied to union all adjacent invalidation steps that invalidate different known memory addresses.
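Union Rule 1 can be sketched as a pass that folds adjacent runs of known-address invalidations into one combined step (an illustrative encoding with hypothetical names; each step is (op, addrs) with addrs a tuple of the addresses it touches, and for brevity the sketch does not check that the merged addresses differ):

```python
# Fold every run of adjacent invalidation steps that target known
# addresses into a single combined Union step. "u" marks the unknown
# address and is never folded.

def union_invalidations(pattern):
    out = []
    for op, addrs in pattern:
        if (op == "inv" and "u" not in addrs and out
                and out[-1][0] == "inv" and "u" not in out[-1][1]):
            out[-1] = ("inv", out[-1][1] + addrs)  # merge into previous step
        else:
            out.append((op, addrs))
    return out
```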
B.4.2.4 Final Check Rules
Each long memory sequence will recursively apply these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to put u operations together and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps, in the following patterns or less:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, A^inv/V^inv can be removed, due to the actual state further put into the cache block.
– If followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further have Commute Rule 2 applied, to reduce and be within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above Rules can be applied, and which require the rest of the checking.
The only left two adjacent steps that cannot be appliedby any of the three categorizations of the Rules are thefollowing
ndash AaVaAaalias Vaalias AdVdAinva V inv
a
Ainvaalias V inv
aalias Vu ndash AaVaAaalias Vaalias V inv
u ndash Vu AaVaAaalias Vaalias AdVd
Ainva V inv
a Ainvaalias V inv
aalias ndash V inv
u AaVaAaalias Vaalias We manually checked all of the two adjacent step patterns
above and found that adding extra step before or after these
J Hardw Syst Secur (2019) 3397ndash425422
Algorithm 2 -Step ( 3) pattern reduction
Input number of steps of the pattern
a two-dimensional dynamic-size array [0] contains the states of each step of the original pattern in
order [1] [2] are empty initially
Output reduced effective vulnerability pattern(s) array It will be an empty list if the original pattern does not
correspond to an effective vulnerability
1 = Oslash
2 while contain( ) and index not 0 do3 = 1 ( )
4 end while5 while ( contain( ) and index not 0) or ( contain( ) and index not 0) do6 = 2 ( )
7 end while8 while ( is ineffecitve or has interval effective three steps) do9 = ( )
10 = ( )
11 = ( )
12 if ( is ineffecitve or has interval effective three steps) then13 += ( )
14 end if15 end while16 return
two steps can either generate two adjacent step patterns thatbe processed by the three Rules where further step can bereduced or construct effective vulnerability according toTable 2 and reduction rules shown in Section 33 wherethe corresponding pattern can be treated effective and thechecking is done
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks whether a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2 Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3 Bernstein DJ (2005) Cache-timing attacks on AES
4 Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5 Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6 Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7 Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8 Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 on Asia conference on computer and communications security, ACM, pp 601–608
9 Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10 Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11 Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12 Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13 Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14 Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15 Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16 Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17 Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18 Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19 Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20 He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21 Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22 Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23 Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24 Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25 Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26 Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27 Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28 Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29 Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30 Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31 Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32 Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33 Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34 Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35 Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36 Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37 Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38 Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39 Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40 Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41 Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42 Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43 Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44 Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45 Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46 Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47 Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48 Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49 Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50 Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51 Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52 Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53 Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54 Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55 Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56 Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57 Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58 Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59 Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60 Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (The table body is not recoverable from the text extraction; only the caption and footnotes are reproduced.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared, read-only memory, while not working in shared, writable memory
m Random delay, but not random mapping, can only decrease the probabilities of the attacker in some limited degree
[Table 5] Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A single "✓" in a green cell means this cache can prevent the corresponding vulnerability; a "✓" in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a "×" in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). Footnotes a)–m) are identical to those of Table 4.
utilize differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software can use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that can label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, careful use is required when identifying the actual sensitive data and when implementing the corresponding security features in the cache.
Comparing the PL cache with the SP* cache: although both use partitioning, flushing can be disabled for the victim's sensitive data in the PL cache, where V_u → V_a^inv → V_u (slow) (one type of Flush + Time) is prevented. Newcache can prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), while most caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, the Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of the speculation, or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partitioning size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
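As a toy illustration of this property, the sketch below models a single set of a way-partitioned cache. The class name, replacement policy, and domain labels are illustrative assumptions, not any specific design from the tables: fills are confined to the requester's own ways, so the attacker cannot evict the victim's line (blocking external miss-based interference), yet cross-partition hits remain observable, which is exactly the hit-based internal/external leakage partitioning alone does not remove.

```python
# Toy model of one set of a way-partitioned cache (illustrative sketch).
# Fills may only evict lines in the ways owned by the requester's domain,
# but lookups hit on any way.

class PartitionedSet:
    def __init__(self, ways_per_domain):
        # ways_per_domain: e.g. {"victim": 1, "attacker": 1}
        self.owner = {}   # way index -> owning domain
        self.lines = {}   # way index -> cached address (or None)
        w = 0
        for dom, n in ways_per_domain.items():
            for _ in range(n):
                self.owner[w] = dom
                self.lines[w] = None
                w += 1

    def access(self, domain, addr):
        """Return 'hit' or 'miss'; on a miss, fill only into own ways."""
        for w, a in self.lines.items():
            if a == addr:
                return "hit"              # hits cross partition boundaries
        own = [w for w in self.lines if self.owner[w] == domain]
        self.lines[own[0]] = addr         # simplistic replacement in own ways
        return "miss"

s = PartitionedSet({"victim": 1, "attacker": 1})
s.access("victim", "V_secret")            # victim line cached
s.access("attacker", "A1")                # attacker fills...
s.access("attacker", "A2")                # ...but cannot touch the victim's way
assert s.access("victim", "V_secret") == "hit"    # external eviction blocked
assert s.access("attacker", "V_secret") == "hit"  # hit-based leak remains
```

The two final assertions mirror the text: the miss-based external channel is closed, while the hit-based channel survives.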
The SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., V_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that the SP* cache cannot prevent. The SecVerilog cache is similar to the SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data (for confidentiality), and therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). The SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data, and prevents external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]). The DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as the SP* cache, it can prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).
The SecDCP and NoMo caches both leverage dynamic partitioning to improve performance. Compared with the SecVerilog cache, the SecDCP cache introduces certain side
channels that manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u → A_a^inv → V_u (slow) (one type of Flush + Time) vulnerability. The NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions; otherwise, the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
The Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). The Sanctum cache does not consider internal interference, while the CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. The MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution, it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as V_d → V_u → A_d (slow) (one type of Evict + Probe) will be prevented.
The InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during a speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. The RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. The PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set the initial state (Step 1) for the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
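A minimal sketch of such a randomized address-to-set mapping is shown below. It is written in the spirit of keyed index randomization with periodic remapping; the keyed hash and the epoch keys are illustrative assumptions, not the actual low-latency ciphers used by designs such as CEASER or ScatterCache.

```python
# Sketch: keyed address-to-set mapping with per-epoch remapping keys.
import hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Map an address to a cache set under a secret key."""
    h = hashlib.blake2b(addr.to_bytes(8, "little"), key=key, digest_size=2)
    return int.from_bytes(h.digest(), "little") % NUM_SETS

# Under different keys (e.g., after a remapping epoch), the same addresses
# generally land in different sets, so an eviction set an attacker built
# under one key is useless under the next.
idx_epoch0 = [set_index(a, b"epoch-0-key") for a in range(16)]
idx_epoch1 = [set_index(a, b"epoch-1-key") for a in range(16)]
assert all(0 <= i < NUM_SETS for i in idx_epoch0 + idx_epoch1)
assert idx_epoch0 != idx_epoch1   # overwhelmingly likely across 16 addresses
```

A hardware design would replace the hash with a pipelined block cipher or keyed permutation; the point here is only that the set index is no longer a public function of the address.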
The RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both the RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). The Random Fill cache can use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, the Random Fill cache will not randomly load the security-critical data, and it allows vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. The CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). The SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. The Non Deterministic cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
[Table 6] Existing secure caches' implementation method, performance, power, and area summary.
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in other words, more security seems to imply less performance.
6.2 Towards an Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and it is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
  – Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
  – Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., the Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
  – To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., the Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., the CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting, and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there was an internal collision, and it leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
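The three steps of this strategy can be sketched on a toy cache model; the line names and the boolean "secret-dependent access" flag below are illustrative, and the hit/miss result stands in for the measured timing.

```python
# Minimal sketch of Flush + Reload (A_a^inv -> V_u -> A_a) on a toy cache.
cache = set()                        # addresses currently cached

def flush(addr):
    cache.discard(addr)

def access(addr):
    t = "fast" if addr in cache else "slow"
    cache.add(addr)
    return t

SHARED = "shared_line"               # e.g., a line in a shared library page

def victim(secret_uses_shared: bool):
    access(SHARED if secret_uses_shared else "other_line")

def flush_reload(secret_uses_shared: bool) -> bool:
    flush(SHARED)                    # Step 1: attacker invalidates known line
    victim(secret_uses_shared)       # Step 2: victim's secret-dependent access
    return access(SHARED) == "fast"  # Step 3: fast reload => victim touched it

assert flush_reload(True) is True    # attacker infers the secret bit
assert flush_reload(False) is False
```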
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known-data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time, the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
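On a toy direct-mapped cache, these three steps can be sketched as follows (the set count, attacker addresses, and secret address are illustrative); the attacker recovers the set index of the victim's secret-dependent access:

```python
# Sketch of Prime + Probe (A_a -> V_u -> A_a) on a toy direct-mapped cache.
NUM_SETS = 8
cache = [None] * NUM_SETS            # one line per set (direct-mapped toy)

def access(addr):
    s = addr % NUM_SETS              # trivial address-to-set mapping
    t = "fast" if cache[s] == addr else "slow"
    cache[s] = addr
    return t

def prime_probe(secret_addr):
    for s in range(NUM_SETS):        # Step 1: prime every set (0x1000+s -> set s)
        access(0x1000 + s)
    access(secret_addr)              # Step 2: victim's secret-dependent access
    return [s for s in range(NUM_SETS)        # Step 3: probe; a miss marks
            if access(0x1000 + s) == "slow"]  # the set the victim touched

assert prime_probe(0x2003) == [0x2003 % NUM_SETS]   # leaked set index: 3
```

Randomized set mappings, discussed in Section 6, break exactly the `addr % NUM_SETS` predictability this sketch relies on.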
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known-data accesses. If a cache miss is observed in Step 3, that tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set via an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set that was primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities with the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the "invalidation" variants use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory-access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If a vulnerability is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and for β > 3, the pattern can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the ⋆ cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and it corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3, using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
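The exhaustive enumeration itself can be sketched as below. The state alphabet here is a simplified, illustrative stand-in for the model's (actor A or V; address a, a_alias, d, or u; access or invalidation; full-cache flushes; and the ⋆ "any state" step, written `"*"`), and the sketch deliberately omits the effectiveness filtering by the cache simulator and reduction rules that yields the 72/64 counts.

```python
# Skeleton of the exhaustive three-step pattern enumeration (illustrative
# state alphabet; effectiveness filtering is not reproduced here).
from itertools import product

addresses = ["a", "a_alias", "d", "u"]
ops = [f"{who}_{addr}{suffix}"
       for who in ("A", "V")          # attacker or victim step
       for addr in addresses          # which address the step touches
       for suffix in ("", "^inv")]    # access vs. invalidation of that address
states = ops + ["A_inv", "V_inv", "*"]   # full-cache flushes and the star step

candidates = list(product(states, repeat=3))   # every (Step1, Step2, Step3)
print(len(states), "states ->", len(candidates), "three-step candidates")
```

Each candidate triple would then be fed to the cache-state simulator, which keeps only those where the attacker's final timing observation actually depends on u.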
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ⋆ in the middle, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as A_inv/V_inv, the longer pattern can be divided into two separate patterns, where A_inv/V_inv is assigned as Step 1 of the second pattern. This is because A_inv/V_inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A_inv → V_u → A_a (fast) shown in Table 2. A_inv/V_inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be applied recursively until there are no sub-patterns left with an A_inv/V_inv in the middle or the last step (an A_inv/V_inv in the last step is deleted).
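Subdivision Rule 1 can be sketched as code (step names are illustrative strings, and `"*"` stands for the ⋆ step): the long pattern is split at each ⋆, the ⋆ becomes Step 1 of the following sub-pattern, and a trailing ⋆ is deleted.

```python
# Sketch of Subdivision Rule 1: split a long pattern at "*" steps.
def subdivide_at_star(pattern):
    """pattern: list of step names, e.g. ["A_a", "*", "V_u", "A_a"]."""
    parts, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                parts.append(current)
            current = ["*"]            # star becomes Step 1 of next pattern
        else:
            current.append(step)
    if current and current != ["*"]:   # a lone trailing star is deleted
        parts.append(current)
    return parts

assert subdivide_at_star(["A_a", "*", "V_u", "A_a"]) == [["A_a"], ["*", "V_u", "A_a"]]
assert subdivide_at_star(["A_a", "V_u", "*"]) == [["A_a", "V_u"]]
```

Rule 2 would follow the same shape, splitting instead at full-cache flush steps (A_inv/V_inv).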
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. This gives more chances to Reduce and Union steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence; these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
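The applicability test for Commute Rule 1 can be sketched as follows. This is an illustrative encoding of our own (not from the paper): a step is a (kind, address) tuple, where kind is "access" or "inv", and address is a concrete label such as "a" or "d", or None for the unknown address u.

```python
# Sketch of Commute Rule 1's condition under a hypothetical
# (kind, address) step encoding; address None stands for the unknown
# address u, which never satisfies the "known address" requirement.

def commute_rule_1(s, t):
    """A known access and a known invalidation of two different known
    addresses commute anywhere in the memory sequence."""
    return ({s[0], t[0]} == {"access", "inv"}
            and s[1] is not None and t[1] is not None
            and s[1] != t[1])
```

Commute Rule 2 covers a superset of these pairs but additionally depends on position: its test would also take as input whether the second step is the last step of the whole sequence.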
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules below (if the two-step pattern has "yes" in the column "Union Rule or
J Hardw Syst Secur (2019) 3:397–425
[Table 7: Rules for combining two adjacent steps. For each pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting combined step; e.g., a ⇝ a reduces to a, and a^inv ⇝ d^inv unions to Union(a^inv, d^inv). The table body is not reproduced here.]
Reduce Rule" and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both accesses target the same address u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the cache block of the second access, so these two steps can be reduced to only one step, the second one.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, in either order, and the address they refer to is the same, these two steps can be reduced to one step, namely the second step.
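The three Reduction Rules can be sketched together. As before, this is a hypothetical encoding of our own: a step is a (kind, address) tuple with kind in {"access", "inv"} and address None standing for the unknown address u.

```python
# Sketch of Reduction Rules 1-3 over (kind, address) step tuples.
# Returns the single surviving step, or None if no rule applies.

def reduce_pair(s, t):
    # Rule 1: two accesses to the same unknown address u -> keep second.
    if s == t == ("access", None):
        return t
    # Rule 2: two known accesses -> the second access determines the
    # final state of its cache block -> keep the second.
    if s[0] == t[0] == "access" and s[1] is not None and t[1] is not None:
        return t
    # Rule 3: a known access and a known invalidation of the SAME
    # known address, in either order -> keep the second step.
    if {s[0], t[0]} == {"access", "inv"} and s[1] is not None and s[1] == t[1]:
        return t
    return None
```

In every case the second step survives, matching the "Combined Step" column of Table 7 for the reducible pairs.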
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). The two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known inv operations each invalidate some known address, so they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
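A matching sketch of Union Rule 1 under the same hypothetical (kind, address) step encoding (kind "access" or "inv", address a concrete label or None for the unknown address u); the "union_inv" label for the joint step is our own invention for this sketch.

```python
# Sketch of Union Rule 1: merge two adjacent invalidations of known,
# different addresses into one joint step Union(s, t).

def union_rule_1(s, t):
    """Return the joint step, or None if the rule does not apply."""
    if (s[0] == t[0] == "inv"
            and s[1] is not None and t[1] is not None
            and s[1] != t[1]):
        # Use a frozenset so Union(a_inv, d_inv) == Union(d_inv, a_inv).
        return ("union_inv", frozenset({s[1], t[1]}))
    return None
```

Repeated application then folds any run of invalidations of distinct known addresses into a single joint flushing step.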
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations targeting the same address as near to each other as possible, and to group u operations together and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence. Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence with at most three steps, in one of the following patterns (or fewer steps):

  – u operation ⇝ non-u operation ⇝ u operation
  – non-u operation ⇝ u operation ⇝ non-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

  – If followed by a known access operation, the A^inv/V^inv can be removed, because an actual state is subsequently put into the cache block.
  – If followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
  – If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can have Commute Rule 2 further applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.
The only remaining two adjacent step patterns to which none of the three categories of Rules can be applied are the following:
– {A_a / V_a / A_aalias / V_aalias / A_d / V_d / A^inv_a / V^inv_a / A^inv_aalias / V^inv_aalias} ⇝ V_u
– {A_a / V_a / A_aalias / V_aalias} ⇝ V^inv_u
– V_u ⇝ {A_a / V_a / A_aalias / V_aalias / A_d / V_d / A^inv_a / V^inv_a / A^inv_aalias / V^inv_aalias}
– V^inv_u ⇝ {A_a / V_a / A_aalias / V_aalias}

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2: β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; p: a two-dimensional dynamic-size array; p[0] contains the states of each step of the original pattern, in order; p[1], p[2], ... are empty initially.

Output: reduced effective vulnerability pattern(s) array V. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1:  V = Ø
2:  while p contains ⋆ (at an index other than 0) do
3:      p = Subdivision_Rule_1(p)
4:  end while
5:  while (p contains A^inv at an index other than 0) or (p contains V^inv at an index other than 0) do
6:      p = Subdivision_Rule_2(p)
7:  end while
8:  while (p is_ineffective or p has_interval_effective_three_steps) do
9:      p = Commute_Rules(p)
10:     p = Reduction_Rules(p)
11:     p = Union_Rules(p)
12:     if (p is_ineffective or p has_interval_effective_three_steps) then
13:         V += effective_three_step_patterns(p)
14:     end if
15: end while
16: return V
two steps either generates two adjacent step patterns that can be processed by the three Rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
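The reduce-until-fixed-point core of this procedure can be sketched in Python. This is a deliberately trimmed illustration of our own, not the authors' implementation: it applies only Reduction-style merging (a step is a (kind, address) tuple, address None for the unknown address u), while Algorithm 2 additionally interleaves Commute and Union passes.

```python
# Minimal sketch of Algorithm 2's reduce-to-fixed-point loop,
# restricted to Reduction-style rules over (kind, address) steps.

def reduce_pass(seq):
    """One left-to-right pass: merge the first adjacent pair for which
    a reduction fires (two known accesses; a known access/invalidation
    pair on the same address; or two u accesses), keeping the second."""
    for i in range(len(seq) - 1):
        s, t = seq[i], seq[i + 1]
        both_known_access = (s[0] == t[0] == "access"
                             and s[1] is not None and t[1] is not None)
        mixed_same_known = (s[0] != t[0]
                            and s[1] is not None and s[1] == t[1])
        both_u = s == t == ("access", None)
        if both_known_access or mixed_same_known or both_u:
            return seq[:i] + seq[i + 1:]   # keep the second step
    return seq

def reduce_to_core(seq):
    """Apply passes until a fixed point is reached."""
    while True:
        nxt = reduce_pass(seq)
        if nxt == seq:
            return seq
        seq = nxt
```

A β-step sequence that reaches a fixed point with more than three steps would then go to the manual rest checking described above.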
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 5: Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced here.]

a. Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b. Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c. Flush is disabled, but cache coherence might be used to do the data removal.
d. For the L1 cache and TLB, flushing is done during context switch.
e. The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f. The technique is now only implemented in the last-level cache.
g. The technique now only targets shared caches.
h. The technique only targets the inclusive last-level cache.
i. The technique targets the data cache hierarchy.
j. For the last-level cache, the cache is partitioned between the victim and the attacker.
k. The technique can control the probability of the vulnerability being successful to be extremely small.
l. The technique can work in shared read-only memory, while not working in shared writable memory.
m. Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, in PL cache flushing can be disabled for the victim's sensitive data, so that V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, either static or dynamic partitioning of the cache allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., a High partition determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, interference within the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with preventing internal interference.
In terms of the three-step model, the partitioning-based caches excel at using partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim's partition by flushing or eviction, and they therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., V_d ⇝ V_u ⇝ A_a (fast) (one type of Cache Internal Collision [5]) is one example of a vulnerability that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A^inv_a ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A^inv_d (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic
partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer during a speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) of the victim partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (the columns for SHARP cache, and the columns from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based, and it can be combined with differentiation of sensitive data to be more efficient.
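As an illustration of this de-correlation argument (our own sketch, not from the paper), the mutual information between a victim's secret set index and the attacker's hit/miss observation can be estimated on a toy single-probe model; the set count, sample sizes, and mapping are all made up:

```python
import random
from math import log2
from collections import Counter

def mutual_info(pairs):
    """Plug-in estimate of I(X;Y) in bits from (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

random.seed(0)
N_SETS = 4

def attacker_sees_eviction(secret, randomized):
    # The attacker watches set 0; the victim's line lands in a set that is
    # either a fixed function of the secret or freshly randomized.
    victim_set = random.randrange(N_SETS) if randomized else secret % N_SETS
    return victim_set == 0

fixed = [(s, attacker_sees_eviction(s, False))
         for s in (random.randrange(16) for _ in range(4000))]
rand = [(s, attacker_sees_eviction(s, True))
        for s in (random.randrange(16) for _ in range(4000))]

assert mutual_info(fixed) > 0.5    # fixed mapping leaks the set index
assert mutual_info(rand) < 0.05    # per-access randomization: ~0 bits
```

With a fixed address-to-set mapping the observation is a deterministic function of the secret, so the estimated mutual information stays high; re-randomizing the mapping on every access drives it toward zero, matching the argument above.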
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A^inv_a ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V^inv_u ⇝ V^inv_a (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
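The Non Deterministic Cache approach can be illustrated with a toy sketch (ours; the latencies and delay bound are invented) showing how a large enough random delay makes a single timing observation ambiguous:

```python
import random

HIT, MISS = 10, 200        # hypothetical base latencies in cycles

def observed_latency(is_hit, max_delay=0):
    """Base hit/miss latency plus a uniformly random added delay."""
    base = HIT if is_hit else MISS
    return base + random.randint(0, max_delay)

random.seed(1)

# Without added delay, a single observation separates hit from miss.
assert observed_latency(True) < observed_latency(False)

# With delays larger than the hit/miss gap, the distributions overlap,
# so one observation no longer identifies hit vs. miss.
hits = [observed_latency(True, max_delay=1000) for _ in range(500)]
misses = [observed_latency(False, max_delay=1000) for _ in range(500)]
assert max(hits) > min(misses)
```

Averaging many samples would still recover the signal unless the delay dominates the base latency, which is one reason this defense carries such a large performance cost.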
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the respective papers. At the extreme end, there is
J Hardw Syst Secur (2019) 3:397–425
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with its random delays, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the observed timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with the other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
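The random-eviction suggestion above can be sketched as a toy replacement policy (our illustration only; a real design would also need a fast, secure hardware random number generator):

```python
import random

class RandomEvictSet:
    """Toy cache set with random replacement: on a miss with a full set,
    a uniformly random way is evicted, de-correlating which line gets
    displaced from the access pattern."""

    def __init__(self, ways=4):
        self.lines = [None] * ways

    def access(self, tag):
        if tag in self.lines:
            return "hit"
        if None in self.lines:
            way = self.lines.index(None)              # fill an empty way first
        else:
            way = random.randrange(len(self.lines))   # random eviction
        self.lines[way] = tag
        return "miss"

random.seed(0)
s = RandomEvictSet(ways=4)
for tag in ("a", "b", "c", "d"):
    s.access(tag)
assert s.access("a") == "hit"   # all four lines are resident
s.access("e")                   # evicts one line chosen at random
assert "e" in s.lines
```

Because the evicted way is chosen uniformly at random, observing which of its own lines later miss tells the attacker nothing deterministic about the victim's access.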
It should be noted that some cache designs only focus on certain levels of the cache hierarchy, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to capture all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail to do so. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there was an internal collision, leaking the value of u.
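As a toy illustration of these three steps (our own sketch; the cache geometry and addresses are made up), a hit in Step 3 on a minimal direct-mapped cache occurs exactly when the secret access in Step 2 touched the known address:

```python
class ToyCache:
    """Minimal direct-mapped cache: one address tag per set."""

    def __init__(self, n_sets=4):
        self.n_sets = n_sets
        self.lines = {}                  # set index -> cached address

    def flush(self, addr):
        self.lines.pop(addr % self.n_sets, None)

    def access(self, addr):
        idx = addr % self.n_sets
        hit = self.lines.get(idx) == addr
        self.lines[idx] = addr
        return hit

cache = ToyCache()
known = 8                                # known address (hypothetical)
secret_u = 8                             # secret address; equals `known` here

cache.flush(known)                       # Step 1: invalidate the block
cache.access(secret_u)                   # Step 2: victim accesses secret u
assert cache.access(known) is True       # Step 3: hit => collision => u leaked
```

Had secret_u been any other address, Step 3 would miss, so the hit/miss timing in Step 3 distinguishes whether u equals the known address.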
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Steps 1 and 2 to the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Steps 1 and 3 to the Flush + Time vulnerability; but in Step 2, in Flush + Time the attacker invalidates some known address, allowing them to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
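A minimal simulation of these three steps (our own sketch with an invented 8-set, direct-mapped cache) shows how the probing step recovers the secret's set index:

```python
N_SETS = 8
cache = [None] * N_SETS                  # toy 1-way (direct-mapped) cache

def access(owner, addr):
    """Return True on hit; on miss, install (owner, addr) in its set."""
    idx = addr % N_SETS
    line = (owner, addr)
    hit = cache[idx] == line
    cache[idx] = line
    return hit

# Step 1: the attacker primes every set with its own data.
for a in range(N_SETS):
    access("attacker", a)

# Step 2: the victim makes one secret-dependent access.
secret_u = 13                            # hypothetical secret address
access("victim", secret_u)

# Step 3: the attacker probes; the set that misses reveals u's set index.
missed_sets = [i for i in range(N_SETS) if not access("attacker", i)]
assert missed_sets == [secret_u % N_SETS]
```

Only the set touched by the victim's access misses during the probe, so the slow probe timing pinpoints the secret's cache set.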
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, that tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Steps 2 and 3 to the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Steps 1 and 2 to the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "Invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "Invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "Invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory-access-related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted a known-access operation, and an invalidation of a known memory address is denoted a known-inv operation. The known-access and known-inv operations together make up the not-u operations. A memory-related operation on the unknown address (containing u) is denoted a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since ⋆ gives no information about the cache state) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Steps 2 and 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3, using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a ⋆ sub-pattern in the middle, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until no sub-pattern is left with a ⋆ in the middle or as the last step (a ⋆ in the last step is simply deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains an A^inv/V^inv sub-pattern in the middle, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can therefore serve as the flushing step of Step 1, e.g., as in the A^inv_a ⇝ V_u ⇝ A_a (fast) vulnerability shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the victim's sensitive address information from the final timing. This rule should be applied recursively until no sub-pattern is left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step is deleted).
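Subdivision Rule 1 can be sketched in a few lines (our own encoding, with "*" standing for the ⋆ step); Rule 2 works analogously with A^inv/V^inv as the split point:

```python
def subdivide_at_star(pattern):
    """Split a pattern at every '*' step (Subdivision Rule 1 sketch):
    '*' carries no timing information, so it restarts the attacker's view
    and becomes Step 1 of the next sub-pattern; a trailing lone '*' is
    deleted."""
    parts, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                parts.append(current)
            current = ["*"]              # '*' starts the next sub-pattern
        else:
            current.append(step)
    if current and current != ["*"]:
        parts.append(current)
    return parts

assert subdivide_at_star(["A_a", "*", "V_u", "A_a"]) == \
    [["A_a"], ["*", "V_u", "A_a"]]
assert subdivide_at_star(["V_u", "A_a", "*"]) == [["V_u", "A_a"]]
```

Each resulting sub-pattern can then be processed independently by the simplification rules that follow.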
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in the remaining patterns. In Table 7, we show all the possible cases in which the rules apply for each pair of adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute was effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known-access operation and the other is a known-inv operation, and the addresses they refer to are different, the two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted under this condition; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to one step, following the reduction rules (if the two-step pattern has a "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), the table indicates whether Commute Rule 1 and Commute Rule 2 apply, whether the Union Rule or Reduce Rule applies, and the resulting combined step.
Reduce Rule" and no Union result in the "Combined Step" column in Table 7):
– Reduction Rule 1 For two u operations: although u is unknown, both accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory-access operations (known-access operations) always result in a deterministic state of the cache block related to the second memory access, so these two steps can be reduced to one step.
– Reduction Rule 3 For two adjacent steps where one is a known-access operation and the other is a known-inv operation, in either order, if the address they refer to is the same, the two can be reduced to one step, namely the second step.
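The three reduction rules can be sketched symbolically (our own encoding: steps are strings such as "A_a", "V_u", "A_a^inv"; combinations not covered by Rules 1–3, such as mixed u/inv pairs, are left to Table 7 and return None here):

```python
def reduce_pair(first, second):
    """Return the single step replacing the adjacent pair (first, second),
    or None if none of Reduction Rules 1-3 applies."""
    def is_u(step):                      # targets the unknown address u
        return "_u" in step
    def is_inv(step):                    # invalidation operation
        return step.endswith("^inv")
    def addr(step):                      # 'A_a^inv' -> 'a'
        return step.split("_", 1)[1].replace("^inv", "")

    if is_u(first) and is_u(second) and is_inv(first) == is_inv(second):
        return second                    # Rule 1: both target the same u
    if not is_u(first) and not is_u(second):
        if not is_inv(first) and not is_inv(second):
            return second                # Rule 2: two known accesses
        if is_inv(first) != is_inv(second) and addr(first) == addr(second):
            return second                # Rule 3: access + inv, same address
    return None

assert reduce_pair("A_u", "V_u") == "V_u"          # Rule 1
assert reduce_pair("A_a", "V_d") == "V_d"          # Rule 2
assert reduce_pair("A_a", "A_a^inv") == "A_a^inv"  # Rule 3
assert reduce_pair("A_a", "V_d^inv") is None       # no rule applies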
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following case:
– Union Rule 1 Two invalidations of two different known memory addresses can be combined: the two known-inv operations both invalidate some known address, so they can be combined into a single step. This rule can be applied continuously to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known-access and known-inv operations that target the same address as close together as possible, and to group u operations and not-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion, at each application of these three categories of rules, should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence of u operations and not-u operations is further reduced to a sequence with at most three steps, in one of the following patterns (or fewer steps):
– u operation ⇝ not-u operation ⇝ u operation
– not-u operation ⇝ u operation ⇝ not-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra A^inv/V^inv in the first step:
– If followed by a known-access operation, the A^inv/V^inv can be removed, because the actual state is subsequently put into the cache block.
– If followed by a known-inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further be commuted by Commute Rule 2 and reduced to be within three steps.
In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias ⇝ V_u
– A_a/V_a/A_aalias/V_aalias ⇝ V^inv_u
– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias
– V^inv_u ⇝ A_a/V_a/A_aalias/V_aalias
We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; P, a two-dimensional dynamic-size array, where P[0] contains the states of each step of the original pattern in order, and P[1], P[2], ... are empty initially.
Output: R, an array of the reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: R = ∅
2: while contain(P, ⋆) and the index of ⋆ is not 0 do
3:   P = Subdivision_Rule_1(P)
4: end while
5: while (contain(P, A^inv) and the index of A^inv is not 0) or (contain(P, V^inv) and the index of V^inv is not 0) do
6:   P = Subdivision_Rule_2(P)
7: end while
8: while not (is_ineffective(P) or has_interval_effective_three_steps(P)) do
9:   P = Commute_Rules(P)
10:  P = Reduction_Rules(P)
11:  P = Union_Rules(P)
12:  if is_ineffective(P) or has_interval_effective_three_steps(P) then
13:    R += effective_three_step_patterns(P)
14:  end if
15: end while
16: return R
two steps can either generate two-adjacent-step patterns that can be processed by the three categories of rules, so that further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; and has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security. Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems. Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems. Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security. Springer, pp 208–225
J Hardw Syst Secur (2019) 3:397–425
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
utilize differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
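As a rough illustration of the idea, the following sketch models a cache whose lines carry a "sensitive" label and whose flush operation is disabled for labeled victim data. The class and method names are our own, not any specific design's API:

```python
# Rough sketch (names are ours, not a real design's API) of a cache that
# differentiates sensitive data: flushing labeled victim data is refused,
# which blocks Step 1 of flush-based attack patterns.

class LabeledCache:
    def __init__(self):
        self.lines = {}                    # line address -> is_sensitive flag

    def load(self, addr, sensitive=False):
        self.lines[addr] = sensitive       # victim may label critical data

    def flush(self, addr):
        if self.lines.get(addr):           # flushing labeled data is refused
            return False
        self.lines.pop(addr, None)
        return True

    def hit(self, addr):
        return addr in self.lines

c = LabeledCache()
c.load(0x1000, sensitive=True)             # victim labels security-critical data
assert not c.flush(0x1000)                 # the flush is ignored...
assert c.hit(0x1000)                       # ...so the line is still cached
```

Unlabeled data still flushes normally, so only the identified security-critical range pays the cost of the restriction.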
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, where Vu ⇝ Va^inv ⇝ Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu ⇝ Ad ⇝ Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard for partitioning-based caches to prevent. What is more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
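A minimal sketch of static way partitioning follows; the parameters (4 ways per set, ways 0 and 1 reserved for the High/victim partition) and the replacement choice are assumptions for illustration, not taken from any specific design:

```python
# Minimal sketch of static way partitioning: each domain may only fill
# (and therefore evict) its own ways, so the attacker can never evict a
# victim-owned line, blocking external miss-based interference.

class WayPartitionedSet:
    """One cache set whose ways are statically split between two domains."""
    def __init__(self, ways=4, high_ways=(0, 1)):
        self.ways = [None] * ways                   # way -> (domain, addr)
        self.high = set(high_ways)                  # ways reserved for the victim

    def insert(self, domain, addr):
        legal = self.high if domain == 'victim' \
                else set(range(len(self.ways))) - self.high
        way = min(legal)                            # toy replacement choice
        evicted, self.ways[way] = self.ways[way], (domain, addr)
        return evicted                              # only same-domain evictions

s = WayPartitionedSet()
s.insert('victim', 0xA)
assert s.insert('attacker', 0xB) is None            # attacker cannot evict victim
assert s.insert('victim', 0xC) == ('victim', 0xA)   # internal interference remains
```

The last assertion shows the limitation discussed above: partitioning stops the attacker's evictions, but the victim's own accesses still evict the victim's own lines, so internal interference survives.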
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd ⇝ Vu ⇝ Va (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv ⇝ Vu ⇝ Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow the data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd ⇝ Vu ⇝ Ad^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu ⇝ Aa^inv ⇝ Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved way and allow eviction by the attacker. SecDCP does not have an unreserved way; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as Ad ⇝ Vu ⇝ Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd ⇝ Vu ⇝ Ad (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during the speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and therefore is good at preventing even some internal miss-based vulnerabilities, such as Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the Ad ⇝ Vu ⇝ Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during the speculative or out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based, and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as Aa^inv ⇝ Vu ⇝ Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block for Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the Vu ⇝ Va^inv ⇝ Vu (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va ⇝ Vu^inv ⇝ Va^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary (table content not reproduced here)
the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access with timing information, when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad ⇝ Vu ⇝ Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu ⇝ Ad ⇝ Vu (slow) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.

• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
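The two axes used in these suggestions (internal vs. external interference, and hit-based vs. miss-based observation) can be read directly off a three-step pattern. The following toy classifier is our own illustration of that reading, not a reproduction of the paper's tables:

```python
# Toy classifier (our illustration, not the paper's tables): label a
# three-step pattern as internal vs. external interference and as
# hit-based vs. miss-based, from the actors and the final timing.

def classify(step1, step2, step3, timing):
    """timing is the attacker's observation in Step 3: 'fast' or 'slow'."""
    actors = {actor for actor, _ in (step1, step2, step3)}
    interference = 'internal' if actors == {'V'} else 'external'
    basis = 'hit-based' if timing == 'fast' else 'miss-based'
    return interference, basis

# Vu -> Va -> Vu (slow): one type of Bernstein's Attack
assert classify(('V', 'u'), ('V', 'a'), ('V', 'u'), 'slow') == \
       ('internal', 'miss-based')
# Ad -> Vu -> Aa (fast): one type of Flush + Reload
assert classify(('A', 'd'), ('V', 'u'), ('A', 'a'), 'fast') == \
       ('external', 'hit-based')
```

A design review could run every pattern a cache fails to prevent through such a classifier to see which quadrant (e.g., hit-based internal interference) dominates its weaknesses.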
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning, to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of the existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3, using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, in Flush + Time, the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
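The three Prime + Probe steps can be walked through on a toy direct-mapped cache; this is our own illustration, and the sizes and addresses are arbitrary:

```python
# Toy direct-mapped cache walking through the three Prime + Probe steps
# (our illustration; sizes and addresses are arbitrary).

SETS = 8
LINE = 64

def set_of(addr):
    return (addr // LINE) % SETS

cache = {}                                   # set index -> (owner, line tag)

def access(owner, addr):
    """True on hit; on a miss the line is filled, evicting the old one."""
    s = set_of(addr)
    hit = cache.get(s) == (owner, addr // LINE)
    cache[s] = (owner, addr // LINE)
    return hit

# Step 1: the attacker primes every set with his or her own data.
for i in range(SETS):
    access('A', i * LINE)

# Step 2: the victim makes one secret-dependent access (unknown to attacker).
secret_addr = 5 * LINE
access('V', secret_addr)

# Step 3: the attacker probes; the single slow (missing) set leaks the index.
misses = [s for s in range(SETS) if not access('A', s * LINE)]
assert misses == [set_of(secret_addr)]       # the attacker learns set 5
```

The same toy model, with the roles of the attacker and the victim swapped in Step 1 or Step 3, reproduces the Evict + Probe and Prime + Time variants described below.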
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
J Hardw Syst Secur (2019) 3:397–425
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
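The difference in what Step 2 reveals in the flush-based versus eviction-based variants can be sketched in a toy single-level model (a hedged illustration with our own function names, not the paper's simulator):

```python
NUM_SETS = 8

def flush_time_miss(secret, known):
    """Flush + Time: Step 2 invalidates one known full address; the victim's
    Step 3 reload misses only if that exact address is the secret's line,
    so trying candidates recovers the full address."""
    return known == secret

def evict_time_miss(secret, known_set):
    """Evict + Time: Step 2 evicts a whole set; the victim's Step 3 reload
    misses whenever the secret merely maps to that set, so the attacker
    learns only the cache index."""
    return secret % NUM_SETS == known_set
```

Two different secret addresses that share a cache index are indistinguishable under eviction, but distinguishable under a targeted flush.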
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown addresses refer to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ A^inv/V^inv ⇝ ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
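The two subdivision rules can be sketched with one splitting helper (a possible implementation under our own encoding, not the paper's code: a pattern is a list of step labels, `"*"` stands for ⋆, and `"A_inv"`/`"V_inv"` stand for a full flush):

```python
def subdivide(pattern, splitters):
    """Split `pattern` so that every splitter that is not already Step 1
    becomes Step 1 of the next sub-pattern; a splitter left as the last
    step is deleted, as the rules require."""
    out, cur = [], []
    for step in pattern:
        if step in splitters:
            if cur:
                out.append(cur)    # close the sub-pattern before the splitter
            cur = [step]           # the splitter becomes Step 1 of the next
        else:
            cur.append(step)
    if cur and not (len(cur) == 1 and cur[0] in splitters):
        out.append(cur)            # a lone trailing splitter is dropped
    return out
```

Rule 1 then corresponds to `subdivide(pattern, {"*"})`, and Rule 2 is applied to each resulting piece with splitters `{"A_inv", "V_inv"}`.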
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the pairs of two steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commuting is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the corresponding two-step pattern if the two steps can be commuted.
– Commute Rule 2 A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; such patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern in which two adjacent steps are both related to known addresses or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one step following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):

Table 7 Rules for combining two adjacent steps. [The table lists, for every possible pair of adjacent steps (First, Second) over the states a, a_alias, d, u and their invalidation counterparts, whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and the resulting combined step where applicable; e.g., the pair (a, a) reduces to a, while the pair (a^inv, a_alias^inv) combines to Union(a^inv, a_alias^inv). Pairs whose first step involves u generally admit no rule.]
– Reduction Rule 1 For two u operations, although u is unknown, both accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access-related operations (known_access operations) always result in a deterministic state of the cache block of the second memory access, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, the two steps can be reduced to one step, namely the second step.
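The three reduction rules over one adjacent pair can be sketched as follows (a possible encoding of our own: labels such as `"A_a"` or `"A_inv_a"` carry the accessor, an optional `inv`, and the address):

```python
def is_access(step):
    """True for a memory access, False for an invalidation."""
    return "inv" not in step

def addr(step):
    """Extract the address suffix, e.g. 'a', 'd', or 'u'."""
    return step.rsplit("_", 1)[-1]

def reduce_pair(first, second):
    """Return the reduced single step, or None if no reduction rule applies."""
    a1, a2 = addr(first), addr(second)
    # Rule 1: two u accesses target the same unknown u -> keep the second.
    if a1 == "u" and a2 == "u" and is_access(first) and is_access(second):
        return second
    # Rule 2: two known accesses leave the second access's cache block in a
    # deterministic state -> keep the second.
    if a1 != "u" and a2 != "u" and is_access(first) and is_access(second):
        return second
    # Rule 3: a known access and a known invalidation of the same address,
    # in either order -> keep the second.
    if a1 != "u" and a1 == a2 and is_access(first) != is_access(second):
        return second
    return None
```

Pairs of invalidations of different known addresses are deliberately left to the Union Rule below.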
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two known, different memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
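Union Rule 1 can be sketched with the same label encoding as the reduction sketch above (a hedged illustration, not the paper's code):

```python
def union_pair(first, second):
    """Combine two adjacent invalidations of different known addresses into
    one joint step; return None if Union Rule 1 does not apply."""
    def known_inv(step):
        # an invalidation whose address is known (i.e., not the unknown u)
        return "inv" in step and not step.endswith("_u")
    if known_inv(first) and known_inv(second) and first != second:
        return f"Union({first},{second})"
    return None
```

Applying it repeatedly over adjacent steps merges whole runs of known-address invalidations into one joint step.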
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access and known_inv operations that target the same address as near as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rules are applied to the processed memory sequence.
The recursion at each application of these three categories of rules should always apply and reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps in the following patterns, or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, A^inv/V^inv can be removed due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias ⇝ V_u
– A_a/V_a/A_aalias/V_aalias ⇝ V^inv_u
– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias
– V^inv_u ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns
above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β – the number of steps of the pattern; pattern – a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially

Output: reduced effective vulnerability pattern(s) array results. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: results = ∅
2: while contain(pattern, ⋆) and its index is not 0 do
3:     pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:     pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is_ineffective or has_interval_effective_three_steps) do
9:     pattern = Commute_Rules(pattern)
10:    pattern = Reduction_Rules(pattern)
11:    pattern = Union_Rules(pattern)
12:    if pattern is_ineffective or has_interval_effective_three_steps then
13:        results += pattern
14:    end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
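The reduce-to-fixpoint core of the algorithm can be sketched in a simplified, self-contained form (our own helper names and a toy pair rule, not the paper's implementation):

```python
def reduce_adjacent(pattern, pair_rule):
    """One pass: replace the first reducible adjacent pair with its result."""
    for i in range(len(pattern) - 1):
        merged = pair_rule(pattern[i], pattern[i + 1])
        if merged is not None:
            return pattern[:i] + [merged] + pattern[i + 2:]
    return pattern

def reduce_to_fixpoint(pattern, pair_rule):
    """Keep applying passes until the pattern stops shrinking."""
    while True:
        nxt = reduce_adjacent(pattern, pair_rule)
        if nxt == pattern:
            return pattern
        pattern = nxt

# Toy pair rule for illustration: two adjacent operations on the same
# address keep only the second one.
same_addr = lambda x, y: y if x.split("_")[-1] == y.split("_")[-1] else None
```

In the full algorithm the pair rule is the combination of the Commute, Reduction, and Union Rules, and the loop stops once the pattern is ineffective or maps to effective three-step vulnerabilities.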
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture (ISCA), IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability. The NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and be open to eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
The Sanctum cache and the CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). The Sanctum cache does not consider internal interference, while the CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far preventing all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. The MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation when the outside world is involved, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
The InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. The RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. The PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache through Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing of cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based, and it can be combined with differentiation of sensitive data to be more efficient.
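The de-correlation argument can be made concrete with a toy experiment (entirely illustrative; the 8-set cache model, the single probed set, and the sample counts are assumptions of this sketch, not from the paper): estimate the mutual information between the victim's secret cache set and the attacker's hit/miss observation, with and without re-randomizing the address-to-set mapping on every access.

```python
import random
from collections import Counter
from math import log2

def mutual_information(samples):
    """Estimate I(secret; observation) in bits from (secret, obs) pairs."""
    n = len(samples)
    joint = Counter(samples)
    ps = Counter(s for s, _ in samples)
    po = Counter(o for _, o in samples)
    return sum(c / n * log2((c / n) / ((ps[s] / n) * (po[o] / n)))
               for (s, o), c in joint.items())

def observe(secret_set, probed_set=0, randomize=False, sets=8):
    """The attacker primes probed_set; the victim's access evicts it iff the
    victim's data maps there. With randomize=True the victim's set is
    re-drawn on every access, as in per-access randomized set indexing."""
    if randomize:
        secret_set = random.randrange(sets)
    return "miss" if secret_set == probed_set else "hit"

random.seed(1)
for randomize in (False, True):
    secrets = [random.randrange(8) for _ in range(20000)]
    samples = [(s, observe(s, randomize=randomize)) for s in secrets]
    print(randomize, round(mutual_information(samples), 3))
```

With deterministic indexing the estimate is about 0.54 bits per observation; with per-access randomization it drops to roughly zero, since the observation becomes independent of the secret.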
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal-interference vulnerabilities such as A^inv_a → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache fill for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and it allows vulnerabilities such as V_u → V^inv_a → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V^inv_u → V^inv_a (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
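To illustrate the idea behind encrypted or keyed set-index mapping (a toy stand-in for the actual low-latency ciphers of CEASER or SCATTER cache; the `keyed_index` function and its integer seeding are assumptions of this sketch): two addresses that always collide under conventional modulo indexing rarely collide under a keyed mapping, and changing the key plays the role of CEASER's periodic dynamic remapping.

```python
import random

SETS = 64

def keyed_index(addr, key, sets=SETS):
    """Toy keyed address-to-set mapping: a deterministic pseudo-random draw
    seeded by (addr, key). A real design would use a low-latency cipher."""
    return random.Random(addr * 1_000_003 + key).randrange(sets)

# Address pairs that always collide under conventional modulo indexing.
pairs = [(a, a + SETS * k) for a in range(32) for k in (1, 2)]
same_plain = sum(x % SETS == y % SETS for x, y in pairs) / len(pairs)
same_keyed = sum(keyed_index(x, key=7) == keyed_index(y, key=7)
                 for x, y in pairs) / len(pairs)
print(same_plain, same_keyed)   # 1.0 vs. roughly 1/64
```

An attacker who builds an eviction set for one key loses it after remapping, which is why this technique targets eviction-based attacks in particular.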
6.1 Estimated Performance and Security Tradeoffs
J Hardw Syst Secur (2019) 3:397–425

Table 6 Existing secure caches' implementation method, performance, power, and area summary

Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the observed timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with the other secure caches. In general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.

• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and it inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this respect. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and the names added by us for strategies which did not have them) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision and leaks the value of u.
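The three steps above can be sketched with a toy direct-mapped cache model (entirely hypothetical; a real attack distinguishes hits from misses by measuring access latency rather than reading a return value):

```python
def internal_collision(secret_addr, known_addr=3, sets=8):
    lines = {}                                # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag                        # line is filled on a miss
        return hit
    # Step 1: the known address starts out uncached (it was flushed/evicted)
    access(secret_addr)                       # Step 2: victim accesses secret u
    return access(known_addr)                 # Step 3: hit iff u == known address

print(internal_collision(3), internal_collision(5))   # prints: True False
```

A hit in Step 3 occurs only when the secret-dependent access in Step 2 brought the known address back, i.e., the two addresses collide.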
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, the Flush + Time attacker invalidates some known address, allowing it to find the full address of the secret data, whereas the Evict + Time attacker evicts a cache set and can only find the secret data's cache index.
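A minimal sketch of these steps, using a toy direct-mapped cache where the attacker evicts a set by accessing a congruent address (the cache model and the address arithmetic are assumptions of this sketch, and the attacker's eviction address is assumed to differ from the secret's):

```python
def evict_time(secret_addr, evict_set, sets=8):
    lines = {}                               # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag
        return hit
    access(secret_addr)                      # Step 1: victim loads its secret data
    access(evict_set + 2 * sets)             # Step 2: attacker accesses a congruent
                                             # address, evicting set evict_set
    return not access(secret_addr)           # Step 3: miss iff secret is in that set

print(evict_time(10, 2), evict_time(10, 3))   # prints: True False
```

Repeating this over every candidate set recovers the secret's cache index, which is exactly the set-granularity leakage described above.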
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
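These steps can be sketched with the same kind of toy direct-mapped cache model (hypothetical; a real attacker infers hits and misses from measured latency, and primes whole associative sets rather than single lines):

```python
def prime_probe(victim_secret_addr, sets=8):
    lines = {}                                  # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag
        return hit
    attacker_addrs = range(100, 100 + sets)     # one attacker address per set
    for a in attacker_addrs:                    # Step 1: prime every set
        access(a)
    access(victim_secret_addr)                  # Step 2: victim's secret access
    # Step 3: probe; a miss reveals the set the secret data maps to
    return [a % sets for a in attacker_addrs if not access(a)]

print(prime_probe(42))   # prints: [2]
```

The returned list contains the index of every set where the attacker's primed line was evicted, i.e., where the victim's secret access landed.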
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim performs the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, it tells the attacker that this cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that this cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that performs the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using accesses to data at addresses known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that performs the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, it tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
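A toy sketch of these steps (the cache model is an assumption of this sketch): unlike the set-wide eviction in Evict + Time, the flush in Step 2 only removes the line when the full address matches, which is why a miss in Step 3 reveals the exact secret address.

```python
def flush_time(secret_addr, flushed_addr, sets=8):
    lines = {}                                # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag
        return hit
    access(secret_addr)                       # Step 1: victim accesses secret data
    s, tag = flushed_addr % sets, flushed_addr // sets
    if lines.get(s) == tag:                   # Step 2: attacker flushes a known
        del lines[s]                          # address (removed only on full match)
    return not access(secret_addr)            # Step 3: miss iff addresses matched

print(flush_time(10, 10), flush_time(10, 18))   # prints: True False
```

Address 18 maps to the same set as 10 in this model but has a different tag, so the flush is a no-op and Step 3 still hits.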
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access–related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (one containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern ..., ⋆, ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step is simply deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern ..., A^inv/V^inv, ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can serve as the flushing step for Step 1, e.g., in the A^inv → V_u → A_a (fast) vulnerability shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the victim's sensitive address information from the final timing. This rule should be recursively applied until there are no sub-patterns left with A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step is simply deleted).
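As a toy illustration of the two subdivision rules (the list encoding, the step labels such as "STAR" for ⋆ and "A_inv"/"V_inv" for A^inv/V^inv, and the merging of both rules into one pass are all assumptions of this sketch, not the paper's implementation):

```python
def subdivide(pattern, splitters=("STAR", "A_inv", "V_inv")):
    """Split a long pattern at every ⋆ / A^inv / V^inv (Subdivision Rules 1
    and 2): the splitter becomes Step 1 of the next sub-pattern, and a
    splitter left as the very last step is deleted."""
    out, cur = [], []
    for step in pattern:
        if step in splitters and cur:
            out.append(cur)       # close the current sub-pattern
            cur = [step]          # splitter heads the next sub-pattern
        else:
            cur.append(step)
    if cur and not (len(cur) == 1 and cur[0] in splitters):
        out.append(cur)           # drop a trailing splitter-only pattern
    return out

print(subdivide(["A_a", "STAR", "V_u", "A_a", "A_inv"]))
# prints: [['A_a'], ['STAR', 'V_u', 'A_a']]
```

Each resulting sub-pattern is then handed to the simplification rules of the next subsection.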
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. Table 7 shows all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of whether a step is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ..., M, N, .... If commuting M and N leads to the same observation result, i.e., ..., M, N, ... and ..., N, M, ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: If one of the two adjacent steps is a known_access operation, the other is a known_inv operation, and the addresses they refer to are different, the two steps can be commuted, no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column if the two steps can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted when the second of the two adjacent steps is not the last step of the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence; such pairs have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column of Table 7.
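In a toy encoding where each step is a (kind, address) pair (the encoding is my own, not the paper's), Commute Rule 1 can be sketched as a predicate:

```python
def commute_rule_1(step1, step2):
    """Commute Rule 1 in a toy (kind, address) encoding: a known access and a
    known invalidation that refer to different addresses can be swapped."""
    (k1, a1), (k2, a2) = step1, step2
    both_known = "u" not in (a1, a2)           # neither step targets u
    return both_known and {k1, k2} == {"access", "inv"} and a1 != a2

print(commute_rule_1(("access", "a"), ("inv", "d")))   # prints: True
print(commute_rule_1(("access", "a"), ("inv", "a")))   # prints: False
```

Two operations of the same kind, or operations on the same address, do not commute freely; those cases are handled by the Reduction and Union Rules instead.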
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules below (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).

Table 7 Rules for combining two adjacent steps (for each pair of adjacent steps, the columns First and Second give the pair, and the columns Commute Rule 1, Commute Rule 2, Union Rule or Reduce Rule, and Combined Step give whether each rule applies and the combined result)
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both operations target the same u, so the two steps can be reduced to keep only the second one in the memory sequence.

– Reduction Rule 2: For two adjacent known memory access–related operations (known_access operations), they always result in a deterministic state of the second operation's cache block, so the two steps can be reduced to only the second one.

– Reduction Rule 3: For two adjacent steps where one is a known_access operation and the other is a known_inv operation referring to the same address, in either order, the two steps can be reduced to one step, namely the second one.
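The three reduction rules can be sketched as a single pass over adjacent steps (the (kind, address) encoding and the one-pass structure are assumptions of this sketch; Table 7 in the paper is the authoritative case table):

```python
def reduce_once(pattern):
    """One pass of the three Reduction Rules. Each step is a (kind, address)
    pair with kind in {"access", "inv"} and address "u" for the unknown
    address, or a concrete label such as "a" or "d"."""
    out = []
    for step in pattern:
        if out:
            (k1, a1), (k2, a2) = out[-1], step
            rule1 = a1 == a2 == "u"                               # two u operations
            rule2 = k1 == k2 == "access" and "u" not in (a1, a2)  # two known accesses
            rule3 = a1 == a2 != "u" and {k1, k2} == {"access", "inv"}
            if rule1 or rule2 or rule3:
                out[-1] = step        # each rule keeps only the second step
                continue
        out.append(step)
    return out

print(reduce_once([("access", "a"), ("access", "d"), ("inv", "d"),
                   ("access", "u"), ("access", "u")]))
# prints: [('inv', 'd'), ('access', 'u')]
```

Repeated application of such passes, interleaved with the Commute and Union Rules, is what shrinks a β > 3 pattern toward three steps.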
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ..., M, N, .... If combining M and N leads to the same timing observation result, i.e., ..., M, N, ... and ..., Union(M, N), ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). The two adjacent steps that can be combined are discussed in the following case:
– Union Rule 1: Two invalidations of two different known memory addresses can be combined: both known_inv operations invalidate some known address, so they can be combined into a single step. This rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules are applied first, to put known_access and known_inv operations that target the same address as near to each other as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps, in one of the following patterns (or fewer steps):

– u operation, not_u operation, u operation
– not_u operation, u operation, not_u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known_access operation, A^inv/V^inv can be removed, because the actual state is subsequently put into the cache block.

– If it is followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be A^inv/V^inv → V_u → not_u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A^inv_d/V^inv_d → u operation, where V_u → A^inv_d/V^inv_d can further be reduced by applying Commute Rule 2, bringing the sequence within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the remaining checks below.
The only remaining pairs of two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a, A_{a_alias}/V_{a_alias}, A_d/V_d, A^inv_a/V^inv_a, or A^inv_{a_alias}/V^inv_{a_alias} followed by V_u
– A_a/V_a or A_{a_alias}/V_{a_alias} followed by V^inv_u
– V_u followed by A_a/V_a, A_{a_alias}/V_{a_alias}, A_d/V_d, A^inv_a/V^inv_a, or A^inv_{a_alias}/V^inv_{a_alias}
– V^inv_u followed by A_a/V_a or A_{a_alias}/V_{a_alias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these two steps can either generate two-adjacent-step patterns that can be processed by the three categories of rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; pattern, a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: reduced effective vulnerability pattern(s) array vul_list; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1:  vul_list = Ø
2:  while pattern contains ⋆ and its index is not 0 do
3:      pattern = Subdivision_Rule_1(pattern)
4:  end while
5:  while (pattern contains A^inv and its index is not 0) or (pattern contains V^inv and its index is not 0) do
6:      pattern = Subdivision_Rule_2(pattern)
7:  end while
8:  while not (pattern is ineffective or pattern has interval effective three steps) do
9:      pattern = Commute_Rules(pattern)
10:     pattern = Reduction_Rules(pattern)
11:     pattern = Union_Rules(pattern)
12:     if (pattern is ineffective or pattern has interval effective three steps) then
13:         vul_list += effective_three_steps(pattern)
14:     end if
15: end while
16: return vul_list
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall under neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability, or, if it is a vulnerability, it maps to one of the outputs (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; and has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu
A Hestness J Hower DR Krishna T Sardashti S et al (2011)The gem5 simulator ACM SIGARCH computer architecture news39(2)1ndash7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic MKnudsen LR Leander G Nikov V Paar C Rechberger Cet al (2012) Princendasha low-latency block cipher for pervasivecomputing applications In International conference on the theoryand application of cryptology and information security Springerpp 208ndash225
J Hardw Syst Secur (2019) 3397ndash425 423
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) Cacti 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 6 Existing secure caches' implementation method, performance, power, and area summary (table body omitted)
the Non Deterministic cache with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses with timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches. In short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be used carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., the CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing–based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing–based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing–based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A: Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Evict + Time the attacker evicts a cache set, allowing it to find only the secret data's cache index, whereas in the Flush + Time attack the attacker invalidates some known address and can thus find the full address of the secret data.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
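As an illustration, such three-step strategies can be encoded as (Step 1, Step 2, Step 3) tuples and matched against a concrete trace. The step names follow the paper's notation (A = attacker, V = victim, u = unknown/secret address), but the dictionary and helper below are our own hypothetical sketch, not the paper's tables.

```python
# Hypothetical encoding of a few three-step attack strategies as
# (Step 1, Step 2, Step 3) tuples; illustrative values only.

STRATEGIES = {
    "Cache Internal Collision": ("V_inv", "V_u", "V_a"),  # hit in Step 3
    "Flush + Reload":           ("A_inv", "V_u", "A_a"),  # hit in Step 3
    "Prime + Probe":            ("A_a",   "V_u", "A_a"),  # miss in Step 3
}

def classify(three_steps):
    """Return the name of the strategy matching a three-step trace."""
    for name, pattern in STRATEGIES.items():
        if pattern == three_steps:
            return name
    return None
```

For instance, classify(("A_a", "V_u", "A_a")) returns "Prime + Probe".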
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker that the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model, to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and with Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types is found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
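The exhaustive test can be pictured as enumerating every three-step combination over a step alphabet. The alphabet below is simplified for illustration (the paper's simulator uses the full set of attacker/victim access, alias, and invalidation states), so the candidate count here differs from the paper's full enumeration.

```python
# Sketch of the exhaustive enumeration behind the beta = 3 analysis,
# over a deliberately simplified 7-symbol step alphabet.

from itertools import product

STEPS = ["A_a", "A_d", "A_inv", "V_a", "V_u", "V_inv", "star"]

candidates = list(product(STEPS, repeat=3))
print(len(candidates))  # 343 candidate patterns (7^3) to examine
```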
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with a ⋆ step, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains an A_inv/V_inv sub-pattern, the longer pattern can be divided into two separate patterns, where A_inv/V_inv is assigned as Step 1 of the second pattern. This is because A_inv/V_inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1; e.g., the vulnerability A_inv → V_u → A_a (fast) shown in Table 2. A_inv/V_inv cannot be a candidate for the middle steps or the last step, because it would flush all timing information, making the attacker unable to deduce the final timing together with the victim's sensitive address information. This rule should be recursively applied until there are no sub-patterns left with an A_inv/V_inv in the middle or the last step (an A_inv/V_inv in the last step will be deleted).
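Subdivision Rule 1 can be sketched as follows, assuming a pattern is a list of step names with "star" standing for the ⋆ step; this encoding is our own illustration, not the paper's implementation.

```python
# Sketch of Subdivision Rule 1: an interior star closes the current
# sub-pattern and becomes Step 1 of the next one; a lone trailing
# star is deleted.

def subdivide(pattern):
    sub_patterns, current = [], []
    for step in pattern:
        if step == "star":
            if current:                  # close the pattern before the star
                sub_patterns.append(current)
            current = ["star"]           # star opens the next sub-pattern
        else:
            current.append(step)
    if current and current != ["star"]:  # drop a lone trailing star
        sub_patterns.append(current)
    return sub_patterns
```

For example, subdivide(["A_a", "star", "V_u", "A_a"]) yields [["A_a"], ["star", "V_u", "A_a"]].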
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7 we show all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of whether they are the attacker's accesses (A) or the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., the sequences containing M followed by N and N followed by M give the attacker the same timing observation information in the final step, then we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try all possible combinations, commuting different pairs of steps that are able to apply the Commute Rules, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter in which positions of the whole memory sequence the two steps are. Table 7 shows a "yes" in the Commute Rule 1 column for two-step patterns that can be commuted in this way.
– Commute Rule 2: A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some pairs of adjacent steps can only be commuted under this condition; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
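Commute Rule 1 can be sketched with a simple (operation, address) encoding of steps; the encoding is ours, for illustration only.

```python
# Illustrative check for Commute Rule 1: a known access and a known
# invalidation of *different* addresses may swap places anywhere in
# the sequence.  A step is an (op, address) pair with op in
# {"access", "inv"} and address in {"a", "a_alias", "d", "u"}.

def can_commute_rule1(m, n):
    m_op, m_addr = m
    n_op, n_addr = n
    both_known = "u" not in (m_addr, n_addr)
    access_inv_pair = {m_op, n_op} == {"access", "inv"}
    return both_known and access_inv_pair and m_addr != n_addr
```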
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern with two adjacent steps that are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has a "yes" in the "Union Rule or
Table 7 Rules for combining two adjacent steps. For each pair of adjacent steps (First, Second), the table indicates whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and gives the combined step where applicable (table body omitted)
Reduce Rule" column and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: For two adjacent known memory access-related operations (known access operations), the second operation always leaves the accessed cache block in a deterministic state, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps where one is a known access operation and the other is a known inv operation, if both refer to the same address then, regardless of their order, the two steps can be reduced to one step, namely the second step.
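A minimal sketch of these three Reduction Rules as a pass over a memory sequence. The tuple encoding (actor, address, kind) and the helper names are our own illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch of the three Reduction Rules. Steps are encoded as
# (actor, address, kind), where kind is "access" or "inv" and the address
# "u" marks an unknown address (an assumed encoding for this sketch).
def reduce_once(seq):
    for i in range(len(seq) - 1):
        (a1, addr1, k1), (a2, addr2, k2) = seq[i], seq[i + 1]
        # Rule 1: two u operations targeting the same unknown address u;
        # keep only the second access.
        if addr1 == "u" and addr2 == "u" and k1 == k2 == "access":
            return seq[:i] + seq[i + 1:]
        # Rule 2: two adjacent known access operations; the second leaves
        # the block in a deterministic state, so keep one step.
        if addr1 != "u" and addr2 != "u" and k1 == k2 == "access":
            return seq[:i] + seq[i + 1:]
        # Rule 3: a known access and a known invalidation of the same
        # address, in either order; keep only the second step.
        if addr1 == addr2 != "u" and {k1, k2} == {"access", "inv"}:
            return seq[:i] + seq[i + 1:]
    return seq

def reduce_fixpoint(seq):
    # Apply the rules repeatedly until no further reduction is possible.
    prev = None
    while prev != seq:
        prev, seq = seq, reduce_once(seq)
    return seq
```

Each rule removes the first step of the offending pair, matching the "keep the second step" wording above.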
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined by Union Rule 1: two known inv operations each invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
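Union Rule 1 can likewise be sketched as a single pass. The encoding of a merged step (addresses joined with "+") and the helper name are our own illustrative choices:

```python
# Illustrative sketch of Union Rule 1. Steps are (actor, address, kind);
# "u" marks an unknown address (assumed encoding for this sketch).
def union_rule_1(seq):
    """Merge runs of adjacent invalidations of known, distinct addresses
    into a single Union(...) step."""
    out = []
    for step in seq:
        actor, addr, kind = step
        if (out and kind == "inv" and addr != "u"
                and out[-1][2] == "inv" and out[-1][1] != "u"
                and addr not in out[-1][1].split("+")):
            # Combine with the previous invalidation step.
            prev_actor, prev_addr, _ = out.pop()
            out.append((prev_actor + "/" + actor, prev_addr + "+" + addr, "inv"))
        else:
            out.append(step)
    return out
```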
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rules are applied to the processed memory sequence.
At each application of these three categories of rules, the recursion should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer steps):

– u operation, not u operation, u operation

– not u operation, u operation, not u operation
There might be a possible extra ⋆ or A_inv/V_inv before this three-step pattern, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A_inv/V_inv in the first step:

– If it is followed by a known access operation, the A_inv/V_inv can be removed, due to the actual state the access puts into the cache block.

– If it is followed by a known inv operation or V_inv_u, the A_inv/V_inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be {A_inv/V_inv, V_u, not u operation, u operation}, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or {A_inv/V_inv, V_u, A_inv_d/V_inv_d, u operation}, where {V_u, A_inv_d/V_inv_d} can further have Commute Rule 2 applied and be reduced to within three steps.
In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied, requiring the Rest Checking.
The only two adjacent step patterns that cannot be processed by any of the three categories of Rules are the following:
– {A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_inv_a/V_inv_a/A_inv_aalias/V_inv_aalias, V_u}

– {A_a/V_a/A_aalias/V_aalias, V_inv_u}

– {V_u, A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_inv_a/V_inv_a/A_inv_aalias/V_inv_aalias}

– {V_inv_u, A_a/V_a/A_aalias/V_aalias}

We manually checked all of the two adjacent step patterns
above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: vulnerabilities: the reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vulnerabilities = Ø
2: while contain(pattern, ⋆) and the index of ⋆ is not 0 do
3:     pattern = subdivision_rule_1(pattern)
4: end while
5: while (contain(pattern, A_inv) and its index is not 0) or (contain(pattern, V_inv) and its index is not 0) do
6:     pattern = subdivision_rule_2(pattern)
7: end while
8: while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:     pattern = commute_rules(pattern)
10:    pattern = reduction_rules(pattern)
11:    pattern = union_rules(pattern)
12:    if has_interval_effective_three_steps(pattern) then
13:        vulnerabilities += interval_effective_three_steps(pattern)
14:    end if
15: end while
16: return vulnerabilities
two steps either generates two adjacent step patterns that can be processed by the three categories of Rules, so that further steps can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
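The reduce-and-check loop of Algorithm 2 can be sketched in runnable form as below. The step encoding, the stand-in rule pass, and the helper names are our own illustrative assumptions, not the paper's implementation; the real Commute, Reduction, and Union Rule passes would be plugged in as rule_passes:

```python
# Sketch of Algorithm 2's reduce-and-check loop (illustrative only).
def reduce_and_check(pattern, rule_passes, is_effective_three_step, max_iters=100):
    """Apply rule passes until the pattern stops shrinking; collect any
    effective three-step windows found along the way."""
    vulnerabilities = []
    for _ in range(max_iters):
        before = len(pattern)
        for rule in rule_passes:           # Commute, then Reduction, then Union
            pattern = rule(pattern)
        for i in range(len(pattern) - 2):  # scan every three-step window
            window = pattern[i:i + 3]
            if is_effective_three_step(window) and window not in vulnerabilities:
                vulnerabilities.append(window)
        if len(pattern) == before:         # fixed point: no rule reduced a step
            break
    return pattern, vulnerabilities

# Stand-in rule pass: drop a step that is immediately repeated
# (in the spirit of Reduction Rule 2; not the paper's actual pass).
def drop_adjacent_duplicates(pattern):
    out = []
    for s in pattern:
        if not out or out[-1] != s:
            out.append(s)
    return out
```

In use, is_effective_three_step would consult the table of effective three-step vulnerability types produced by the cache three-step simulator.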
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE: a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games: bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: 40th European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
the Non Deterministic cache with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad → Vu → Va (fast) vulnerability (Cache Internal Collision).

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate truly sensitive data from other data in the victim code. However, the identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu → Ad → Vu (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at preventing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
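A toy sketch of two of the techniques above, static partitioning against external interference and random fill against miss-based correlation. The PartitionedCache and random_fill names and models are our own illustrative assumptions, not any specific secure cache design:

```python
import random

# Toy statically partitioned cache: each security domain owns a fixed set of
# ways, so the attacker's accesses can never evict the victim's lines
# (illustrative model only).
class PartitionedCache:
    def __init__(self, ways_per_domain=2):
        self.lines = {"victim": [None] * ways_per_domain,
                      "attacker": [None] * ways_per_domain}

    def access(self, domain, addr):
        ways = self.lines[domain]      # each domain can only use its own ways
        hit = addr in ways
        if not hit:
            ways.pop(0)                # miss: evict within this domain only
            ways.append(addr)
        return hit

# Toy random fill: on a miss, bring in a random line from a neighborhood of
# the requested address instead of the requested line itself, de-correlating
# the access from the cached line (in the spirit of the Random Fill idea).
def random_fill(cache_set, addr, neighborhood, rng=random):
    cache_set.append(rng.choice(neighborhood))
```

In the partitioned model, an attacker filling its own ways cannot change the hit/miss outcome of a later victim access, which is exactly the external-interference property discussed above.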
It should be noted that some cache designs only focus on certain cache levels; e.g., the CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A: Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of its three steps is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and the names added by us for strategies which did not have names) may be useful to recall each attack's high-level operation.
Cache Internal Collision: In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload: In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
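The three Flush + Reload steps can be traced on a toy one-block cache model. This is an illustrative sketch; the Block class and the hit/miss labels standing in for timing are our own assumptions, not a real timing measurement:

```python
# Toy single-block cache used to trace the three Flush + Reload steps
# (illustrative only; "timing" is modeled as a hit/miss label).
class Block:
    def __init__(self):
        self.tag = None

    def flush(self):
        self.tag = None

    def access(self, tag):
        hit = (self.tag == tag)
        self.tag = tag
        return "hit" if hit else "miss"

def flush_reload(block, victim_secret_addr, attacker_probe_addr):
    block.flush()                        # Step 1: invalidate the block
    block.access(victim_secret_addr)     # Step 2: victim touches secret u
    # Step 3: attacker reloads a known address; a hit means u equals it
    return block.access(attacker_probe_addr)
```

A "hit" in Step 3 reveals that the victim's secret address matches the attacker's known probe address.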
Reload + Time (New Name Assigned in this Paper): In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper): In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will let the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time: In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time the attacker invalidates data at a known address, allowing the attacker to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe: In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
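The three Prime + Probe steps can be traced on a toy 2-way LRU cache set. This is an illustrative sketch; the CacheSet model and the tag names are our own assumptions:

```python
# Toy 2-way LRU cache set used to trace the three Prime + Probe steps
# (illustrative model, not a real cache implementation).
class CacheSet:
    def __init__(self, ways=2):
        self.ways, self.lines = ways, []

    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)     # refresh the LRU position
        elif len(self.lines) == self.ways:
            self.lines.pop(0)          # evict the least recently used line
        self.lines.append(tag)
        return "hit" if hit else "miss"

def prime_probe(cache_set, victim_tags):
    primes = ["atk0", "atk1"]
    for t in primes:                   # Step 1: prime the whole set
        cache_set.access(t)
    for t in victim_tags:              # Step 2: victim's (secret) accesses
        cache_set.access(t)
    # Step 3: probe; any miss means the victim touched this set
    return [cache_set.access(t) for t in primes]
```

Observing a miss in the probe list tells the attacker that the victim's secret data maps to this cache set; an all-hit probe means the set was untouched.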
Bernstein's Attack: This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, that tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper): In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper): In Step 1, the attacker primes the cache set using accesses to data at addresses known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper): The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation related (new names assigned in this paper): Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access-related operation.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like {⋆, A_a, V_u}. Therefore, the three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
J Hardw Syst Secur (2019) 3397ndash425 419
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B4 Patterns with β gt 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B41 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ Ainv/Vinv ⇝ ..., the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability Ainv/Vinv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step will be deleted).
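As a concrete illustration, the two subdivision rules can be sketched in a few lines of code. The list encoding, the '*' marker for the ⋆ step, and the 'flush' marker for an Ainv/Vinv step are our own illustrative assumptions, not part of the paper's tooling:

```python
# Sketch of the two subdivision rules. A pattern is a list of step labels;
# '*' stands for the star step and 'flush' for an Ainv/Vinv whole-cache flush.

def subdivide(pattern, marker):
    """Split a pattern at each occurrence of `marker` that is not Step 1.
    The marker becomes Step 1 of the following sub-pattern; a marker left
    alone in the last position is deleted, as the rules require."""
    pieces, cur = [], []
    for i, step in enumerate(pattern):
        if step == marker and i != 0:
            if cur:
                pieces.append(cur)
            cur = [marker]          # marker becomes Step 1 of the next pattern
        else:
            cur.append(step)
    if cur and cur != [marker]:     # a trailing lone marker is dropped
        pieces.append(cur)
    return pieces

def apply_subdivision(pattern):
    """Rule 1 (split on '*') is applied before Rule 2 (split on 'flush')."""
    out = []
    for piece in subdivide(pattern, '*'):
        out.extend(subdivide(piece, 'flush'))
    return out
```

For example, `apply_subdivision(['A_a', '*', 'V_u', 'flush', 'A_a'])` yields the three shorter patterns `[['A_a'], ['*', 'V_u'], ['flush', 'A_a']]`, each of which can then be checked independently.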
B42 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B421 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following Rules. In the commuting process, we will try every possible combination, commuting different pairs of two steps to which the Commute Rules can be applied, and then further applying the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other step is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position within the whole memory sequence the two steps are in. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second of them is not the last step in the whole memory sequence. These patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
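Under an assumed (operation, address) encoding of a single step, the Commute Rule 1 column of Table 7 can be summarized by a small predicate. This sketch is ours, not from the paper's simulator, and it folds in the table's convention that two identical steps trivially commute:

```python
# A step is (op, addr): op in {'access', 'inv'}, addr in {'a', 'aalias', 'd', 'u'};
# 'u' is the victim's unknown (secret) address.

def commutes_rule1(m, n):
    """Commute Rule 1 (sketch): identical steps trivially commute; otherwise
    the pair commutes only when both addresses are known, the addresses
    differ, and at least one of the two steps is an invalidation.
    (Commute Rule 2 extends this to more pairs, but only when the second
    step is not the last step of the whole sequence.)"""
    (op_m, addr_m), (op_n, addr_n) = m, n
    if m == n:
        return True
    if 'u' in (addr_m, addr_n):      # unknown addresses never qualify here
        return False
    return addr_m != addr_n and 'inv' in (op_m, op_n)
```

For instance, an access to a and an invalidation of d commute, while two accesses to different known addresses do not, since swapping them can change which line gets evicted.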
B422 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has "yes" in the column "Union Rule or
Table 7  Rules for combining two adjacent steps (a "−" in the Combined Step column indicates that the two steps cannot be reduced or unioned)

| First     | Second    | Commute Rule 1 | Commute Rule 2 | Union Rule or Reduce Rule | Combined Step |
|-----------|-----------|----------------|----------------|---------------------------|---------------|
| a         | a         | yes | yes | yes | a |
| a         | aalias    | no  | no  | yes | aalias |
| a         | d         | no  | no  | yes | d |
| a         | u         | no  | no  | no  | − |
| a         | ainv      | no  | no  | yes | ainv |
| a         | aaliasinv | yes | yes | yes | a |
| a         | dinv      | yes | yes | yes | a |
| a         | uinv      | no  | no  | no  | − |
| aalias    | a         | no  | no  | yes | a |
| aalias    | aalias    | yes | yes | yes | aalias |
| aalias    | d         | no  | no  | yes | d |
| aalias    | u         | no  | no  | no  | − |
| aalias    | ainv      | yes | yes | yes | aalias |
| aalias    | aaliasinv | no  | no  | yes | aaliasinv |
| aalias    | dinv      | yes | yes | yes | aalias |
| aalias    | uinv      | no  | no  | no  | − |
| d         | a         | no  | no  | yes | a |
| d         | aalias    | no  | no  | yes | aalias |
| d         | d         | yes | yes | yes | d |
| d         | u         | no  | no  | no  | − |
| d         | ainv      | yes | yes | yes | d |
| d         | aaliasinv | yes | yes | yes | d |
| d         | dinv      | no  | no  | yes | dinv |
| d         | uinv      | no  | yes | no  | − |
| u         | a         | no  | no  | no  | − |
| u         | aalias    | no  | no  | no  | − |
| u         | d         | no  | no  | no  | − |
| u         | u         | yes | yes | yes | u |
| u         | ainv      | no  | no  | no  | − |
| u         | aaliasinv | no  | no  | no  | − |
| u         | dinv      | no  | yes | no  | − |
| u         | uinv      | no  | no  | yes | uinv |
| ainv      | a         | no  | no  | yes | a |
| ainv      | aalias    | yes | yes | yes | aalias |
| ainv      | d         | yes | yes | yes | d |
| ainv      | u         | no  | no  | no  | − |
| ainv      | ainv      | yes | yes | yes | ainv |
| ainv      | aaliasinv | yes | yes | yes | Union(ainv, aaliasinv) |
| ainv      | dinv      | yes | yes | yes | Union(ainv, dinv) |
| ainv      | uinv      | no  | yes | no  | − |
| aaliasinv | a         | yes | yes | yes | a |
| aaliasinv | aalias    | no  | no  | yes | aalias |
| aaliasinv | d         | yes | yes | yes | d |
| aaliasinv | u         | no  | no  | no  | − |
| aaliasinv | ainv      | yes | yes | yes | Union(ainv, aaliasinv) |
| aaliasinv | aaliasinv | yes | yes | yes | aaliasinv |
| aaliasinv | dinv      | yes | yes | yes | Union(dinv, aaliasinv) |
| aaliasinv | uinv      | no  | yes | no  | − |
| dinv      | a         | yes | yes | yes | a |
| dinv      | aalias    | yes | yes | yes | aalias |
| dinv      | d         | no  | no  | yes | d |
| dinv      | u         | no  | yes | no  | − |
| dinv      | ainv      | yes | yes | yes | Union(ainv, dinv) |
| dinv      | aaliasinv | yes | yes | yes | Union(dinv, aaliasinv) |
| dinv      | dinv      | yes | yes | yes | dinv |
| dinv      | uinv      | no  | yes | no  | − |
| uinv      | a         | no  | no  | no  | − |
| uinv      | aalias    | no  | no  | no  | − |
| uinv      | d         | no  | yes | no  | − |
| uinv      | u         | no  | no  | yes | u |
| uinv      | ainv      | no  | yes | no  | − |
| uinv      | aaliasinv | no  | yes | no  | − |
| uinv      | dinv      | no  | yes | no  | − |
| uinv      | uinv      | yes | yes | yes | uinv |
Reduce Rule" and no Union result in the "Combined Step" column in Table 7).
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the second operation's cache block, so these two steps can be reduced to only one step, the second one.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
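The three reduction rules can be sketched as a single pair-reduction function. The (operation, address) tuple encoding is an assumption of ours, not the paper's artifact:

```python
def reduce_pair(m, n):
    """Return the single step that replaces the adjacent pair (m, n) under
    Reduction Rules 1-3, or None if the pair is not reducible by them."""
    (op_m, addr_m), (op_n, addr_n) = m, n
    # Rule 1: two accesses to the unknown address u -> keep the second.
    if op_m == op_n == 'access' and addr_m == addr_n == 'u':
        return n
    # Rule 2: two known accesses -> keep the second.
    if op_m == op_n == 'access' and 'u' not in (addr_m, addr_n):
        return n
    # Rule 3: an access and an invalidation of the same known address,
    # in either order -> keep the second.
    if {op_m, op_n} == {'access', 'inv'} and addr_m == addr_n != 'u':
        return n
    return None
```

In each reducible case the second step is kept, matching the Combined Step entries of Table 7 for those pairs (e.g., reducing ainv followed by an access to a keeps the access).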
B423 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases.
– Union Rule 1: Union Rule 1 applies to two invalidations of two known, different memory addresses. Two known inv operations both invalidate some known address; therefore, they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
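Under the same assumed (operation, address) encoding as above, Union Rule 1 can be sketched as follows; the 'union_inv' result tag is our own illustrative choice:

```python
def union_rule1(m, n):
    """Union Rule 1 (sketch): two invalidations of two different *known*
    addresses merge into one joint invalidation step; anything else is
    left for the other rules."""
    (op_m, addr_m), (op_n, addr_n) = m, n
    if op_m == op_n == 'inv' and 'u' not in (addr_m, addr_n) and addr_m != addr_n:
        return ('union_inv', frozenset((addr_m, addr_n)))
    return None
```

This matches the Union(...) entries of Table 7, which appear exactly for pairs of known invalidations with differing addresses (e.g., ainv and dinv), while two invalidations of the same address are handled by reduction instead.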
B424 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps, in one of the following patterns (or fewer steps):

– u operation ⇝ not u operation ⇝ u operation

– not u operation ⇝ u operation ⇝ not u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra Ainv/Vinv in the first step:

– If it is followed by a known access operation, the Ainv/Vinv can be removed, due to the actual state that the access then puts into the cache block.

– If it is followed by a known inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further have Commute Rule 2 applied, reducing the sequence to within three steps.
In this case, the sequence is finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of Rules can be applied are the following:
– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias ⇝ Vu

– Aa/Va/Aaalias/Vaalias ⇝ Vinv_u

– Vu ⇝ Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias

– Vinv_u ⇝ Aa/Va/Aaalias/Vaalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2  β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; p: a two-dimensional dynamic-size array, where p[0] contains the states of each step of the original pattern, in order, and p[1], p[2], ... are empty initially

Output: R: array of the reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability

1:  R = Ø
2:  while contain(p, ⋆) and its index is not 0 do
3:      p = Subdivision_Rule_1(p)
4:  end while
5:  while (contain(p, Ainv) and its index is not 0) or (contain(p, Vinv) and its index is not 0) do
6:      p = Subdivision_Rule_2(p)
7:  end while
8:  while not (is_ineffective(p) or has_interval_effective_three_steps(p)) do
9:      p = Commute_Rules(p)
10:     p = Reduction_Rules(p)
11:     p = Union_Rules(p)
12:     if is_ineffective(p) or has_interval_effective_three_steps(p) then
13:         R += effective_patterns(p)
14:     end if
15: end while
16: return R
two steps either generates two adjacent step patterns that can be processed by the three categories of Rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B43 Algorithm for Reducing and Checking Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; and has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
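A much-simplified sketch of the reduction loop at the heart of Algorithm 2 is shown below. Only the Reduction Rules are applied here; the subdivision, commute, and union handling, as well as the effectiveness checks, are omitted, and the (operation, address) encoding is our own assumption:

```python
def reduce_pair(m, n):
    """Pair reduction per Reduction Rules 1-3 (sketch): keep the second step
    for two u-accesses, two known accesses, or an access/invalidation pair
    on the same known address; otherwise the pair is not reducible."""
    (op_m, a_m), (op_n, a_n) = m, n
    if op_m == op_n == 'access' and (a_m == a_n == 'u' or 'u' not in (a_m, a_n)):
        return n                          # Rules 1 and 2
    if {op_m, op_n} == {'access', 'inv'} and a_m == a_n != 'u':
        return n                          # Rule 3
    return None

def reduce_pattern(pattern):
    """Repeatedly merge the first reducible adjacent pair until at most
    three steps remain or no pair can be reduced."""
    pattern = list(pattern)
    while len(pattern) > 3:
        for i in range(len(pattern) - 1):
            merged = reduce_pair(pattern[i], pattern[i + 1])
            if merged is not None:
                pattern[i:i + 2] = [merged]
                break
        else:
            break                         # no adjacent pair is reducible
    return pattern
```

For example, a five-step access sequence over a, d, u collapses to a three-step not u ⇝ u ⇝ not u pattern, which can then be looked up against the effective three-step vulnerabilities.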
B44 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) Prince – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3, it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the evicted cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1, it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3, it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access–related operation.
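To make the three-step structure of these strategies concrete, a few of them can be written down directly as data. The (actor, operation, address) tuple notation below is our own illustrative encoding of the descriptions above, not taken from the paper's artifact:

```python
# Each strategy is three steps of (actor, operation, address); 'u' is the
# victim's secret (unknown) address, 'a' and 'd' are attacker-known addresses.
# "A/V" marks a step that either the attacker or the victim may perform.
STRATEGIES = {
    "Cache Internal Collision": [("A/V", "inv", "a"), ("V", "access", "u"), ("V", "access", "a")],
    "Flush + Reload":           [("A/V", "inv", "a"), ("V", "access", "u"), ("A", "access", "a")],
    "Prime + Probe":            [("A", "access", "a"), ("V", "access", "u"), ("A", "access", "a")],
    "Flush + Time":             [("V", "access", "u"), ("A", "inv", "a"), ("V", "access", "u")],
}

def final_observer(pattern):
    """Whoever performs Step 3 is the party whose timing is measured."""
    return pattern[-1][0]
```

Note how Cache Internal Collision and Flush + Reload differ only in who performs the final (timed) access, exactly as the descriptions above state.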
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can either be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and any pattern with β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to the addresses that interfere with the data under attack: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. A memory-related operation on the unknown address (containing u) is denoted as a u operation.
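The operation classes defined above can be sketched as a small classifier. The string encoding and helper names below are ours (the paper writes the states with sub- and superscripts, e.g. A_a, V_u, A_d^inv):

```python
# Illustrative classifier for the operation classes defined above.
# A step is written as '<actor>_<address>' with an optional '^inv'
# suffix for invalidations, e.g. 'A_a', 'V_u', 'A_d^inv'.
KNOWN_ADDRESSES = {"a", "aalias", "d"}  # addresses known to the attacker

def address_of(step):
    # strip the actor prefix and any '^inv' suffix
    return step.split("_", 1)[1].replace("^inv", "")

def is_inv(step):
    return step.endswith("^inv")

def is_u_operation(step):
    # any operation on the unknown address u
    return address_of(step) == "u"

def is_known_access(step):
    return address_of(step) in KNOWN_ADDRESSES and not is_inv(step)

def is_known_inv(step):
    return address_of(step) in KNOWN_ADDRESSES and is_inv(step)

def is_not_u(step):
    # known_access and known_inv together form the not_u operations
    return is_known_access(step) or is_known_inv(step)
```

For instance, `A_d^inv` is a known_inv operation (hence not_u), while both `V_u` and `V_u^inv` count as u operations.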
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and it corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
J Hardw Syst Secur (2019) 3:397–425
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern with ⋆ in the middle, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until the longer pattern has no sub-pattern left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted).
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with Ainv/Vinv in the middle, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing related to the victim's sensitive address information. This rule should be applied recursively until the longer pattern has no sub-pattern left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step is deleted).
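Both subdivision rules have the same shape: split the pattern at a designated step, make the splitting step the first step of the next sub-pattern, and delete the splitter if it is the last step. A minimal sketch in our own encoding (the step strings and helper name are assumptions, not the paper's notation):

```python
def subdivide(pattern, is_splitter):
    """Split a long pattern at every splitter step. The splitter becomes
    Step 1 of the following sub-pattern; a splitter that would be the
    last step of the sequence is deleted (it carries no information)."""
    parts, current = [], []
    for step in pattern:
        if is_splitter(step):
            if current:
                parts.append(current)
            current = [step]      # splitter starts the next sub-pattern
        else:
            current.append(step)
    # drop a lone trailing splitter, keep anything else
    if current and not (len(current) == 1 and is_splitter(current[0])):
        parts.append(current)
    return parts
```

Subdivision Rule 1 corresponds to `is_splitter = lambda s: s == "*"`, and Rule 2 to splitting on the full-flush steps `"A^inv"`/`"V^inv"`.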
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each pair of two adjacent steps in the remaining patterns. In Table 7 we show all the possible cases of the rule-applying conditions for each pair of two adjacent steps, for both the attacker's accesses (A) and the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps, M and N, in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting different pairs of two steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, the two steps can be commuted, no matter where the two steps are located within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted under this condition; for these, Table 7 shows a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column.
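Commute Rule 1 is simple enough to state as a predicate. The sketch below uses our own string encoding; Commute Rule 2 depends on the per-pair entries of Table 7 and is not reproduced here:

```python
def parse(step):
    """Split a step like 'A_d^inv' into (address, is_invalidation)."""
    addr = step.split("_", 1)[1].replace("^inv", "")
    return addr, step.endswith("^inv")

def commutes_by_rule_1(m, n):
    """Commute Rule 1: a known-address access and a known-address
    invalidation commute whenever they target different addresses."""
    (addr_m, inv_m), (addr_n, inv_n) = parse(m), parse(n)
    if "u" in (addr_m, addr_n):
        return False              # the rule covers known addresses only
    # exactly one of the two is an invalidation, and addresses differ
    return inv_m != inv_n and addr_m != addr_n
```

So `A_a` followed by `V_d^inv` commutes anywhere in the sequence, while `A_a` followed by `V_a^inv` (same address) does not; the latter is instead a candidate for Reduction Rule 3 below.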
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps that are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules below (if the two-step pattern has a "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).

Table 7 Rules for combining two adjacent steps. For each pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 and Commute Rule 2 apply, whether the Union Rule or Reduce Rule applies, and the resulting combined step when it does.
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so the two steps can be reduced to only the second access in the memory sequence.

– Reduction Rule 2: Two adjacent known memory access–related operations (known_access operations) always result in a deterministic state of the cache block related to the second access, so the two steps can be reduced to one step.

– Reduction Rule 3: For two adjacent steps where one is a known_access operation and the other is a known_inv operation, if the address they refer to is the same, regardless of the order of the two steps, they can be reduced to one step, namely the second step.
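The three reduction rules can be sketched as a single pair-combining function (our own encoding; in every reducible case the second step is the one kept, as the rules state):

```python
def parse(step):
    """Split a step like 'A_a^inv' into (address, is_invalidation)."""
    addr = step.split("_", 1)[1].replace("^inv", "")
    return addr, step.endswith("^inv")

def reduce_pair(m, n):
    """Return [n] if the adjacent pair (m, n) reduces to its second step,
    otherwise return the pair unchanged."""
    (addr_m, inv_m), (addr_n, inv_n) = parse(m), parse(n)
    if addr_m == "u" and addr_n == "u":
        return [n]                            # Reduction Rule 1: same u
    if addr_m != "u" and addr_n != "u":
        if not inv_m and not inv_n:
            return [n]                        # Reduction Rule 2: two known accesses
        if inv_m != inv_n and addr_m == addr_n:
            return [n]                        # Reduction Rule 3: access+inv, same address
    return [m, n]
```

Note that `A_a` followed by `V_d^inv` is left unchanged: different known addresses with mixed access/invalidation are the Commute Rule 1 case, not a reduction.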
B.4.2.3 Union Rules
Suppose there are two adjacent steps, M and N, in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step in the memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two different known memory addresses can be combined by Union Rule 1: both known_inv operations invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known_access and known_inv operations that target the same address as close together as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, and the recursion continues until the resulting sequence matches one of the two following cases:
• The long (β > 3) memory sequence of u operations and not_u operations is reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known_access operation, the Ainv/Vinv can be removed, because the subsequent access puts an actual state into the cache block.
– If followed by a known_inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case is Ainv/Vinv ⇝ Vu ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further be commuted using Commute Rule 2 and reduced to within three steps.

In this case, the steps are finally within three steps and the checking is done.

• There exist two adjacent steps to which none of the above rules can be applied; these require the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias ⇝ Vu
– Aa/Va/Aaalias/Vaalias ⇝ Vinv_u
– Vu ⇝ Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias
– Vinv_u ⇝ Aa/Va/Aaalias/Vaalias

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these two steps either generates two-adjacent-step patterns that can be processed by the three rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2: β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; pattern, a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: results, the array of reduced effective vulnerability pattern(s). It will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = Ø
2: while pattern contains ⋆ at an index other than 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while pattern contains Ainv at an index other than 0, or pattern contains Vinv at an index other than 0 do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while pattern is_ineffective or has_interval_effective_three_steps do
9:   pattern = Commute(pattern)
10:  pattern = Reduce(pattern)
11:  pattern = Union(pattern)
12:  if pattern is_ineffective or has_interval_effective_three_steps then
13:    results += mapped_three_step_vulnerabilities(pattern)
14:  end if
15: end while
16: return results
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; and has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
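The core of the reduction loop can be sketched as follows. This is a deliberately simplified, self-contained version in our own encoding: the Commute step and the effectiveness checks against Table 2 are omitted, and the Reduce/Union helpers from the rules above are repeated so the block runs on its own:

```python
def parse(step):
    """Split a step like 'A_a^inv' into (address, is_invalidation)."""
    addr = step.split("_", 1)[1].replace("^inv", "")
    return addr, step.endswith("^inv")

def reduce_pair(m, n):
    (am, im), (an, iN) = parse(m), parse(n)
    if am == "u" and an == "u":
        return [n]                                   # Reduction Rule 1
    if am != "u" and an != "u":
        if (not im and not iN) or (im != iN and am == an):
            return [n]                               # Reduction Rules 2 and 3
    return [m, n]

def union_pair(m, n):
    (am, im), (an, iN) = parse(m), parse(n)
    if im and iN and "u" not in (am, an) and am != an:
        return ["Union(%s,%s)" % (m, n)]             # Union Rule 1
    return [m, n]

def reduce_pattern(pattern):
    """Apply Reduce/Union to adjacent pairs until the pattern stops
    shrinking or has at most three steps."""
    changed = True
    while len(pattern) > 3 and changed:
        changed = False
        for i in range(len(pattern) - 1):
            for rule in (reduce_pair, union_pair):
                merged = rule(pattern[i], pattern[i + 1])
                if len(merged) == 1:
                    pattern = pattern[:i] + merged + pattern[i + 2:]
                    changed = True
                    break
            if changed:
                break
    return pattern
```

For example, the five-step pattern Aa ⇝ Ad ⇝ Vu ⇝ Aa ⇝ Ainv_a collapses to the three-step Ad ⇝ Vu ⇝ Ainv_a, matching the claim that longer patterns reduce to three-step ones.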
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International Conference on Information and Communications Security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International Workshop on Cryptographic Hardware and Embedded Systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International Workshop on Cryptographic Hardware and Embedded Systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International Conference on the Theory and Application of Cryptology and Information Security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia Conference on Computer and Communications Security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX Workshop on Health Information Technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX Security Symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX Security Symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX Security Symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE Symposium on Security and Privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE Symposium on Security and Privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE Symposium on Security and Privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European Solid State Circuits Conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX Security Symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' Track at the RSA Conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European Symposium on Research in Computer Security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost, high-speed address translation mechanism. In: 17th Annual International Symposium on Computer Architecture, 1990. Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX Security Symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX Security Symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX Security Symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th Annual Computer Security Applications Conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
access the secret data Similar to the first case observingcache miss in Step 3 tells the attacker the cache set is theone secret data maps to
Evict + Probe (NewName Assigned in this Paper) In Step 1victim evict the cache set using the access to a data at anaddress known to the attacker In Step 2 the victim accessessecret data which possibly evicts data from Step 1 In Step
3 the attacker probes each cache set using the same data asin Step 1 if a cache miss is observed the attacker knownsthe secret data maps to the cache set he or she primed Thisattack strategy has similar Step 2 and Step 3 as Prime +Probe attack but for Step 1 it is the victim that does theeviction accesses
Prime + Time (NewName Assigned in this Paper) In Step 1the attacker primes the cache set using access to data at anaddress known to the attacker In Step 2 the victim accessessecret data which possibly evicts data from Step 1 In Step
3 the victim probes each cache set using the same data Step
1 if a cache miss is observed the attacker knowns the secretdata maps to the cache set he or she primed in Step 1 Thisattack strategy has similar Step 1 and Step 2 as Prime +Probe attack but for Step 3 it is the victim that does theprobing accesses
Flush + Time (NewNameAssigned in this Paper) The victimaccesses the same secret data in Step 1 and Step 3 whilein Step 2 the attacker tries to invalidate data at a knownaddress If cache miss is observed in Step 3 that will tellthe attacker the data address he or she invalidated in Step 2maps to the secret data
Invalidation related (new names assigned in this paper)Vulnerabilities that have names ending with ldquoinvalidationrdquoin Table 3 correspond to the vulnerabilities that havethe same name (except for the ldquoinvalidationrdquo part) inTable 2 The difference between each set of correspondingvulnerabilities is that the vulnerabilities ending withldquoinvalidationrdquo use invalidation related operation in the laststep to derive the timing information rather than the normalmemory access related operations
Appendix B Soundness Analysisof the Three-StepModel
In this section we analyze the soundness of the three-stepmodel to demonstrate that the three-step model can coverall possible timing-based cache vulnerabilities in normalcaches If there is a vulnerability that is represented usingmore than three steps the steps can be reduced to only three
steps or a three-step sub-pattern can be found in the longerrepresentation
In the below analysis we use β to denote the numberof memory-related operations ie steps in a representationof a vulnerability We show that β = 1 is not sufficient torepresent a vulnerability β = 2 covers some vulnerabilitiesbut not all β = 3 represents all the vulnerabilities andβ gt 3 can be reduced to only three steps or a three-step sub-pattern can be found in the longer representation Knownaddresses refer to all the cache states that interferencewith the data a aalias and d Unknown address refersu An access to a known memory address is denoted asknown access operation and an invalidation of a knownmemory address is denoted as known inv operationThe known access operation and known inv operation
together make up not u operations An unknown memoryrelated operation (containing u) is denoted as u operation
B1 Patterns with β = 1
When β = 1 there is only one memory-relatedoperation and it is not possible to create interferencebetween memory-related operations since two memory-related operations are the minimum requirement for aninterference Furthermore β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being since thecache state gives no information and Step 3 being theone operation These types of patterns are all examined bythe cache three-step simulator and none of these types arefound to be effective Consequently a vulnerability cannotexit when β = 1
B2 Patterns with β = 2
When β = 2 it satisfies the minimum requirementof an interference for memory related operations andcorresponds to the three-step cases where Step 1 is and Step 2 and Step 3 are the two operations Thesetypes are all examined by the cache three-step simulatorand some of them belong to Weak Vulnerabilities like Aa Vu Therefore three-step cases where Step
1 is have corresponding effective vulnerabilities shownin Table 2 Consequently β = 2 can represent someweak vulnerabilities but not all vulnerabilities as there existsome that are represented with three steps as discussednext
B3 Patterns with β = 3
When β = 3 we have tested all possible combinations ofthree-step memory related operations in Section 33 usingour cache simulator for the three-step model We foundthat there are in total 72 types of Strong Vulnerabilities
J Hardw Syst Secur (2019) 3397ndash425 419
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
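The exhaustive check behind these counts can be sketched as a simple enumeration; the step alphabet below is a reduced, hypothetical one for illustration (the model's full alphabet also includes aliased addresses and invalidation operations), and deciding which candidates are effective is left to a cache-state simulator:

```python
from itertools import product

# Hypothetical, reduced step alphabet: attacker (A) and victim (V)
# accesses to the secret-related address a, an unrelated address d,
# the unknown address u, plus the "any state" step (star).
STEPS = ["A_a", "V_a", "A_d", "V_d", "A_u", "V_u", "star"]

def enumerate_three_step_patterns(steps):
    """Yield every candidate (Step 1, Step 2, Step 3) pattern; a cache
    simulator would then decide which candidates are effective
    vulnerabilities (Strong or Weak)."""
    yield from product(steps, repeat=3)

candidates = list(enumerate_three_step_patterns(STEPS))
print(len(candidates))   # 7**3 = 343 candidates to simulate
```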
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until no sub-patterns are left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ Ainv/Vinv ⇝ ..., the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it would flush all timing information, making the attacker unable to deduce the final timing from the victim's sensitive address translation information. This rule should be applied recursively until no sub-patterns are left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step is deleted).
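For illustration only (this is not the authors' tooling), Subdivision Rule 1 can be sketched as a single pass that splits a step pattern at every ⋆, encoded here as the hypothetical string "star":

```python
def subdivide_at_star(pattern):
    """Split a step pattern at every 'star' (the ⋆ step) that is not Step 1.

    A star in the middle starts a new sub-pattern (the attacker loses
    track of the cache state), and a star as the last step is deleted.
    """
    out, current = [], []
    for i, step in enumerate(pattern):
        if step == "star" and i != 0 and current:
            out.append(current)      # close the pattern before the star
            current = ["star"]       # star becomes Step 1 of the next one
        else:
            current.append(step)
    if current and current[-1] == "star":
        current = current[:-1]       # a star in the last step is deleted
    if current:
        out.append(current)
    return out

# e.g. A_a ⇝ V_u ⇝ ⋆ ⇝ A_a ⇝ V_u splits into two sub-patterns
print(subdivide_at_star(["A_a", "V_u", "star", "A_a", "V_u"]))
```

The same scan-and-split shape, keyed on Ainv/Vinv instead of ⋆, would serve for Subdivision Rule 2.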
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each pair of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of whether they are the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. This gives us more opportunities to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of pairs of adjacent steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some adjacent step pairs can be commuted only under this condition; such patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
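The two commute rules amount to a table lookup plus a position check. The sketch below encodes a few illustrative (first, second) entries consistent with the rule statements; the full rule table is Table 7, and the lowercase step names are an assumed encoding covering both attacker (A) and victim (V) operations:

```python
# Illustrative excerpt of the commute columns of Table 7:
# (first, second) -> (commute_rule_1, commute_rule_2).
# A known access and a known invalidation of a *different* address
# always commute (Rule 1); some pairs commute only when the second
# step is not the last step of the whole sequence (Rule 2).
COMMUTE = {
    ("a", "d_inv"): (True, True),    # different addresses: Rule 1
    ("a", "a_inv"): (False, False),  # same address: cannot commute
    ("d", "u_inv"): (False, True),   # Rule 2 only
}

def can_commute(seq, i):
    """Return True if adjacent steps seq[i] and seq[i+1] may be exchanged."""
    rule1, rule2 = COMMUTE.get((seq[i], seq[i + 1]), (False, False))
    if rule1:
        return True
    # Rule 2 applies only when seq[i+1] is not the last step overall
    return rule2 and i + 1 < len(seq) - 1

print(can_commute(["a", "d_inv", "u"], 0))   # True, by Commute Rule 1
print(can_commute(["d", "u_inv"], 0))        # False: second step is last
print(can_commute(["d", "u_inv", "a"], 0))   # True, by Commute Rule 2
```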
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the Reduction Rules (if the two-step pattern has "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps. For each ordered pair of adjacent steps (First, Second) over the operations a, a_alias, d, u and their invalidation counterparts a^inv, a_alias^inv, d^inv, u^inv (covering both the attacker's (A) and the victim's (V) operations), the table indicates with "yes" or "no" whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and gives the combined step when a rule applies, e.g., Union(a^inv, a_alias^inv) or Union(a^inv, d^inv).
Reduce Rule" column and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both accesses target the same u, so the pair can be reduced to keep only the second access in the memory sequence.

– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the cache block related to the second memory access, so these two steps can be reduced to only one step.

– Reduction Rule 3: For two adjacent steps where one is a known access operation and the other is a known inv operation, if the addresses they refer to are the same, the two steps can be reduced to one step, namely the second step, regardless of their order.
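A sketch of the three reduction rules on a pair of adjacent steps, with steps encoded as hypothetical (kind, address) tuples; this encoding is an assumption made for illustration:

```python
def reduce_pair(first, second):
    """Apply Reduction Rules 1-3 to two adjacent steps, if possible.

    Steps are (kind, address) tuples with kind in {"access", "inv"};
    the address "u" stands for the unknown address.  Returns the single
    surviving step, or None when no reduction rule applies.
    """
    (k1, a1), (k2, a2) = first, second
    # Rule 1: two u operations of the same kind target the same u
    if a1 == a2 == "u" and k1 == k2:
        return second
    if a1 != "u" and a2 != "u":
        # Rule 2: two known accesses leave a deterministic final state
        if k1 == k2 == "access":
            return second
        # Rule 3: a known access and a known invalidation of the same
        # address reduce to the second step, in either order
        if k1 != k2 and a1 == a2:
            return second
    return None

print(reduce_pair(("access", "a"), ("access", "d")))  # Rule 2 applies
print(reduce_pair(("access", "a"), ("inv", "a")))     # Rule 3 applies
print(reduce_pair(("access", "a"), ("inv", "d")))     # None: commutes instead
```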
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two different known memory addresses can be combined: two known inv operations that each invalidate some known address can be merged into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
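Union Rule 1 can be sketched as a left-to-right merge of adjacent invalidation steps; here an invalidation step is encoded, purely for illustration, as the list of known addresses it invalidates, and any other operation as a plain string:

```python
def union_invalidations(pattern):
    """Union Rule 1: repeatedly merge adjacent invalidation steps that
    invalidate *different* known addresses into one joint step.

    An invalidation step is a list of invalidated known addresses,
    e.g. ["a"]; any other operation is a plain string.
    """
    out = []
    for step in pattern:
        if (out and isinstance(step, list) and isinstance(out[-1], list)
                and not set(step) & set(out[-1])):
            out[-1] = out[-1] + step   # joint Union(...) step
        else:
            out.append(step)
    return out

# A_a_inv ⇝ A_d_inv ⇝ V_u  becomes  Union(A_a_inv, A_d_inv) ⇝ V_u
print(union_invalidations([["a"], ["d"], "V_u"]))
```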
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules must reduce at least one step, and the recursion continues until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps, in one of the following patterns (or fewer steps):

– u operation ⇝ not u operation ⇝ u operation

– not u operation ⇝ u operation ⇝ not u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra Ainv/Vinv in the first step:

– If it is followed by a known access operation, Ainv/Vinv can be removed, because the access puts an actual state into the cache block.

– If it is followed by a known inv operation or Vinv_u, Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further be commuted by Commute Rule 2 and reduced to within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, and which require the rest checking.

The only remaining adjacent two-step patterns to which none of the three categories of rules can be applied are the following:

– Aa/Va, Aa_alias/Va_alias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_a_alias/Vinv_a_alias ⇝ Vu

– Aa/Va or Aa_alias/Va_alias ⇝ Vinv_u

– Vu ⇝ Aa/Va, Aa_alias/Va_alias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_a_alias/Vinv_a_alias

– Vinv_u ⇝ Aa/Va or Aa_alias/Va_alias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; p, a two-dimensional dynamic-size array, where p[0] contains the states of each step of the original pattern, in order, and p[1], p[2], ... are empty initially

Output: array out of reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: out = Ø
2: while p contains ⋆ at an index other than 0 do
3:     p = SubdivisionRule1(p)
4: end while
5: while (p contains Ainv at an index other than 0) or (p contains Vinv at an index other than 0) do
6:     p = SubdivisionRule2(p)
7: end while
8: while (p is ineffective or has interval effective three steps) do
9:     p = CommuteRules(p)
10:    p = ReductionRules(p)
11:    p = UnionRules(p)
12:    if (p is ineffective or has interval effective three steps) then
13:        out += effective three-step pattern(s) extracted from p
14:    end if
15: end while
16: return out
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the rules have been applied. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
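As an illustrative reading of the last helper (not the authors' implementation), the interval check can be viewed as a sliding-window membership test against the set of effective three-step vulnerabilities from Table 2:

```python
def has_interval_effective_three_steps(seq, effective):
    """True if any three consecutive steps of seq form a pattern from
    the given set of effective three-step vulnerabilities."""
    return any(tuple(seq[i:i + 3]) in effective
               for i in range(len(seq) - 2))

# Hypothetical set with one effective pattern, Ainv ⇝ Vu ⇝ Aa
EFFECTIVE = {("A_inv", "V_u", "A_a")}
print(has_interval_effective_three_steps(
    ["V_d", "A_inv", "V_u", "A_a"], EFFECTIVE))   # True
```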
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
J Hardw Syst Secur (2019) 3397ndash425 425
and 64 types of Weak Vulnerabilities that are representedby patterns with β = 3 steps Consequently β = 3can represent all the vulnerabilities (including some weakones where Step 1 is ) Using more steps to representvulnerabilities is not necessary as discussed next
B4 Patterns with β gt 3
When β gt 3 the pattern of memory-related operations fora vulnerability can be reduced using the following rules
B41 Subdivision Rules
First a set of subdivision rules is used to divide the longpattern into shorter patterns following the below rulesEach subdivision rule should be applied recursively beforeapplying the next rule
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as the longer pattern canbe divided into two separate patterns where is assignedas Step 1 of the second pattern This is because givesno timing information and the attacker loses track of thecache state after This rule should be recursively applieduntil there are no sub-patterns left with a in the middleor as last step ( in the last step will be deleted) in thelonger pattern
Subdivision Rule 2 Next if a pattern (derived afterrecursive application of the Rule 1 contains a sub-patternsuch as AinvVinv the longer pattern canbe divided into two separate patterns where AinvVinv isassigned as Step 1 of the second pattern This is becauseAinvVinv will flush all the timing information of thecurrent block and it can be used as the flushing step forStep 1 eg vulnerability Ainv Vu Aa(f ast)shown in Table 2 AinvVinv cannot be a candidate formiddle steps or the last step because it will flush alltiming information making the attacker unable to deducethe final timing with victimrsquos sensitive address translationinformation This rule should be recursively applied untilthere are no sub-patterns left with a AinvVinv in themiddle or the last step (AinvVinv in the last step will bedeleted)
B42 Simplification Rules
For each of the patterns resulting from the subdivisionof the original pattern we define Commute Rules UnionRules and Reduction Rules for a each set of two adjacentsteps in these remaining patterns In Table 7 we showall the possible cases of the rule applying conditions for
each adjacent two steps regardless of the attackerrsquos access(A) or the victimrsquos access (V ) The table shows whetherthe corresponding two steps can be commuted reduced orunioned (and the reduced or the unioned result if the rulescan be applied)
B421 Commute Rules
Suppose there are two adjacent steps M and N for a memorysequences M N If commuting M andN lead to the same observation result ie M N and N M will have thesame timing observation information in the final step for theattacker we can freely exchange the place of M and N inthis pattern In this case we have more chance to Reduceand Union the steps within the memory sequence by thefollowing Rules In the possible commuting process we willtry every possible combinations to commute different pairsof two steps that are able to apply the Commute Rules andthen further apply Reduce Rules and Union Rules to seewhether the commute is effective ie there can be stepsreduced or unioned after the proper commuting process Thefollowing two adjacent memory-related operations can becommuted
ndash Commute Rule 1 For two adjacent steps if one stepis a known access operation and another step is aknown inv operation and the addresses they referto are different these two steps can be commuted nomatter which position of the two steps they are in withinthe whole memory sequence It will show a ldquoyesrdquo forthe corresponding two-step pattern for the CommuteRule 1 column if these two can be commuted in Table 7
ndash Commute Rule 2 A superset of two-step patterns thatcan apply Commute Rule 1 can be commuted if thesecond step of these two adjacent steps is not thelast step in the whole memory sequence There aresome two adjacent steps that can only be commutedif the second step of these two adjacent steps is notthe last step in the whole memory sequence Therewill be a ldquoyesrdquo for the corresponding two-step patternfor the Commute Rule 2 column and a ldquonordquo for thecorresponding two-step pattern for the Commute Rule 1column in Table 7
B422 Reduction Rules
If the memory sequence after applying Commute Ruleshave a sub-pattern that has two adjacent steps both relatedto known addresses or both related to unknown address(including repeating states) the two adjacent steps can bereduced to only one following the reduction rules (if thetwo-step pattern has ldquoyesrdquo for the Column ldquoUnion Rule or
J Hardw Syst Secur (2019) 3397ndash425420
Table7
Rul
esfo
rco
mbi
ning
two
adja
cent
step
s
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Rule1
Rule2
ReduceRule
Step
Rule1
Rule2
ReduceRule
Step
aa
yes
yes
yes
aa
inv
ano
noye
sa
aa
ali
as
nono
yes
aali
as
ain
va
ali
as
yes
yes
yes
aali
as
ad
nono
yes
da
inv
dye
sye
sye
sd
au
nono
nominus
ain
vu
nono
nominus
aa
inv
nono
yes
ain
va
inv
ain
vye
sye
sye
sa
inv
aa
ali
asin
vye
sye
sye
sa
ain
va
ali
asin
vye
sye
sye
sU
nio
n(a
inva
ali
asin
v)
ad
inv
yes
yes
yes
aa
inv
din
vye
sye
sye
sU
nio
n(a
invd
inv)
au
inv
nono
nominus
ain
vu
inv
noye
sno
minusa
ali
as
ano
noye
sa
aali
asin
va
yes
yes
yes
a
aali
as
aali
as
yes
yes
yes
aali
as
aali
asin
va
ali
as
nono
yes
aali
as
aali
as
dno
noye
sd
aali
asin
vd
yes
yes
yes
d
aali
as
uno
nono
minusa
ali
asin
vu
nono
nominus
aali
as
ain
vye
sye
sye
sa
ali
as
aali
asin
va
inv
yes
yes
yes
Unio
n(a
inva
ali
asin
v)
aali
as
aali
asin
vno
noye
sa
ali
asin
va
ali
asin
va
ali
asin
vye
sye
sye
sa
ali
asin
v
aali
as
din
vye
sye
sye
sa
ali
as
aali
asin
vd
inv
yes
yes
yes
Unio
n(d
inva
ali
asin
v)
aali
as
uin
vno
nono
minusa
ali
asin
vu
inv
noye
sno
minusd
ano
noye
sa
din
va
yes
yes
yes
a
da
ali
as
nono
yes
aali
as
din
va
ali
as
yes
yes
yes
aali
as
dd
yes
yes
yes
dd
inv
dno
noye
sd
du
nono
nominus
din
vu
noye
sno
minusd
ain
vye
sye
sye
sd
din
va
inv
yes
yes
yes
Unio
n(a
invd
inv)
da
ali
asin
vye
sye
sye
sd
din
va
ali
asin
vye
sye
sye
sU
nio
n(d
inva
ali
asin
v)
dd
inv
nono
yes
din
vd
inv
din
vye
sye
sye
sd
inv
du
inv
noye
sno
minusd
inv
uin
vno
yes
nominus
ua
nono
nominus
uin
va
nono
nominus
ua
ali
as
nono
nominus
uin
va
ali
as
nono
nominus
ud
nono
nominus
uin
vd
noye
sno
minusu
uye
sye
sye
su
uin
vu
nono
yes
u
ua
inv
nono
nominus
uin
va
inv
noye
sno
minusu
aali
asin
vno
nono
minusu
inv
aali
asin
vno
yes
nominus
ud
inv
noye
sno
minusu
inv
din
vno
yes
nominus
uu
inv
nono
yes
uin
vu
inv
uin
vye
sye
sye
su
inv
J Hardw Syst Secur (2019) 3397ndash425 421
Reduce Rulerdquo and has no Union result for the ldquoCombinedSteprdquo Column in Table 7
ndash Reduction Rule 1 For two u operations although u isunknown both of the accesses target on the same u socan be reduced to only keep the second access in thememory sequence
ndash Reduction Rule 2 For two known adjacent memoryaccessndashrelated operations (known access operation)they always result in a deterministic state of the secondmemory accessndashrelated cache block so these two stepscan be reduced to only one step
ndash Reduction Rule 3 For two adjacent steps if onestep is known access operation and another one isknown inv operation no matter what order theyhave and the address they refer to is the same these twocan be reduced to one step which is the second step
B423 Union Rules
Suppose there are two adjacent steps M and N for memorysequences M N If combining M andN leads to the same timing observation result ie M N and Union(M N) will havethe same timing observation information in the final step forthe attacker we can combine step M and N to be a joint onestep for this memory sequence defined as Union(M N)Two adjacent steps that can be combined are discussed inthe following cases
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known inv operations both invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
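A minimal sketch of Union Rule 1, assuming the same kind of symbolic step encoding as above (all identifiers here are illustrative assumptions, not the paper's notation):

```python
def union_rule_1(seq):
    """Union Rule 1: adjacent invalidation steps of known, different
    addresses are merged into one joint Union(inv, inv, ...) step.

    Each step is (op, addrs), where addrs is a frozenset of address
    symbols; the symbol "u" marks an unknown address and blocks the merge."""
    out = []
    for op, addrs in seq:
        mergeable = (op == "INV" and "u" not in addrs       # known invalidation
                     and out and out[-1][0] == "INV"        # previous step is inv
                     and "u" not in out[-1][1]              # ... of known addresses
                     and out[-1][1].isdisjoint(addrs))      # ... all different
        if mergeable:
            prev_op, prev_addrs = out.pop()
            out.append(("INV", prev_addrs | addrs))  # the joint Union step
        else:
            out.append((op, addrs))
    return out
```

Applied repeatedly (the loop already scans left to right), a run of invalidations inv(a), inv(d) collapses into a single joint step invalidating {a, d}.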
B.4.2.4 Final Check Rules
Each long memory sequence recursively has these three categorizations of rules applied, in order: Commute Rules first, to bring known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rule is applied to the processed memory sequence.
Each recursive application of these three categorizations of rules should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not-u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

– u operation, not-u operation, u operation
– not-u operation, u operation, not-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:
– If followed by a known access operation, A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv, V_u, not-u operation, u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv, V_u, A^inv_d/V^inv_d, u operation, where V_u, A^inv_d/V^inv_d can further have Commute Rule 2 applied, reducing the sequence to within three steps.
In this case the sequence is finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.
The only remaining two-adjacent-step patterns that cannot be processed by any of the three categorizations of Rules are the following:
– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}, V_u
– {A_a, V_a, A_aalias, V_aalias}, V^inv_u
– V_u, {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}
– V^inv_u, {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; states, a two-dimensional dynamic-size array; states[0] contains the states of each step of the original pattern in order; states[1] and states[2] are empty initially
Output: reduced effective vulnerability pattern(s) array results; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = Ø
2: while contain(states, …) and index ≠ 0 do
3:     states = Commute_Rule_1(states)
4: end while
5: while (contain(states, …) and index ≠ 0) or (contain(states, …) and index ≠ 0) do
6:     states = Commute_Rule_2(states)
7: end while
8: while (is_ineffective(states) or has_interval_effective_three_steps(states)) do
9:     states = Commute_Rules(states)
10:    states = Reduction_Rules(states)
11:    states = Union_Rules(states)
12:    if (is_ineffective(states) or has_interval_effective_three_steps(states)) then
13:        results += effective_patterns(states)
14:    end if
15: end while
16: return results
two steps can either generate two-adjacent-step patterns that can be processed by the three Rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
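The overall driver of Algorithm 2 can be sketched as follows, with the rule applications and the effectiveness test passed in as functions; every identifier below is our own illustrative assumption rather than the paper's notation:

```python
def final_check(steps, commute, reduce_rules, union_rules, is_effective):
    """Sketch of the Algorithm 2 driver: apply the three categorizations
    of rules until no further step can be removed, then scan every
    three-step window for patterns that map to an effective vulnerability."""
    results = []
    shrinking = True
    while shrinking:
        before = len(steps)
        steps = union_rules(reduce_rules(commute(steps)))
        shrinking = len(steps) < before   # each pass must drop >= 1 step
    # scan all three-step windows of the fully reduced sequence
    for i in range(max(len(steps) - 2, 0)):
        window = steps[i:i + 3]
        if is_effective(window):
            results.append(window)        # captured by a known 3-step pattern
    return results
```

With trivially simple rule functions (e.g., a reduction that merges identical adjacent steps), a four-step pattern such as inv(a), inv(a), V_u, V_a reduces to three steps and is then checked as a single window; an empty result list means the original pattern is not an effective vulnerability.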
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table7
Rul
esfo
rco
mbi
ning
two
adja
cent
step
s
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Rule1
Rule2
ReduceRule
Step
Rule1
Rule2
ReduceRule
Step
aa
yes
yes
yes
aa
inv
ano
noye
sa
aa
ali
as
nono
yes
aali
as
ain
va
ali
as
yes
yes
yes
aali
as
ad
nono
yes
da
inv
dye
sye
sye
sd
au
nono
nominus
ain
vu
nono
nominus
aa
inv
nono
yes
ain
va
inv
ain
vye
sye
sye
sa
inv
aa
ali
asin
vye
sye
sye
sa
ain
va
ali
asin
vye
sye
sye
sU
nio
n(a
inva
ali
asin
v)
ad
inv
yes
yes
yes
aa
inv
din
vye
sye
sye
sU
nio
n(a
invd
inv)
au
inv
nono
nominus
ain
vu
inv
noye
sno
minusa
ali
as
ano
noye
sa
aali
asin
va
yes
yes
yes
a
aali
as
aali
as
yes
yes
yes
aali
as
aali
asin
va
ali
as
nono
yes
aali
as
aali
as
dno
noye
sd
aali
asin
vd
yes
yes
yes
d
aali
as
uno
nono
minusa
ali
asin
vu
nono
nominus
aali
as
ain
vye
sye
sye
sa
ali
as
aali
asin
va
inv
yes
yes
yes
Unio
n(a
inva
ali
asin
v)
aali
as
aali
asin
vno
noye
sa
ali
asin
va
ali
asin
va
ali
asin
vye
sye
sye
sa
ali
asin
v
aali
as
din
vye
sye
sye
sa
ali
as
aali
asin
vd
inv
yes
yes
yes
Unio
n(d
inva
ali
asin
v)
aali
as
uin
vno
nono
minusa
ali
asin
vu
inv
noye
sno
minusd
ano
noye
sa
din
va
yes
yes
yes
a
da
ali
as
nono
yes
aali
as
din
va
ali
as
yes
yes
yes
aali
as
dd
yes
yes
yes
dd
inv
dno
noye
sd
du
nono
nominus
din
vu
noye
sno
minusd
ain
vye
sye
sye
sd
din
va
inv
yes
yes
yes
Unio
n(a
invd
inv)
da
ali
asin
vye
sye
sye
sd
din
va
ali
asin
vye
sye
sye
sU
nio
n(d
inva
ali
asin
v)
dd
inv
nono
yes
din
vd
inv
din
vye
sye
sye
sd
inv
du
inv
noye
sno
minusd
inv
uin
vno
yes
nominus
ua
nono
nominus
uin
va
nono
nominus
ua
ali
as
nono
nominus
uin
va
ali
as
nono
nominus
ud
nono
nominus
uin
vd
noye
sno
minusu
uye
sye
sye
su
uin
vu
nono
yes
u
ua
inv
nono
nominus
uin
va
inv
noye
sno
minusu
aali
asin
vno
nono
minusu
inv
aali
asin
vno
yes
nominus
ud
inv
noye
sno
minusu
inv
din
vno
yes
nominus
uu
inv
nono
yes
uin
vu
inv
uin
vye
sye
sye
su
inv
J Hardw Syst Secur (2019) 3397ndash425 421
Reduce Rulerdquo and has no Union result for the ldquoCombinedSteprdquo Column in Table 7
ndash Reduction Rule 1 For two u operations although u isunknown both of the accesses target on the same u socan be reduced to only keep the second access in thememory sequence
ndash Reduction Rule 2 For two known adjacent memoryaccessndashrelated operations (known access operation)they always result in a deterministic state of the secondmemory accessndashrelated cache block so these two stepscan be reduced to only one step
ndash Reduction Rule 3 For two adjacent steps if onestep is known access operation and another one isknown inv operation no matter what order theyhave and the address they refer to is the same these twocan be reduced to one step which is the second step
B423 Union Rules
Suppose there are two adjacent steps M and N for memorysequences M N If combining M andN leads to the same timing observation result ie M N and Union(M N) will havethe same timing observation information in the final step forthe attacker we can combine step M and N to be a joint onestep for this memory sequence defined as Union(M N)Two adjacent steps that can be combined are discussed inthe following cases
ndash Union Rule 1 Two invalidations to two knowndifferent memory addresses can be applied UnionRule 1 known inv operation are two operations bothinvalidating some known address therefore they canbe combined to only one step The Union Rule can becontinuously done to union all the adjacent invalidationstep that invalidates known different memory addresses
B424 Final Check Rules
Each long memory sequence will recursively apply thesethree categorizations of the rules in the order Com-mute Rules first to put known access operations andknown inv operation that targets the same address as nearas possible and u operations and not u operations areputting together as much as possible The Reduced Rulesare then checked and applied to the processed memorysequence to reduce the steps Then the Union Rule isapplied to the processed memory sequence
The recursion at each application to these threecategorizations of the rules should be always applied andreduce at least one step until the resulting sequence matchesone of the two possible cases
bull the long (β gt 3) memory sequence with u operation
and not u operation is further reduced to a sequencewhere there are at most three steps in the followingpatterns or less
ndash u operation not u operation u operation
ndash not u operation u operation not u operation
There might be possible extra or AinvV inv beforethese three-step pattern where
bull An extra in the first step will not influencethe result and can be directly removed
bull If an extra AinvV inv in the first step
ndash If followed by known access ope-ration AinvV inv can be removeddue to the actual state further put intothe cache block
ndash If followed by known inv opera-t ion or V inv
u AinvV inv can also beremoved since the memory locationis repeatedly flushed by the two steps
ndash If followed by Vu worst case will beAinvV inv Vu not u operation u operationwhich is either an effective vul-nerability according to Table 2and reduction rules shown inSection 33 or AinvV inv Vu Ainv
d V invd u operation where
Vu Ainvd V inv
d can further beapplied Commute Rule 2 to reduceand be within three steps
In this case the steps are finally within three steps andthe checking is done
bull There exist two adjacent steps that cannot be appliedany Rules above and requires the Rest Checking
The only left two adjacent steps that cannot be appliedby any of the three categorizations of the Rules are thefollowing
ndash AaVaAaalias Vaalias AdVdAinva V inv
a
Ainvaalias V inv
aalias Vu ndash AaVaAaalias Vaalias V inv
u ndash Vu AaVaAaalias Vaalias AdVd
Ainva V inv
a Ainvaalias V inv
aalias ndash V inv
u AaVaAaalias Vaalias We manually checked all of the two adjacent step patterns
above and found that adding extra step before or after these
J Hardw Syst Secur (2019) 3397ndash425422
Algorithm 2 -Step ( 3) pattern reduction
Input number of steps of the pattern
a two-dimensional dynamic-size array [0] contains the states of each step of the original pattern in
order [1] [2] are empty initially
Output reduced effective vulnerability pattern(s) array It will be an empty list if the original pattern does not
correspond to an effective vulnerability
1 = Oslash
2 while contain( ) and index not 0 do3 = 1 ( )
4 end while5 while ( contain( ) and index not 0) or ( contain( ) and index not 0) do6 = 2 ( )
7 end while8 while ( is ineffecitve or has interval effective three steps) do9 = ( )
10 = ( )
11 = ( )
12 if ( is ineffecitve or has interval effective three steps) then13 += ( )
14 end if15 end while16 return
two steps can either generate two adjacent step patterns thatbe processed by the three Rules where further step can bereduced or construct effective vulnerability according toTable 2 and reduction rules shown in Section 33 wherethe corresponding pattern can be treated effective and thechecking is done
B43 Algorithm for Reducing and CheckingMemorySequence
Algorithm 2 is used to (i) reduce a β-step (β gt 3)pattern to a three-step pattern thus demonstrating that thecorresponding β gt 3 step pattern actually is equivalent tothe output three-step pattern and represents a vulnerabilitythat is captured by an existing three-step pattern or (ii)demonstrate that the β-step pattern can be mapped to one ormore three-step vulnerabilities It is not possible for a β-stepvulnerability pattern to not be either (i) or (ii) after doing theRule applications Key outcome of our analysis is that anyβ-step pattern is not a vulnerability or if it is a vulnerabilityit maps to either outputs (i) or (ii) of the algorithm
Inside the Algorithm 2 contain() represents a function tocheck if a list contains a corresponding state is ineffective()represents a function that checks the corresponding memorysequence does not contain any effective three-stepshas interval effective three steps() represents a functionthat check if the corresponding memory sequence can bemapped to one or more three-step vulnerabilities
B44 Summary
In conclusion the three-step model can model all possibletiming-based cache vulnerability in normal caches Vulner-abilities which are represented by more than three steps canbe always reduced to one (or more) vulnerabilities from ourthree-step model and thus using more than three step is notnecessary
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu
A Hestness J Hower DR Krishna T Sardashti S et al (2011)The gem5 simulator ACM SIGARCH computer architecture news39(2)1ndash7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic MKnudsen LR Leander G Nikov V Paar C Rechberger Cet al (2012) Princendasha low-latency block cipher for pervasivecomputing applications In International conference on the theoryand application of cryptology and information security Springerpp 208ndash225
J Hardw Syst Secur (2019) 3397ndash425 423
7 Bourgeat T Lebedev I Wright A Zhang S Devadas S et al (2019)MI6 Secure enclaves in a speculative out-of-order processorIn Proceedings of the 52nd Annual IEEEACM InternationalSymposium on Microarchitecture ACM pp 42ndash56
8 Chen S Liu F Mi Z Zhang Y Lee RB Chen H Wang X(2018) Leveraging hardware transactional memory for cache side-channel defenses In Proceedings of the 2018 on Asia conferenceon computer and communications security ACM pp 601ndash608
9 Clark SS Ransford B Rahmati A Guineau S Sorber J Xu WFu K (2013) Wattsupdoc power side channels to nonintrusivelydiscover untargeted malware on embedded medical devices InPresented as part of the 2013 USENIX workshop on healthinformation technologies
10 Costan V Lebedev IA Devadas S (2016) Sanctum minimalhardware extensions for strong software isolation In USENIXsecurity symposium pp 857ndash874
11 Deng S Xiong W Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic InProceedings of the 7th international workshop on hardware andarchitectural support for security and privacy ACM p 2
12 Domnitser L Jaleel A Loew J Abu-Ghazaleh N Ponomarev D(2012) Non-monopolizable caches low-complexity mitigation ofcache side channel attacks ACM Transactions on Architectureand Code Optimization (TACO) 8(4)35
13 Doychev G Kopf B (2017) Rigorous analysis of softwarecountermeasures against cache attacks In Proceedings of the 38thACM SIGPLAN conference on programming language design andimplementation ACM pp 406ndash421
14 Doychev G Kopf B Mauborgne L Reineke J (2015) CacheAudita tool for the static analysis of cache side channels ACM Trans-actions on information and system security (TISSEC) 18(1)4
15 Gruss D Lettner J Schuster F Ohrimenko O Haller I CostaM (2017) Strong and efficient cache side-channel protectionusing hardware transactional memory In USENIX securitysymposium pp 217ndash233
16 Gruss D Maurice C Wagner K Mangard S (2016) Flush+ flusha fast and stealthy cache attack In International conference ondetection of intrusions and malware and vulnerability assessmentSpringer pp 279ndash299
17 Gruss D Spreitzer R Mangard S (2015) Cache template attacksautomating attacks on inclusive last-level caches In USENIXsecurity symposium pp 897ndash912
18 Guanciale R Nemati H Baumann C Dam M (2016) Cache stor-age channels alias-driven attacks and verified countermeasuresIn 2016 IEEE symposium on security and privacy (SP) IEEEpp 38ndash55
19 Gullasch D Bangerter E Krenn S (2011) Cache gamesndashbringingaccess-based cache attacks on AES to practice In 2011 IEEEsymposium on security and privacy (SP) IEEE pp 490ndash505
20 He Z Lee RB (2017) How secure is your cache against side-channel attacks In Proceedings of the 50th annual IEEEACMinternational symposium on microarchitecture ACM pp 341ndash353
21 Intel C (2015) Improving real-time performance by utilizing cacheallocation technology Intel Corporation
22 Kayaalp M Khasawneh KN Esfeden HA Elwell J Abu-Ghazaleh N Ponomarev D Jaleel A (2017) RIC relaxed inclusioncaches for mitigating LLC side-channel attacks In 2017 54thACMEDACIEEE design automation conference (DAC) IEEEpp 1ndash6
23 Keramidas G Antonopoulos A Serpanos DN Kaxiras S (2008)Non deterministic caches a simple and effective defense againstside channel attacks Des Autom Embed Syst 12(3)221ndash230
24 Kessler RE Hill MD (1992) Page placement algorithms for largerealindexed caches ACM Trans Comput Syst (TOCS) 10(4)338ndash359
25 Kiriansky V Lebedev I Amarasinghe S Devadas S Emer J(2018) DAWG a defense against cache timing attacks in spec-ulative execution processors In 2018 51st Annual IEEEACMinternational symposium on microarchitecture (MICRO) IEEEpp 974ndash987
J Hardw Syst Secur (2019) 3:397–425
Reduce Rule" and has no Union result for the "Combined Step" column in Table 7.
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: For two adjacent known memory access–related operations (known access operations), the second access always leaves the corresponding cache block in a deterministic state, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, the two can be reduced to one step, namely the second step.
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., the sequence containing M, N and the sequence containing Union(M, N) give the attacker the same timing observation information in the final step, then we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: known inv operations are two operations that both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
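A minimal sketch of Union Rule 1 follows; the set-valued step encoding `(op, addrs)` and the function name are our illustrative assumptions, where `addrs` collects the known addresses touched in a (possibly joint) step:

```python
# Sketch of Union Rule 1: adjacent invalidation steps of known,
# different memory addresses are folded into one joint step.

def apply_union_rule_1(seq):
    out = []
    for op, addrs in seq:
        prev = out[-1] if out else None
        if (prev is not None and op == "inv" and prev[0] == "inv"
                and "u" not in addrs and "u" not in prev[1]
                and prev[1].isdisjoint(addrs)):
            # Both steps invalidate known, distinct addresses:
            # combine them into a single joint invalidation.
            out[-1] = ("inv", prev[1] | addrs)
        else:
            out.append((op, set(addrs)))
    return out
```

Applied repeatedly down the sequence in one pass, a run of invalidations such as inv(a), inv(b), inv(c) collapses into one joint step invalidating {a, b, c}.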
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules, in order: the Commute Rules are applied first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not-u operations together as much as possible. The Reduce Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rules are applied to the processed memory sequence.

Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not-u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ not-u operation ⇝ u operation

– not-u operation ⇝ u operation ⇝ not-u operation
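A check for these two terminal patterns can be sketched as follows; the `(op, addr)` step encoding with `"u"` marking the unknown address is our illustrative convention:

```python
# Does a reduced sequence match one of the two alternating
# three-step terminal patterns, (u, not-u, u) or (not-u, u, not-u)?

def matches_terminal_pattern(seq):
    if len(seq) != 3:
        return False
    u_steps = [addr == "u" for _, addr in seq]
    return u_steps in ([True, False, True], [False, True, False])
```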
There might be a possible extra ⋆ or A^inv/V^inv before this three-step pattern, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.

– If it is followed by a known inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A_d^inv/V_d^inv ⇝ u operation, where V_u ⇝ A_d^inv/V_d^inv can further have Commute Rule 2 applied so that the sequence reduces to within three steps.
In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, requiring the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv ⇝ V_u

– A_a/V_a/A_aalias/V_aalias ⇝ V_u^inv

– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv

– V_u^inv ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; a two-dimensional dynamic-size array whose entry [0] contains the states of each step of the original pattern in order, and whose entries [1] and [2] are empty initially.

Output: the reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability.

1:  results = Ø
2:  while the sequence contains a pair matching Commute Rule 1 whose index is not 0 do
3:      apply Commute Rule 1
4:  end while
5:  while the sequence contains a pair matching Commute Rule 2, in either ordering, whose index is not 0 do
6:      apply Commute Rule 2
7:  end while
8:  while not (the sequence is_ineffective or has_interval_effective_three_steps) do
9:      apply the Commute Rules
10:     apply the Reduce Rules
11:     apply the Union Rules
12:     if the sequence is_ineffective or has_interval_effective_three_steps then
13:         results += the resulting pattern(s)
14:     end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequence
Algorithm 2 is used to either (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. After the rule applications, it is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii). The key outcome of our analysis is that any β-step pattern is either not a vulnerability or, if it is a vulnerability, it maps to one of the outputs (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; and has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
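The overall reduce-and-check loop can be sketched as below. This is only a skeleton under our own assumptions: the three rule passes and the two predicates are injected as callables, and the function name `reduce_and_check` and the progress test are ours, not the paper's implementation:

```python
# Skeleton of the reduce-and-check recursion: repeatedly apply the
# Commute, Reduce, and Union rule passes until the sequence is
# classified as ineffective or as containing effective three-step
# vulnerabilities.

def reduce_and_check(seq, commute, reduce_rules, union_rules,
                     is_ineffective, has_interval_effective_three_steps):
    results = []
    while not (is_ineffective(seq)
               or has_interval_effective_three_steps(seq)):
        shorter = union_rules(reduce_rules(commute(seq)))
        if shorter == seq:      # no rule made progress; stop
            break
        seq = shorter
    if has_interval_effective_three_steps(seq):
        results.append(seq)     # maps to three-step vulnerability(ies)
    return results
```

With toy callables, e.g. a reduce pass that drops the leading step while more than three steps remain, the loop terminates once a three-step pattern is reached and records it.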
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES

4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu A Hestness J Hower DR Krishna T Sardashti S et al (2011) The gem5 simulator. ACM SIGARCH computer architecture news 39(2):1–7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic M Knudsen LR Leander G Nikov V Paar C Rechberger C et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In International conference on the theory and application of cryptology and information security Springer pp 208–225
7 Bourgeat T Lebedev I Wright A Zhang S Devadas S et al (2019)MI6 Secure enclaves in a speculative out-of-order processorIn Proceedings of the 52nd Annual IEEEACM InternationalSymposium on Microarchitecture ACM pp 42ndash56
8 Chen S Liu F Mi Z Zhang Y Lee RB Chen H Wang X(2018) Leveraging hardware transactional memory for cache side-channel defenses In Proceedings of the 2018 on Asia conferenceon computer and communications security ACM pp 601ndash608
9 Clark SS Ransford B Rahmati A Guineau S Sorber J Xu WFu K (2013) Wattsupdoc power side channels to nonintrusivelydiscover untargeted malware on embedded medical devices InPresented as part of the 2013 USENIX workshop on healthinformation technologies
10 Costan V Lebedev IA Devadas S (2016) Sanctum minimalhardware extensions for strong software isolation In USENIXsecurity symposium pp 857ndash874
11 Deng S Xiong W Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic InProceedings of the 7th international workshop on hardware andarchitectural support for security and privacy ACM p 2
12 Domnitser L Jaleel A Loew J Abu-Ghazaleh N Ponomarev D(2012) Non-monopolizable caches low-complexity mitigation ofcache side channel attacks ACM Transactions on Architectureand Code Optimization (TACO) 8(4)35
13 Doychev G Kopf B (2017) Rigorous analysis of softwarecountermeasures against cache attacks In Proceedings of the 38thACM SIGPLAN conference on programming language design andimplementation ACM pp 406ndash421
14 Doychev G Kopf B Mauborgne L Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on information and system security (TISSEC) 18(1):4
15 Gruss D Lettner J Schuster F Ohrimenko O Haller I CostaM (2017) Strong and efficient cache side-channel protectionusing hardware transactional memory In USENIX securitysymposium pp 217ndash233
16 Gruss D Maurice C Wagner K Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In International conference on detection of intrusions and malware and vulnerability assessment Springer pp 279–299
17 Gruss D Spreitzer R Mangard S (2015) Cache template attacksautomating attacks on inclusive last-level caches In USENIXsecurity symposium pp 897ndash912
18 Guanciale R Nemati H Baumann C Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In 2016 IEEE symposium on security and privacy (SP) IEEE pp 38–55
19 Gullasch D Bangerter E Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In 2011 IEEE symposium on security and privacy (SP) IEEE pp 490–505
20 He Z Lee RB (2017) How secure is your cache against side-channel attacks In Proceedings of the 50th annual IEEEACMinternational symposium on microarchitecture ACM pp 341ndash353
21 Intel C (2015) Improving real-time performance by utilizing cacheallocation technology Intel Corporation
22 Kayaalp M Khasawneh KN Esfeden HA Elwell J Abu-Ghazaleh N Ponomarev D Jaleel A (2017) RIC relaxed inclusioncaches for mitigating LLC side-channel attacks In 2017 54thACMEDACIEEE design automation conference (DAC) IEEEpp 1ndash6
23 Keramidas G Antonopoulos A Serpanos DN Kaxiras S (2008)Non deterministic caches a simple and effective defense againstside channel attacks Des Autom Embed Syst 12(3)221ndash230
24 Kessler RE Hill MD (1992) Page placement algorithms for largerealindexed caches ACM Trans Comput Syst (TOCS) 10(4)338ndash359
25 Kiriansky V Lebedev I Amarasinghe S Devadas S Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In 2018 51st Annual IEEE/ACM international symposium on microarchitecture (MICRO) IEEE pp 974–987
26 Kocher P Genkin D Gruss D Haas W Hamburg M Lipp M Mangard S Prescher T Schwarz M Yarom Y (2018) Spectre attacks: exploiting speculative execution. In 2019 IEEE Symposium on Security and Privacy (SP) IEEE pp 1–19
27 Lee RB Kwan P McGregor JP Dwoskin J Wang Z (2005)Architecture for protecting critical secrets in microprocessorsIn ACM SIGARCH computer architecture news vol 33 IEEEComputer Society pp 2ndash13
28 Lee Y Waterman A Avizienis R Cook H Sun C Stojanovic V Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th IEEE pp 199–202
29 Liu F Ge Q Yarom Y Mckeen F Rozas C Heiser G Lee RB(2016) CATalyst defeating last-level cache side channel attacks incloud computing In 2016 IEEE international symposium on highperformance computer architecture (HPCA) IEEE pp 406ndash418
30 Liu F Lee RB (2014) Random fill cache architecture In2014 47th annual IEEEACM international symposium onmicroarchitecture (MICRO) IEEE pp 203ndash215
31 Liu F Wu H Mai K Lee RB (2016) Newcache secure cachearchitecture thwarting cache side-channel attacks IEEE Micro36(5)8ndash16
32 Luk CK Cohn R Muth R Patil H Klauser A Lowney G WallaceS Reddi VJ Hazelwood K (2005) Pin building customizedprogram analysis tools with dynamic instrumentation In ACMsigplan notices vol 40 ACM pp 190ndash200
33 Masti RJ Rai D Ranganathan A Muller C Thiele L Capkun S(2015) Thermal covert channels on multi-core platforms In 24thUSENIX security symposium (USENIX security 15) pp 865ndash880
34 Osvik DA Shamir A Tromer E (2006) Cache attacks andcountermeasures the case of AES In Cryptographersrsquo track atthe RSA conference Springer pp 1ndash20
35 Patel A Afram F Chen S Ghose K (2011) MARSS Afull system simulator for multicore x86 CPUs In 2011 48thACMEDACIEEE design automation conference (DAC) IEEEpp 1050ndash1055
36 Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa, 2005. http://www.daemonology.net/papers/htt.pdf
37 Hammarlund P Martinez AJ Bajwa AA Hill DL Hallnor E Jiang H Dixon M Derr M Hunsaker M Kumar R Osborne RB (2014) Haswell: the fourth-generation Intel core processor. IEEE Micro 34(2):6–20
38 Qureshi MK (2018) CEASER mitigating conflict-based cacheattacks via encrypted-address and remapping In 2018 51Stannual IEEEACM international symposium on microarchitecture(MICRO) IEEE pp 775ndash787
39 Sanchez D Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In ACM SIGARCH computer architecture news vol 41 ACM pp 475–486
40 Schwarz M Schwarzl M Lipp M Gruss D (2018) NetSpectreread arbitrary memory over network In European Symposium onResearch in Computer Security Springer pp 279ndash299
41 Seznec A (1993) A case for two-way skewed-associative cachesACM SIGARCH computer architecture news 21(2)169ndash178
42 Sharkey J Ponomarev D Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43 Shivakumar P Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44 Taylor G Davies P Farmwald M (1990) The TLB slice – a low-cost, high-speed address translation mechanism. In 17th annual international symposium on computer architecture, 1990, Proceedings IEEE pp 355–363
45 Thoziyoor S Muralimanohar N Ahn JH Jouppi NP (2008)CACTI 51 Technical Report HPL-2008-20 HP Labs
46 Trippel C Lustig D Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47 Wang S Wang P Liu X Zhang D Wu D (2017) CacheDidentifying cache-based timing channels in production softwareIn 26th USENIX security symposium USENIX association
48 Wang Y Ferraiuolo A Zhang D Myers AC Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In 2016 53rd ACM/EDAC/IEEE design automation conference (DAC) IEEE pp 1–6
49 Wang Z Lee RB (2007) New cache designs for thwarting softwarecache-based side channel attacks In ACM SIGARCH computerarchitecture news vol 35 ACM pp 494ndash505
50 Wang Z Lee RB (2008) A novel cache architecture withenhanced performance and security In 2008 41st IEEEACMinternational symposium on Microarchitecture 2008 MICRO-41IEEE pp 83ndash93
51 Werner M Unterluggauer T Giner L Schwarz M Gruss D Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In 28th USENIX security symposium (USENIX Security 19) USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52 Yan M Choi J Skarlatos D Morrison A Fletcher C Torrellas J(2018) InvisiSpec making speculative execution invisible in thecache hierarchy In 2018 51st annual IEEEACM internationalsymposium on microarchitecture (MICRO) IEEE pp 428ndash441
53 Yan M Gopireddy B Shull T Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP) defending againstcache-based side channel attacks In Proceedings of the 44thannual international symposium on computer architecture ACMpp 347ndash360
54 Yao F Doroslovacki M Venkataramani G (2018) Are coherenceprotocol states vulnerable to information leakage In 2018IEEE international symposium on high performance computerarchitecture (HPCA) IEEE pp 168ndash179
55 Yarom Y Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In USENIX security symposium pp 719–732
56 Zhang D Askarov A Myers AC (2012) Language-based controland mitigation of timing channels ACM SIGPLAN Not 47(6)99ndash110
57 Zhang D Wang Y Suh GE Myers AC (2015) A hardware designlanguage for timing-sensitive information-flow security In ACMSIGARCH computer architecture news vol 43 ACM pp 503ndash516
58 Zhang S Wright A Bourgeat T Arvind A (2018) Composablebuilding blocks to open up processor design In 2018 51stannual IEEEACM international symposium on microarchitecture(MICRO) IEEE pp 68ndash81
59 Zhang T Lee RB (2014) New models of cache architecturescharacterizing information leakage from cache side channels InProceedings of the 30th annual computer security applicationsconference ACM pp 96ndash105
60 Zhang Y Parikh D Sankaranarayanan K Skadron K Stan M(2003) HotLeakage a temperature-aware model of subthresholdand gate leakage for architects University of Virginia Dept ofComputer Science Tech Report CS-2003 5
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
50 Wang Z Lee RB (2008) A novel cache architecture withenhanced performance and security In 2008 41st IEEEACMinternational symposium on Microarchitecture 2008 MICRO-41IEEE pp 83ndash93
51 Werner M Unterluggauer T Giner L Schwarz M Gruss D Man-gard S (2019) ScatterCache thwarting cache attacks via cache setrandomization In 28th USENIX security symposium (USENIXSecurity 19) USENIX Association Santa Clara CA httpswwwusenixorgconferenceusenixsecurity19presentationwerner
52 Yan M Choi J Skarlatos D Morrison A Fletcher C Torrellas J(2018) InvisiSpec making speculative execution invisible in thecache hierarchy In 2018 51st annual IEEEACM internationalsymposium on microarchitecture (MICRO) IEEE pp 428ndash441
53 Yan M Gopireddy B Shull T Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP) defending againstcache-based side channel attacks In Proceedings of the 44thannual international symposium on computer architecture ACMpp 347ndash360
54 Yao F Doroslovacki M Venkataramani G (2018) Are coherenceprotocol states vulnerable to information leakage In 2018IEEE international symposium on high performance computerarchitecture (HPCA) IEEE pp 168ndash179
55 Yarom Y Falkner K (2014) FLUSH+ RELOAD a high resolutionlow noise L3 cache side-channel attack In USENIX securitysymposium pp 719ndash732
56 Zhang D Askarov A Myers AC (2012) Language-based controland mitigation of timing channels ACM SIGPLAN Not 47(6)99ndash110
57 Zhang D Wang Y Suh GE Myers AC (2015) A hardware designlanguage for timing-sensitive information-flow security In ACMSIGARCH computer architecture news vol 43 ACM pp 503ndash516
58 Zhang S Wright A Bourgeat T Arvind A (2018) Composablebuilding blocks to open up processor design In 2018 51stannual IEEEACM international symposium on microarchitecture(MICRO) IEEE pp 68ndash81
59 Zhang T Lee RB (2014) New models of cache architecturescharacterizing information leakage from cache side channels InProceedings of the 30th annual computer security applicationsconference ACM pp 96ndash105
60 Zhang Y Parikh D Sankaranarayanan K Skadron K Stan M(2003) HotLeakage a temperature-aware model of subthresholdand gate leakage for architects University of Virginia Dept ofComputer Science Tech Report CS-2003 5
Publisherrsquos Note Springer Nature remains neutral with regard tojurisdictional claims in published maps and institutional affiliations
J Hardw Syst Secur (2019) 3397ndash425 425
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games: bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH computer architecture news 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J Hardw Syst Secur (2019) 3:397–425