Lest we forget: Cold-boot attacks on scrambled DDR3 memory · 2019-05-30 · Lest we forget: Cold-boot attacks on scrambled DDR3 memory Johannes Bauer*, Michael Gruhn**, Felix C

ilable at ScienceDirect

Digital Investigation 16 (2016) S65eS74

Contents lists ava

Digital Investigation

journal homepage: www.elsevier .com/locate/d i in

DFRWS 2016 Europe d Proceedings of the Third Annual DFRWS Europe

Lest we forget: Cold-boot attacks on scrambled DDR3memory

Johannes Bauer*, Michael Gruhn**, Felix C. Freiling***

Department of Computer Science, Friedrich-Alexander-Universit€at Erlangen-Nürnberg (FAU), Martensstr. 3, 91058 Erlangen, Germany

Keywords:Cold-boot memory acquisitionScrapingScramblingWhiteningDecryption

* Corresponding author.** Corresponding author.*** Corresponding author.

E-mail addresses: [email protected]@cs.fau.de (M. Gruhn), [email protected]

http://dx.doi.org/10.1016/j.diin.2016.01.0091742-2876/© 2016 The Authors. Published by Elsevcreativecommons.org/licenses/by-nc-nd/4.0/).

a b s t r a c t

As hard disk encryption, RAM disks, persistent data avoidance technology and memory-only malware become more widespread, memory analysis becomes more important.Cold-boot attacks are a software-independent method for such memory acquisition.However, on newer Intel computer systems the RAM contents are scrambled to minimizeundesirable parasitic effects of semiconductors. We present a descrambling attack thatrequires at most 128 bytes of known plaintext within the image in order to perform fullrecovery. We further refine this attack using the mathematical relationships within the keystream to at most 50 bytes of known plaintext for a dual memory channel system. Wetherefore enable cold-boot attacks on systems employing Intel's memory scramblingtechnology.© 2016 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access

article under theCCBY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Introduction

For several reasons, the contents of volatile memory(RAM) are a valuable piece of digital evidence during aforensic investigation. Firstly, the keys for full diskencryption are usually stored in RAM. Extracting such keysfrom a memory snapshot therefore allows to access con-tents of encrypted storage that would be inaccessibleotherwise. Secondly, a plethora of other information aboutthe current system state can be recovered from RAM,including ephemeral cryptographic communication keys,the list of running processes and the details of activenetwork connections. Last but not least, new forms ofmemory-only malware can only be analyzed while they areactive in memory. So overall, with the increasing use ofencryption technology, cloud storage and memory-only

(J. Bauer), michael.(F.C. Freiling).

ier Ltd on behalf of DFRWS

malware, forensic memory image acquisition has becomeincreasingly important.

There has been a lot of debate on how to properlyperform imaging of volatile memory since there exist avariety of options (V€omel and Freiling, 2011). Onecommonly chosen option is runtime acquisition via soft-ware using specific memory acquisition tools like WinP-mem.1 While this method appears convenient in manycases, memory imaging may be manipulated by malwarethat hides in inaccessible memory regions (Stüttgen andCohen, 2013), thus creating a memory image that is notforensically sound. Another problem of software acquisi-tion methods is that they operate concurrent to regularsystem activity and therefore produce inconsistencies thatdo not occur in “perfect” memory snapshots (V€omel andFreiling, 2012). A generic option that avoids these prob-lems is to perform a so-called cold boot attack. These attacksexploit the remanence effect of modern RAM technology.

Modern RAM technology is commonly based on dy-namic random access memory (DRAM), a type of RAM in

1 http://www.rekall-forensic.com/.

. This is an open access article under the CC BY-NC-ND license (http://

http://creativecommons.org/licenses/by-nc-nd/4.0/

mailto:[email protected]




http://crossmark.crossref.org/dialog/?doi=10.1016/j.diin.2016.01.009&domain=pdf

http://www.rekall-forensic.com/

www.sciencedirect.com/science/journal/17422876

http://www.elsevier.com/locate/diin

http://dx.doi.org/10.1016/j.diin.2016.01.009





J. Bauer et al. / Digital Investigation 16 (2016) S65eS74S66

which the cells which store data are constituted of an arrayof capacitors. Each capacitor is either charged or dis-charged, depending onwhether the cell bit is set or cleared.Since capacitors have a leakage current, their data contentslowly dissipates over time. Therefore, in order to effec-tively use DRAM, each and every cell has to be periodicallyrefreshed. This is achieved by reading the contents andwriting it back to the RAM chip. The time that DRAM willkeep its contents without leakage affecting the content isreferred to as retention time. The fact that the retention timeis nonzero and that memory will keep its contents for awhile even when it is not actively refreshed, is oftenreferred to as the remanence effect. It is well-known fromelectrical engineering that the leakage current of capacitorsgrows exponentially with their temperature (Wyns andAnderson, 1989). Therefore the retention time of RAMdramatically decreases with increased chip temperature.

Cold boot attacks exploit the remanence effect and canbe executed in twoways: One way (Halderman et al., 2009)is to reset the target computer by using the reset buttonand boot from an alternative medium such as USB using aspecial imaging USB stick that contains only a minimaloperating system together with imaging software such asmemimage (Halderman et al., 2009). Ideally, the originalcontents of RAM are maintained and can be recovered,apart from those parts that have been overwritten by theacquisition software. Unfortunately, such an attack is easilythwarted by using trivial protection mechanisms like BIOSpasswords.

The other way to perform cold boot attacks is to phys-ically “transplant” the RAM module at runtime from thedevice under investigation into an acquisition computerand perform image extraction on that second computer.When the semiconductors are properly cooled beforetransplantation, they will retain most of their content. It istherefore beneficial to freeze the DRAM modules usingcooling spray in order to increase the remanence effect. Tothe best of our knowledge, this second form of cold bootattack is paradigmatic for the class of memory acquisitionprocedures since it combines genericity, availability andoffers the highest level of integrity and atomicity for theacquired memory images (V€omel and Freiling, 2011, 2012).

While the remanence effect itself is already well-studied (Hamamoto et al., 1998; Liu et al., 2013;Halderman et al., 2009) were the first to exploit it toattack full disc encryption systems on desktop PCs. How-ever, it has been shown that cold boot attacks also affect amultitude of other devices such as smart phones (Müllerand Spreitzenbarth, 2013). The dominating RAM technol-ogy at that timewas called DDR2. Recently, however, Gruhnand Müller (2013) reported that the results of Haldermanet al. (2009) could not be repeated with the more modernDDR3 RAM technology. Even worse, the memory imagesobtained from cold booting DDR3 devices appeared to berandom for reasons inherent in that technology.

With increasing speed of semiconductors, their unde-sirable parasitic effects also grow in magnitude. Currentspikes and electromagnetic interference in the speed cat-egories of DDR3 start to affect reliability and regulatorycompliance. To counteract this, RAM manufacturers ingeneral and Intel in particular performmemory scrambling

in DDR3 memory controllers. As we explain later (and asobserved by Gruhn and Müller (2013)), these scramblersseverely limit the potential of forensic image acquisition.While Lindenlauf et al. (2015) performed measurements ofthe remanence effect magnitudes with DDR3memory, theyexcluded a discussion of memory scrambling. So, to thebest of our knowledge, there is no published work whichinvestigates the possibilities of performing cold boot at-tacks on modern DDR3 systems in general, i.e., possibilitiesto “descramble” the scrambler effects.

van Zandwijk (2015) recently performed some relatedwork using an analytic approach to descrambling NANDflash chips. Their approach, however, is unfortunately noteasily transferable to RAM acquisition since they requireerror-free source bit streams which cannot be guaranteedfor a cold boot process. Faintly related is also work byRahmati et al. (2012) which use the remanence effect in aconstructiveway to increase security of embedded systems.For completeness,wealsomentionwork byKimet al. (2014)because they also exploit physical properties of RAM semi-conductors to provoke bit flips within the RAM modules.

To summarize, the ramification of memory scramblingis currently not well understood and there is no generalmethod of performing cold boot attacks on scrambledDDR3 memory. In this paper, we present an in-depthanalysis of DDR3 scrambling (using the Intel memoryscrambler as an example) and we show how to use thisknowledge to develop a practical method of descramblingDDR3 memory in real-world scenarios.

Contributions

Our contributions which we make in this paper are asfollows:

� We present a template attack on scrambled DDR3memory systems which requires 64 bytes of knownplaintext per memory channel (i.e., at most 128 bytes fora dual-channel system) within the memory snapshot inorder to yield complete descrambling of the image.

� We refine this template attack by exploiting the math-ematical relationships present within the key stream toreduce the number of known plaintext bytes to only 25(i.e., 50 bytes for a dual-channel system).

� We present methods which can be used to deinterleavedual-channel memory and give an algorithm which canconstruct an interleaved key stream of arbitrary lengthfrom the subkeys for each channel.

We make the source code and documentation of ourresearch freely available to the community at https://www1.informatik.uni-erlangen.de/filepool/mem/ddr3descramble.tar.gz.

Outline

We first give some necessary background information inSection Background, then precisely formulate the problemof descrambling DDR3 memory in Section Problemdescription. We then show how the problem can be

https://www1.informatik.uni-erlangen.de/filepool/mem/ddr3descramble.tar.gz



Fig. 1. LFSR with polynomial x6 þ x4 þ x3 þ x þ 1.

J. Bauer et al. / Digital Investigation 16 (2016) S65eS74 S67

solved in Section Towards descrambling and present someexperimental results confirming our findings in SectionExperimental results. We finally conclude in SectionConclusions and Outlook.

Background

Scrambling

Storage of bit streamswhich are strongly biased towardszero or one can lead to a multitude of practical problems:Modification of data within such a biased bit stream canlead to comparatively high peak currents when bits aretoggled. These current spikes cause problems in electronicsystems such as stronger electromagnetic emission anddecreased reliability. In contrast, when streams withoutDC-bias are used, the current when working with thosestorage semiconductors is, on average, half of the expectedmaximum.

Scrambling can be applied to biased bit streams toremove these undesirable side effects. In the simplest casesuch a scrambler is a pseudo-random binary sequence(PRBS) that is added onto the input bit stream using theexclusive or (XOR) addition. We call these devices additiveor synchronous scramblers. For the receiving side, the in-verse operation, descrambling, must be applied in order toget the original content. A pleasant side effect of the XORoperation is that scrambling and descrambling is sym-metric; it effectively is the same operation.

Linear-feedback shift registers

Since the goal of scrambling is only to achieve biasremoval, cryptographic requirements need not be satisfiedby a scrambling PRBS. A solution that is often chosenbecause of the simplicity and efficiency with which it canbe implemented in hardware are linear-feedback shiftregisters (LFSRs). As the name suggest, these hardwarebuilding blocks are shift registers of a fixed bit width n. Atany given point in time, the content of the register isreferred to as the internal state. The bit which is fed back tothe register on each clock cycle is a linear combination, i.e.,XOR, of some of the state bits. Bits which are inputs to thelinear function are called taps.

Mathematically, LFSRs can be modeled as polynomialsover F2 in which the state bits b0,…,bn�1 represented thepolynomial coefficients. Clocking corresponds to repeatedmultiplication of the whole polynomial with x and reduc-tion modulo the characteristic polynomial P of the LFSR.

Since an LFSR only has 2n possible states, it will alwaysproduce a periodical bit sequence. The period of an LFSRwith a primitive characteristic polynomial is 2n�1 and it isgenerated for every nonzero initialization of the shift reg-ister. One possible hardware instance of such a shift registeris shown in Fig. 1.

DRAM

As explained before, DRAM needs to be continuouslyrefreshed in order to keep its content. For modern DRAMgenerations such as DDR3 this process is implemented by

logic within the DRAM module itself (Micron, 2014). It is,however, triggered by the host. In Intel systems, this is thetask of the memory controller hub (MCH). While the CPUand MCH were separate chips in early hardware genera-tions, the MCH is completely contained within modernCPUs.

When the retention time is exceeded e for examplebecause the DRAM chip is unpowered e the stored chargeof the capacitors slowly decays and the RAM takes itsground state. The ground state is not a trivial pattern of, forexample, all zero bits, but depends on the physical con-struction of the chip itself. Namely, whether the staticallyconnected sides of the cells are biased against GND oragainst the positive supply voltage, Vcc. The memorypattern that a DRAM chip will therefore show when it farexceeded its retention time is heavily dependent on thephysical semiconductor construction.

Another important aspect is the mapping betweenphysical addresses and physical storage location withinmemorymodules. Whenmore than onememorymodule ispresent in a computer, the RAM is usually operated in so-called dual-channel mode. In this mode, consecutive datais alternated in a certain pattern between modules. This isdone for performance reasons, as it allows use of twomemory modules in parallel.

LFSR RAM scrambling

We now take a look at the actual process implementedby the MCH construction of Intel. As the authors explain inthe patent on the topic, their aim is to reduce both elec-tromagnetic interference (EMI) and current spikes byscrambling (Mozak, 2011; Falconer et al., 2013).

In order to achieve this, they use a set of parallel LFSRswhich generate a PRBS that is XORed with the data.Therefore the data on the memory bus appears to berandom and ideally exhibits no bit-bias. To be able togenerate the PRBS for an arbitrary memory location, a shortsecret value, the so-called seed, is chosen on first power-up.Upon a read or write request to the RAM, this seed is mixedtogether with the memory address that the request wasmade to. Fig. 2 illustrates this process schematically. Thiscombined value then serves as the parameterization of theLFSR that generates the PRBS for the requested memoryaccess. In this way, it is possible for the MCH to create therequired PRBS for every random memory request. Whilethe Intel patent (Falconer et al., 2013) gives an exampleseed calculation in which only the lesser significant bits ofthe address are involved to parameterize the localizedPRBS, it is unclear if this is indeed the method that is usedin practice.

Note that when memory is operated in dual-channelmode, as explained in Section DRAM, each of the twochannels will have its own scrambler and both scramblers

Fig. 2. Schematic display of a Intel DDR3 scrambler.

Fig. 3. Scrambled storage of data and image acquisition.


will also usually have distinct seed values. They are twocompletely independent hardware units.

During the boot process, the MCH is programmed bycode that is part of the computer's firmware (i.e., the BIOSor UEFI). It is during this initialization that the MCH pa-rameters e including the scrambler configuration andscrambler seed e are programmed as well. Therefore areseeding of the MCH scrambler can only occur when thecomputer is performing a cold start (i.e., transitions fromunpowered to powered state). In our trials, the seed wasnever reset when the computer was merely rebooted usingthe reset button.

Problem description

As mentioned in the introduction, one approach toperform image acquisition is to reset the target computerby using the reset button (thereby not reinitializing anymemory scramblers) and boot from an alternative mediumsuch as USB. Tools like the memimage imaging software(Halderman et al., 2009) have a sufficiently small memoryfootprint as to leave most of the memory contents intact.However, such attacks are easily thwarted using BIOSpasswords for example. The other approach is to physicallytransplant the RAM module from the suspect's computerinto an acquisition computer and perform image extractionon that second computer, as shown in Fig. 3. It is in this casethat scrambling comes into full effect, since the contents ofthe RAM chip will contain only scrambled data and theacquisition computer cannot have the correct seed toreplicate the original key stream.

As shown in Fig. 2, all data that is passed to or from theDRAM chips is first passed through the scrambler circuitrywithin the memory controller hub. It is transparentlyscrambled when writing to and transparently descrambledwhen reading fromRAM. Therefore at any point in time, theRAM module will only contain a scrambled image M. Allconnected peripherals will not be aware that scrambling iseven happening, but will only see the plain RAM image P.The scrambling data streamwill be referred to as K and theconnection between the three is simply an XOR relation-ship, as in a stream cipher:M ¼ P4K . When such an image

is forensically acquired via means of cold booting, thecaptured image is referred to as I. While during normaloperation of the computer the key K0 was used by thescrambler, during the image acquisition phase this keymight be K1, where K0 is not necessarily equal to K1. Fig. 3shows this formally: During the normal use, the plainimage P is scrambled by K0 and the memory M consists ofthe value K04P which resides in RAM:

P /K0

scrambleP4K0 ¼ M (1)

When the RAM module is transplanted to the analysismachine, generally the scrambling keywill be different. Wedenote the new key by K1. Subsequently, during theacquisition phase the descrambler adds K1 to the RAMimage, yielding an image

I ¼ P4K04K1 ¼ M4K1 )K1

descrambleM (2)

Therefore, in a cold boot scenario the descrambler doesnot do what the name suggests (i.e., recover the originalplaintext image), but instead actually adds another layer ofscrambling on top of the already scrambled image. Duringnormal operation of the computer, K0 is equal to K1 so thatthe key streams cancel each other out and the descrambleractually recovers the plaintext image.

Intel's patent on their scrambler mechanics explainsthat a parallel LFSR is used to generate the scrambler bitstream K. Therefore it seems that decrypting such ascrambled image would be a rather simple, straightforwardtask. Surprisingly enough, in practice it turns out to bemore complicated than initially anticipated. This has anumber of reasons:

1. Nonexistent public documentation: All documentationthat explains the registers which are used by the mem-ory controller hub (MCH)e the component that containsthe scrambling unit e is non-public. Parts of the docu-mentation which are publicly available, such as thepatents Intel has filed on the issue (Mozak, 2011;Falconer et al., 2013), are worded as broadly as possibleto include a plethora of different options. It is unclearwhich one is used in practice.


2. Lossy image acquisition: Forensic image acquisitionwhenusing cold boot techniques relies on the remanence ef-fect of the semiconductors. This effect is neither guar-anteed nor reliable. Bit flips occur frequently as theDRAM cells lose their content. When trying to reverseengineer the used scrambling mechanisms, this poses aproblem since algorithms like Berlekamp-Massey(Massey, 1969) for synthesis of a LFSR from a given bitstream rely on perfect input data to produce correct re-sults. When the algorithms are fed noisy input they willnot indicate failure, but instead synthesize misleadingoutput.

3. Unknown ground state: If the DRAM content of a chipwhich was inactive for a long time, i.e., the ground stateof that chip G, were known, then it would be easy todetermine the pure scrambler bit stream. We couldperform a cold boot attack on a machine that had beenturned off for a long time. While then assuming thatM ¼ G, we can determine K ¼ G4I, since for this ma-chine the memory content C would be equal to G and itwould run through the descrambler during forensicimage acquisition. However, the ground state G is highlydependent on the actual constitution of the hardwareitself and forms a nontrivial pattern. Therefore it isdifficult to gain access to the pure scrambling bit stream.

4. Interleaved memory: Lastly, most modern systems withmore than one physical RAM chip will be configured touse dual channel mode in order to improve systemperformance. This means that consecutive data will beput on alternating RAMmodules in an unknown pattern.Since each channel has its own, completely separatescrambler instance, it must be known which pieces ofdata have been scrambled by which unit in order toperform descrambling for such images.

Towards descrambling

We now describe an approach to descramble the con-tents of a DDR3 memory image that was acquired usingcold boot. The first steps which we describe are necessaryto calculate certain parameters of the hardware that arenecessary for descrambling to work. These steps can,however, also be performed after the memory image hasbeen taken.

Fig. 4. Experimental setup for image recovery.

Calculating memory offsets

As mentioned before, the scrambler LFSR is parameter-ized by a global seed and (parts of) the memory addressthat is accessed. It is therefore vital to know the exactphysical memory address of every byte in the acquiredimage. Unfortunately, not all acquisition software worksreliably; in fact, there are many examples in which areas ofmemory which are inaccessible are simply skipped insteadof being correctly filled with padding data (V€omel andStüttgen, 2013). When scanning through plaintext imagesin order to locate cryptographic keys, this is not a problem.For our purposes, however, it is not only important to getthe data, but also important to be able to pinpoint the exact

storage location of that data. Only then can we select thecorrect key stream offset with which we will be able todescramble the image. In order to work around inherentlimitations of acquisition software, we wrote a custom dataplacer program to store the 64-bit physical address every 8bytes throughout all available memory. Then a soft resetwas performed as to not reseed the memory scrambler anda forensic imagewas created with the same software whichwould later also capture the data images.

Note that using this approach we would also be able todetermine the effects of memory address scrambling whenperforming acquisition on transplanted memory. Whilethis was not the case in our experiments, there are indeedhints in literature that this would be something that couldbe expected in the future (Gould, 2009).

By examining these dumped images, it was easy toidentify the locations where the acquisition process wasdiscontinuous. These discontinuities are referred to ashidden memory regions (Stüttgen, 2015; Stüttgen et al.,2015). This is expected, as the BIOS memory map (whichthe acquisition software usually relies on) only looselycorrelates with the intricate details of the actual memorymapping (which the MCH uses). As a consequence, suchforensic images may contain holes within where hiddenmemory regions were present. Capturing exactly wherethese discontinuities were located for any given combina-tion of computer and DRAM allowed us to calculate theactual address in the original physical memory of a givenoffset within the dumped memory image file.

Distinguishing the scrambler type

Now we know which addresses in physical memorymap to an acquired image file offset, but we do not yetknow about the scrambling behavior of the device undertest at all. We now show how it is possible to determinehow the scrambler is configured by the computer'sfirmware.

Here is the procedure to distinguish the differentscrambler types (see Fig. 4):


1. Turn the device completely off and leave it off for anextended period of time (e.g., 1 min). This ensures thatthe DRAM content will definitely be the ground state G.

2. Turn the device on and immediately perform cold bootimage acquisition.

3. Repeat these two steps twice to capture two indepen-dent cold boot images I0 and I1.

To determine the scrambler type, it is now sufficient toinvestigate the image content which has not been modifiedby the BIOS or dumping software. (Note that the amount ofRAM that is overwritten by the BIOS on reboot needs to beevaluated on an individual basis.) When analyzing thismemory, there are three possible outcomes:

1. The two captured images I0 and I1 are identical (exceptfor noise) and look non-random. Long sequences ofconsecutive 0 and 1 bits are expected to be present in theoutput. The details of the pattern depend on the physicalhardware wiring of the respective memory cells. Thisfinding implies that scrambling is disabled on themachine.

2. The captured images are identical (except for noise), butlook random (i.e., equal distribution of all bytes withapproximately identical probability). This implies thatscrambling with a constant seed is used on the machine.Note that disabled scrambling is a special case of con-stant scrambling, where the constant scrambling keystream KC ¼ 0

!.

3. The captured images look completely different and alsoboth look random. This implies that scrambling with arandom seed is used on the machine.

In our experiments, we did not find any machine whichdisabled the scrambling feature altogether. But there weremachines of either of the two latter types. Which type amachine belongs to is determined by the system firmware,as explained earlier in Section LFSR RAM scrambling. Forexample, we found a system consisting of an MSI H55M-P33 mainboard with an Intel Core i5-760 CPU to use con-stant scrambling, while an Intel Core i3-3225 CPU within aMSI B75MA-P45 mainboard used random scrambling.

Attacking constant scrambling

Assume a machine that performs constant scrambling.For such a computer, descrambling the memory contents isnot necessary in most scenarios, since the scrambling anddescrambling key K is, as the name suggest, constant overpower cycles. Therefore, K ¼K0 ¼ K1 and thereforeI ¼ P4K04K1 ¼ P. The intuitive reason is that scramblingand descrambling cancel each other out on the same sys-tem when the key stream remains the same for both.Therefore, if the forensic image acquisition is performed onthe same computer which also wrote the data into theRAM, the system can be treated as if there were noscrambling used at all. This also applies to RAM that wastransplanted from one system to another one which usesthe exact same firmware and therefore same, constant,scrambler seed.

Unfortunately, in most practical cases the machine withwhich the image recovery was performed will be differentfrom the computer which contains the forensically inter-esting data. In such cases, things become more compli-cated. Fortunately, for all hardware that we tested, the basicprinciple of how the scrambler works was always identical,so there is reason to assume that there currently is only onegeneration of scrambling hardware available. This is alsothe case which we assume and deal with here. If the twocomputers use different scrambler generations, the resultscould vary greatly depending on the exact scramblermechanisms which are employed.

Assuming that the two computers use at least the samescrambler generation and merely differ in the parameteri-zation (e.g., the host system uses constant scrambling withK0, but the acquisition system uses a different key streamK1), then the captured image can simply be treated as if itwere created on a randomized scrambling system, asdescribed next.

Attacking randomized scrambling

Consider again the two images I0 and I1 from Fig. 4. Theyboth capture the same, unknown, ground state G withdifferent scrambling keys applied to them. In other words,

I0 ¼ K04G and I1 ¼ K14G:

We can therefore apply a differential approach by con-structing the image D

D ¼ I04I1 ¼ K04G4K14G ¼ K04K1

By eliminating the unknown ground state from theequation we now only deal with the differential of twounknown key streams. Since we know that the key streamis periodic, as explained in Sect. Linear-feedback shiftregisters, it can be written as the repeated concatenationof some unknown partial key stream S (the subkey):

D ¼ Sx

We then inspect chunks from this D of varying size(concretely, we used powers of two from 32…1024). Usingautocorrelation on these chunks we can identify the peri-odicity p of S within D. To do this, we define an equalityfunction on two bit streams X;Y of equal length:

XzY⇔HðX4YÞ

jXj < 3

Here, H is the Hamming weight of a bit vector. Intui-tively, we consider two bit streams X,Y to be approximatelyequal if and only if the average Hamming weight of theirbitwise difference is below a certain threshold 3. We groupn of these chunks into an equivalence class:

fC0; C1;…;Cn�1g with CizCj c i; j2f0;…;n� 1gOnce the periodicity p is selected correctly, only one

equivalence class will emerge with lots of approximatelyequal differential subkey candidates Ci, all of length p.


During our experiments, we determined the smallest valuefor p at which this occurred to be 64 bytes.

Due to bit flips during the lossy acquisition, we still donot, however, know S. Under the assumption that all can-didates Ci are just deviations from S caused by randomnoise during acquisition, we can calculate S by performing amajority vote on each individual bit sj of S:

sj ¼

8><>:

0 ifPn�1

i¼0Cij <

n2

1 otherwise

As a result, we have the most likely subkey candidate S,where D ¼ K04K1 ¼ Sx. We can now use this informationto recover P using a known plaintext attack.

Stencil attack

From our measurements we found that on all machineswe investigated, the differential of two keys K0 and K1exhibited a 64-byte periodicity (i.e., p ¼ 64). This directlyenables what we refer to as the stencil attack for DDR3descrambling. The attack works as follows:

1. Perform forensic recovery of the image that shall bedescrambled. Without loss of generality, the memoryimage contentM of the image P isM ¼ P4K0. Here, K0 isthe key stream that is applied to the data by thescrambler unit on the target system.

2. The captured image will be I ¼ P4K04K1, with both K0and K1 unknown. Note that the descrambled image P isthe information of actual forensic interest and K1 is thekey stream added by the descrambler unit on theacquisition system.

3. Therefore, I ¼ P4ðK04K1Þ ¼ P4D, where D is still un-known. However, we know from Sect. 4.4 the periodicityp of D. Therefore, scan through the image I at p-byteboundaries and cluster p-byte chunks together using theapproximative equality function described above. Selecta partition that has lots of candidates: this is likely to be apattern of all 0 � 00. Construct the maximum likelihoodcandidate S by majority vote.

4. Construct P ¼ I4Sx4Tx where T is the known plaintext.If the known plaintext was a chunk of 0 � 00, i.e., T ¼ 0

!then P ¼ I4Sx.

To reconstruct P, we therefore only need a knownplaintext of length p, i.e., 64 bytes in our case.

Mathematical approach

The stencil attack allows an attack to be mountedagainst scrambled DDR3 memory, effectively yielding theoriginal image with relatively few pieces of known plain-text required. However, we now look at the mathematicalrelationships within the differential subkey stream withthe purpose of constructing a key stream from less knownplaintext.

In our approach we are limited to examination of adifferential key stream K04K1. This is as an inherent

limitation of performing acquisition with systems whichcontain an active descrambler. During analysis, we foundsome interesting congruencies within this differential keystream. First, we partitioned the 64-byte differential keystream stencil into 32 values of 2 bytes each. Each valuewasinterpreted as a little endian integer. We ended up with 3216-bit integer values v0,…,v31. We then were able to findthree 16-byte polynomials p0,p1,p2 for which 24 congru-encies hold for 0 � i � 7 and 0 � j � 2:

��v4iþj[1

�4pj

�&0x7fff ¼ v4iþjþ1

If you recall Section Linear-feedback shift registers, thisrelationship can immediately be recognized as a LFSRrelationship: Two related values, v4iþj and its adjacent wordv4iþjþ1 can be constructed from each other when the firstone is shifted right by one bit ([1) and then has the LFSRpolynomial added onto it (4 pj). Note however that in thiscongruence we can only determine 15 of the 16 bits of theadjacent word (&0x7fff). This is because it is lost when thedifference between two LFSR output streams isconstructed.

These relationships are a useful additional method toconfirm the validity of key streams and aid the search for aknown plaintext in the differential memory snapshot D. Itreduces the number of known plaintext bits to less than40% of the original stencil attack: in our case, instead of 64bytes we now need only (8 þ 3)$2 þ 3 ¼ 25 bytes if weexploit the mathematical inter-stream relationships. Onlythe eight initial states v4i (16 bytes), the three polynomialsp0…p3 (6 bytes) and the three most significant bits in everygroup of 8 bytes (total of 3 bytes) need to be known toconstruct the entire differential key stream.

Deinterleaving of memory

As stated before in Sects. 2.3 and 2.4 a systemwithmorethan one DRAM module will usually operate in dual-channel mode to improve system performance. Eachmemory channel has an independent scrambler, so theattack as described in Section Stencil attack still workswhen it is knownwhich part of the memory image needs tobe descrambled with which channel key. In our experi-ments we determined how the algorithm to split data be-tween channels works in Intel systems. Consider twochannel subkeys A and B of 64 bytes each. The two basicinterleaved streams Q1 and R1, each of length 256 bytes, aredefined to be:

Q1 ¼ A2jjB2

R1 ¼ B2jjA2

All further key streams are then defined recursively:

Qn ¼ Qn�1jjR2n�1jjQn�1

Rn ¼ Rn�1jjQ2n�1jjRn�1

This definition can be applied until one finds a Qn ofsufficient length. This Qn is then the complete key stream K.


Since every channel subkey is 64 bytes in length, thelength of an interleaved key stream Qi is exactly:��Qi

�� ¼ 64,4i

Solving for i with given jQij:

i ¼ log4jQij64

i.e., for a dual-channel memory system of 4 GiB, onewould choose i ¼ 13.

The interleaving pattern for i ¼ 6, i.e., for 4096 streamsof 64 bytes each, is graphically shown in Fig. 5. It is exactly64 by 64 pixels (i.e., 4096 pixels) in size and every pixel'scolor indicates whether stream 0 or stream 1 is in effect.The stream order is shown left-to-right, top-to-bottom.

With this knowledge, we can perform the stencil attackeven when RAM is accessed in dual channel mode. Theacquired image I has to be deinterleaved into two channelimages IA and IB which can be treated independently beforeinterleaving them back together to form a plain image P.

Experimental results

Investigated machines

Our measurements were performed predominantly onthe Intel Core i3-3225 with a MSI B75MA-P45 mainboard.We took care to verify that the results also apply todifferent machines. In the process, we confirmed that ourresults also apply to the following combination of CPUs andmainboards:

� Intel i5-760/MSI H55M-P33� i5-2520M/Dell 03PH4G� i5-2400/Esprimo P900 E90þ

Fig. 5. Dual channel interleaving graphically.

Applying the stencil attack

In our experiments we first started out with a singlememory module present in the target computers. We thenused our data placer code to place 512 � 775 pixel grayscale images of Mona Lisa at every 1 MiB boundary. At thespace in between images we placed an easily recognizable,distinct pattern block.

We then froze the memory module by applying coolingspray to it until it had reached around �30 �C. Then we cutpower to the system by shutting it completely off andrestarted immediately afterwards. The latency that ourtargets took from complete shutdown to new boot-upranged from around 2 to up to 5 s. We then drew mem-ory images using the memimage toolkit. By using thetechniques described in Section Stencil attack wewere ableto recover the original memory image.

You can see the results of our experiments in Fig. 6. Thefirst image, Fig. 6a shows an image acquisition that was

Fig. 6. Images of descrambling single-channel memory.

Fig. 7. Images of descrambling dual-channel memory.


performed at operating temperature (about þ30 �C). Nodata could be recovered from this test, as everything wascompletely decayed.

On the second image, Fig. 6b, the image is shown whenit was drawn from the target after cooling to about �30 �Cwas applied. The basic shape of Mona Lisa is still visible, butit is distorted by a repeating pattern. This repeating patternis exactly the subkey stencil which we need to apply todescramble the memory images. When this key is un-known (because there is no known plaintext or at least noknown plaintext yet) we made a related-data experiment,shown in Fig. 6c. For this we make the assumption thatconsecutive 64-byte plaintexts will e at least to some de-gree e repeat within the plaintext. Therefore the first 64-byte block was chosen as a stencil subkey. You can seeclearly that the image looks a lot better than the completelyscrambled variant, but still has lots of distortions.

Finally, we used the method described in Section Stencilattack and recovered the most probable key using majorityvote and known plaintext. This key was then applied to thecaptured image, yielding a result that was, except for theoccasional bit error, very close to the original image.

Dual channel mode and decay rate

During our experiments we found the high decay rate ofDDR3 RAM when at operating temperature curious. Toverify our hypothesis that the retention time of memorywas indeed much less than what can be seen in DDR2counterparts, we performed an experiment:We placed twomemory modules in our target PC and took care that theywere operated in dual-channel mode. Then we placed theMona Lisa images in RAM using our data placer program. Atthis point in time, roughly every second 64-byte block ofthe image is on one RAM module and every other 64-byteblock is on the other RAM module. We then inserted arectangular piece of 500 mm PET (polyethylene tere-phthalate) in between the two RAM modules to achievethermal isolation between them. One of the twomodulesebut not the other e was then frozen by us before per-forming a cold boot attack.

This experiment served a dual purpose: First, it allowedus to confirm that the thermal dependency of DDR3 isreally as critical as we assumed it was. This is because theonly difference in the process was the temperature of themodules e all other parameters like the power-down timewere exactly identical. Secondly, it allowed us to confirmthat our algorithm to decode dual-channel memory, asexplained in Section Deinterleaving of memory, worked asexpected.

The results are depicted in Fig. 7. On the left side, Fig. 7ashows the interleaved, descrambled, memory image. Theresults verified our hypothesis: Approximately every otherrow was completely decayed and shows up as white noisein the image. Even though the image in Fig. 7b gives arather noisy visual impression this noise is really onlypresent in the parts that were acquired from the warmRAM module. This becomes obvious when the noisychannel is masked out according to the algorithm we pre-sented in Section Deinterleaving of memory. The result isFig. 7b, a successfully descrambled one-channel image of a

RAMmodule that was operated in dual-channel mode. Theimage shows virtually no noise because our algorithmcorrectly masks out only the module which had decayedcontent.

Remanence effect in DDR3

In their original paper, Halderman et al. (2009) foundthat DDR2 RAM exhibits a comparatively strong remanenceeffect. They performed tests at operating temperature andeven without externally applied cooling to the chips, someDDR2 chips had decay times of up to 35 s before showingcomplete data loss. To determine these values for DDR3memory, Lindenlauf et al. (2015) did similar decay mea-surements. They found an astonishingly low bit error ratewhen the modules were cooled to a temperature between�30 �C and �35 �C and kept the RAM modules unpoweredfor up to 50 s. It is not apparent, however, if these longdecay times were also performed with DDR3 memory orjust with DDR2, and while they do show the dependency ofthe decay rate on the die temperature for DDR2 memorythey omit how these results transfer to DDR3 memory.

In our experiments, we found DDR3 memory to bemuch less forgiving during cold booting. Much in contrastto DDR2 memory it was absolutely essential for us to al-ways keepmodules at low temperatures (around�30 �C) inorder to produce usable results. We observed retentiontimes of about 10 s before total decay occurred even whensuch cooling was performed. At operating temperature(aroundþ30 �C) wewere not able to acquire a single usableimage, because all data content had dissipated.

It is our assumption that this can be explained by thedifferent types of memory modules that were used byLidenlauf et al. compared to ours. While they used modulesthat were produced in 2011, we used slightly more recentmodules (produced 2013) that were also a bit faster (666and 833 MHz types).


Conclusions and outlook

Memory acquisition of DDR3 memory in the real-worldis more complicated thanwith a laboratory setup: While ina lab setup researchers can choose systems which work fortheir demonstration e systems which will usually useconstant scrambling e this luxury is not available in a real-world scenario. On top of the intricacy of descramblingimages come practical aspects like dual-channel decoding.Both are obstacles that are not in place to deter cold bootattacks, but they still complicate memory acquisitionsignificantly in practice.

We have demonstrated that our explanations and as-sumptions about the internal construction of the IntelDRAM scrambler are in line with the observations wemadefrom our experimental results. Cracking a dual-channelsystem requires only 128 bytes of known plaintext toapply our stencil method and only 50 bytes if the mathe-matical approach is chosen. This is a negligible amount ofdata compared to the huge amount of RAM that is presentin the computer systems of today. Large chunks of the RAMwill usually be set to zero in regular operation of a com-puter, be it either by the operating system or by anyrunning application. We further demonstrated that we areable to correctly deinterleave RAM images. This is a pre-requisite to correct descrambling of dual-channel systems,as each channel has an independent scrambler.

Since the operation of memory scramblers is trans-parent for the system during normal operation, it couldwell be possible that newer MCH revisions choose to usedifferent mechanisms for scrambling. Of particular interestfor forensic investigation would be if our results can beapplied to DDR4 memory as well. This is something wewould like to explore in future work.

In order to improve on our attack, it would be mostinteresting to mathematically attack the generated keystream itself. Since our approach only works with differ-ential streamswe at no point in time are able to reconstructthe original key stream e we only ever reconstruct differ-ential key streams. Our ideas for future work are to utilizecustom-built hardware around an FPGA developmentboard in order to be able to read out the raw key streamfrom a cold booted DDR3memorymodule. This would thenenable brute forcing of key streams by trying differentseeds, but it would also be an attack that would be signif-icantly more difficult than what we show in this work.

Investing this time would be interesting not only froman academical standpoint, but also within a real-worldscenario. The reason that scramblers are present in thefirst place is because Intel deemed it necessary to limitexcessive current spikes on the memory bus and in thememory modules. This leads us to believe that there couldbe possibly exploitable detrimental effects if one couldpurposefully produce these excessive current spikes. If thescrambling key stream were known to an attacker, thiscould be leveraged from any unprivileged application tomount an attack which aims to distort RAM integrity. Sincedisturbing RAM-integrity is a relevant topic and is receiving

increased attention after the inspiring row hammer attacksof Kim et al. (2014), this might prove to be a worthwhileinvestment of time after all.

References

M. Falconer, C. Mozak, A. Norman, Suppressing power supply noise usingdata scrambling in double data rate memory systems, US Patent8,503,678 (Aug. 6 2013). URL http://www.google.com.ar/patents/US8503678.

G. Gould, Address scrambling to simplify memory controller's addressoutput multiplexer, US Patent 7,493,467 (Feb. 17 2009). URL http://www.google.com/patents/US7493467.

Gruhn M, Müller T. On the practicability of cold boot attacks. In: Avail-ability, Reliability and Security (ARES), 2013 Eighth InternationalConference on, IEEE; 2013. p. 390e7.

Halderman JA, Schoen SD, Heninger N, Clarkson W, Paul W, Calandrino JA,et al. Lest we remember: cold-boot attacks on encryption keys.Commun ACM 2009;52(5):91e8.

Hamamoto T, Sugiura S, Sawada S. On the retention time distribution ofdynamic random access memory (DRAM). Electron Devices IEEETrans 1998;45(6):1300e9.

Kim Y, Daly R, Kim J, Fallin C, Lee JH, Lee D, et al. Flipping bits in memorywithout accessing them: an experimental study of DRAM disturbanceerrors. In: Proceeding of the 41st annual international symposium onComputer architecuture. IEEE Press; 2014. p. 361e72.

Lindenlauf S, H€ofken H, Schuba M. Cold boot attacks on DDR2 and DDR3SDRAM. In: Availability, Reliability and Security (ARES), 2015 10thInternational Conference on, IEEE; 2015. p. 287e92.

Liu J, Jaiyen B, Kim Y, Wilkerson C, Mutlu O. An experimental study of dataretention behavior in modern dram devices: implications for reten-tion time profiling mechanisms. ACM SIGARCH Comput Archit News2013;41(3):60e71.

Massey JL. Shift-register synthesis and BCH decoding. Inf Theory IEEETrans 1969;15(1):122e7.

Micron. DDR3 SDRAM datasheet for MT41J256M4, MT41J128M4 andMT41J64M4. 2014. URL, https://www.micron.com/~/media/documents/products/data-sheet/dram/ddr3/1gb_ddr3_sdram.pdf.

C. Mozak, Suppressing power supply noise using data scrambling indouble data rate memory systems, US Patent 7,945,050 (May 172011). URL https://www.google.com.ar/patents/US7945050.

Müller T, Spreitzenbarth M. Frost: forensic recovery of scrambled tele-phones. In: Applied cryptography and network security. Springer;2013. p. 373e88.

Rahmati A, Salajegheh M, Holcomb D, Sorber J, Burleson WP, Fu K. TAR-DIS: time and remanence decay in SRAM to implement secure pro-tocols on embedded devices without clocks. In: Presented as part ofthe 21st USENIX Security Symposium (USENIX Security 12), USENIX,Bellevue, WA; 2012. p. 221e36. URL, https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/rahmati.

Stüttgen J. On the viability of memory forensics in compromised envi-ronments. Ph.D. thesis. Friedrich-Alexander-Universit€at Erlangen-Nürnberg (FAU), Department of Computer Science; 2015.

Stüttgen J, Cohen M. Anti-forensic resilient memory acquisition. DigitInvestig 2013;10:S105e15.

Stüttgen J, V€omel S, Denzel M. Acquisition and analysis of compromisedfirmware using memory forensics. Digit Investig 2015;12(Suppl. 1):S50e60. http://dx.doi.org/10.1016/j.diin.2015.01.010.

van Zandwijk JP. A mathematical approach to NAND flash-memorydescrambling and decoding. Digit Investig 2015;12:41e52.

V€omel S, Freiling FC. A survey of main memory acquisition and analysistechniques for the windows operating system. Digit Investig 2011;8(1):3e22. http://dx.doi.org/10.1016/j.diin.2011.06.002. URL, http://dx.doi.org/10.1016/j.diin.2011.06.002.

V€omel S, Freiling FC. Correctness, atomicity, and integrity: definingcriteria for forensically-sound memory acquisition. Digit Investig2012;9(2):125e37.

V€omel S, St€ottgen J. An evaluation platform for forensic memory acqui-sition software. Digit Investig 2013;10(Supplement):S30e40. http://dx.doi.org/10.1016/j.diin.2013.06.004.

Wyns P, Anderson RL. Low-temperature operation of silicon dynamicrandom-access memories. Electron Devices IEEE Trans 1989;36(8):1423e8.

http://www.google.com.ar/patents/US8503678

http://www.google.com.ar/patents/US8503678

http://www.google.com/patents/US7493467

http://www.google.com/patents/US7493467

http://refhub.elsevier.com/S1742-2876(16)30003-2/sref3






























https://www.micron.com/%7E/media/documents/products/data-sheet/dram/ddr3/1gb_ddr3_sdram.pdf

https://www.micron.com/%7E/media/documents/products/data-sheet/dram/ddr3/1gb_ddr3_sdram.pdf

https://www.google.com.ar/patents/US7945050





https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/rahmati




























Documents

Lest we forget: Cold-boot attacks on scrambled DDR3 memory · 2019-05-30 · Lest we forget: Cold-boot attacks on scrambled DDR3 memory Johannes Bauer*, Michael Gruhn**, Felix C