
[IEEE 2012 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2012) - Cancun, Mexico (2012.12.5-2012.12.7)] 2012 International Conference on Reconfigurable Computing


978-1-4673-2921-7/12/$31.00 ©2012 IEEE

Efficient Reconfigurable Hardware Architecture for Accurately Computing Success Probability and Data Complexity of Linear Attacks

Andrey Bogdanov (a), Elif Bilge Kavun (b), Elmar Tischhauser (c), Tolga Yalçın (b)

(a) Technical University of Denmark, Dept. of Mathematics, Denmark, [email protected]

(b) Ruhr-University Bochum, Embedded Security Group, Germany, {elif.kavun, tolga.yalcin}@rub.de

(c) KU Leuven, ESAT/SCD/COSIC and IBBT, Belgium, [email protected]

Abstract—An accurate estimation of the success probability and data complexity of linear cryptanalysis is a fundamental question in symmetric cryptography.

In this paper, we propose an efficient reconfigurable hardware architecture to compute the success probability and data complexity of Matsui’s Algorithm 2, which is the central technique in linear cryptanalysis for block ciphers. Using this dedicated architecture, we are able to investigate the complexity of the algorithm for up to 40-bit block ciphers for low-correlation linear approximations and high advantages. Performing experiments on larger block lengths ensures that any empirical observations are not due to differences in statistical behavior for artificially small block lengths.

Rather surprisingly, we observed in previous experiments a significant deviation between the theory and practice for Matsui’s Algorithm 2 for larger block sizes in a vast range of parameters. The new hardware architecture allows us to verify the existing theoretical models for the complexity estimation in linear cryptanalysis.

The designed hardware architecture is realized on two Xilinx Virtex-6 XC6VLX240T FPGAs for smaller block lengths, and on the RIVYERA platform with 128 Xilinx Spartan-3 XC3S5000 FPGAs for larger block lengths.

Keywords-FPGA; Reconfigurable hardware architecture; Cryptanalysis; Data complexity of linear attacks; Success probability of linear attacks

I. INTRODUCTION

Modern society heavily relies on the security of communications and information processing. Cryptography is the science of providing protocols and algorithms for security goals such as confidentiality and authenticity of communications in the presence of adversaries. Dually, cryptanalysis is concerned with determining to what extent an algorithm or protocol actually achieves the intended security level by subjecting them to cryptanalytic attacks. This leads to a continuous interplay between efforts to break cryptographic algorithms and improvements to the design strategies to prevent such attacks.

Linear cryptanalysis, introduced by Matsui [1], [2], has been a powerful cryptanalytic technique for almost two decades now. Besides differential cryptanalysis, it can be considered as the most important cryptanalysis technique in symmetric cryptology. Numerous works investigated methods to improve this technique [3]–[7], while others focused on the design of ciphers resistant to linear cryptanalysis [8]–[10]. As a consequence, several new ciphers which are arguably resistant to linear cryptanalysis have emerged [11]–[13]. In response, cryptanalysts extended the field to take advantage of multiple approximations in an effort to revive it. Also, a lot of detailed works have been published in terms of success probability and data complexity estimation [14]–[18].

Analyses of the data complexity and success probability of a linear attack universally require some important idealizing assumptions to be made about the behavior of the cipher [14]. To what extent these assumptions are justified when dealing with a concrete cipher has to be determined experimentally. Since most practically used block ciphers have a block length of 48-64 bits or more, exact experimental verification is infeasible. For this reason, small-scale versions of the ciphers are used instead, which introduces the potential risk that the bigger the gap to the real block length, the less meaningful the experimental results might be.

Previously, estimations for the complexity of a linear attack were mostly achieved via software experiments, which severely limited their scale, though the works [19], [20] already used reconfigurable hardware to support the cryptanalytic intuition. In this study, we take a step further and focus on the design of a reconfigurable hardware architecture to accurately and systematically estimate the complexity of linear attacks.

The main contribution of our paper is to provide a reconfigurable hardware architecture for the verification of the data complexity and success probability of linear cryptanalysis, allowing experiments of much larger scale than in previous software-based approaches.

The remainder of the paper is organized as follows: In the following section, we briefly describe linear cryptanalysis and its principal algorithm. The data complexity estimation approach will be given in detail in Section III, where we also motivate the need for computation on a hardware platform. In Section IV, we explain our evaluation approach on hardware. In Sections IV-A and IV-B, the hardware architecture is described in detail. Finally, we provide our timing results in Section V. The paper is concluded with future directions.

II. LINEAR CRYPTANALYSIS WITH MATSUI’S ALGORITHM 2

We denote by $\mathbb{F}_2 = \{0, 1\}$ the finite field with two elements and the $n$-dimensional vector space over $\mathbb{F}_2$ by $\mathbb{F}_2^n$. A block cipher is a mapping $E : \mathbb{F}_2^n \times \mathbb{F}_2^\kappa \to \mathbb{F}_2^n$ with the property that $E_k \stackrel{\mathrm{def}}{=} E(\cdot, k)$ is a bijection of $\mathbb{F}_2^n$ for every $k \in \mathbb{F}_2^\kappa$. If $y = E_k(x)$, we refer to $x$ as the plaintext, $k$ as the key and $y$ as the ciphertext of $x$ under the key $k$. We call $n$ the block length of the cipher.

Linear cryptanalysis makes use of so-called linear approximations of block ciphers. In general, a linear approximation $(\alpha, \beta)$ of a vectorial Boolean function $f : \mathbb{F}_2^n \to \mathbb{F}_2^n$ is an ordered pair of $n$-bit masks $\alpha$ and $\beta$, which is said to hold with probability $p \stackrel{\mathrm{def}}{=} \Pr_{x \in \mathbb{F}_2^n}(\alpha^T x = \beta^T f(x))$. The deviation of $p$ from 1/2 is called the bias $\varepsilon \stackrel{\mathrm{def}}{=} p - 1/2$.
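As a minimal illustration of these definitions, the following Python sketch computes the exact bias of an approximation $(\alpha, \beta)$ by enumerating all $2^n$ inputs. The helper names `parity` and `bias` are ours, and the 4-bit PRESENT S-box is used only as a small example function $f$; this is not part of the hardware described later.

```python
def parity(v: int) -> int:
    """XOR of all bits of v, i.e. the value of a masked inner product over F_2."""
    return bin(v).count("1") & 1


def bias(f, n: int, alpha: int, beta: int) -> float:
    """Exact bias eps = p - 1/2 of the approximation (alpha, beta) of f on n bits."""
    holds = sum(
        1
        for x in range(1 << n)
        if parity(alpha & x) == parity(beta & f(x))
    )
    return holds / (1 << n) - 0.5


# PRESENT 4-bit S-box, used here purely as an example function f.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Search all nonzero mask pairs for the approximation with the largest |bias|.
best = max(
    ((a, b) for a in range(1, 16) for b in range(1, 16)),
    key=lambda ab: abs(bias(SBOX.__getitem__, 4, ab[0], ab[1])),
)
print("best masks:", best, "bias:", bias(SBOX.__getitem__, 4, *best))
```

Exhaustive enumeration like this is only feasible for small $n$, which is the limitation the remainder of the paper addresses.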

Matsui’s Algorithm 2 [1], [2] for linear cryptanalysis of iterated block ciphers now exploits linear relations holding with absolute bias $|\varepsilon| \gg 0$. This is done by using a linear approximation of some inner rounds to distinguish partial trial decryptions with the wrong keys from trial decryptions with the (unknown) correct key over a number $N$ of known plaintext/ciphertext pairs. Suppose the number of target subkey bits is $m$. For each possible subkey guess $i$, we count the number of times $T_i$ the approximation holds over the $N$ samples. The key candidates are then ranked in increasing order of the absolute sample bias $|T_i/N - 1/2|$. If the correct key is ranked among the highest $2^l$ out of the $2^m$ key candidates with probability $P_S$, we say that the attack provides an advantage of $a \stackrel{\mathrm{def}}{=} m - l$ bits over exhaustive search with success probability $P_S$ [17]. The common case where only the last round is partially decrypted is depicted in Figure 1.

Figure 1. Linear cryptanalysis with Matsui’s Algorithm 2 [Diagram labels: $\alpha^T x$; $\beta^T y$; r rounds; last round(s); $E_k(x)$; partially guess k.]
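The ranking step and the resulting advantage can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `advantage_bits` and the toy counter values are hypothetical, and ties are resolved in favor of the correct key, which is one possible convention and not one stated in the text.

```python
import math


def advantage_bits(counters, n_samples, correct_key, m):
    """Advantage a = m - l when the correct key is among the top 2^l candidates.

    counters maps each of the 2^m subkey guesses i to its counter T_i.
    """
    # |2*T_i - N| is proportional to the absolute sample bias |T_i/N - 1/2|
    # and avoids floating-point comparisons.
    scores = {key: abs(2 * t - n_samples) for key, t in counters.items()}
    # Position of the correct key counted from the largest sample bias (1 = top).
    position = 1 + sum(1 for s in scores.values() if s > scores[correct_key])
    l = math.log2(position)
    return m - l


# Toy example with m = 2 target subkey bits and N = 100 samples.
toy_counters = {0: 50, 1: 80, 2: 45, 3: 55}
print(advantage_bits(toy_counters, 100, correct_key=1, m=2))
```

Repeating such an experiment over many keys and sample sets gives an empirical estimate of the success probability $P_S$ for a chosen advantage.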

III. PROBLEM DEFINITION AND OUR PROPOSAL

In his original papers, Matsui estimates the data complexity to be of the order $|2\varepsilon|^{-2}$ (with $\varepsilon$ denoting the bias of the used linear approximation), and gives estimations on what is required to obtain a certain success probability. Further studies by Junod and Vaudenay [14]–[16] deepened this analysis. Selçuk [17] further improves it by presenting a thorough statistical analysis of the data complexity of linear attacks based on a model of Junod, yielding practical closed formulas for the success probability $P_S$ and data complexity $N$ of a linear attack for a target advantage of $a$ bits over exhaustive key search.
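To get a feeling for the order-of-magnitude estimate $|2\varepsilon|^{-2}$, the sketch below evaluates it for biases near $2^{-n/2}$, the low-bias regime discussed later in the paper. The function name and the constant `c` are placeholders of ours; the closed formulas of Selçuk [17] are not reproduced here.

```python
import math


def matsui_data_estimate(bias: float, c: float = 1.0) -> float:
    """Order-of-magnitude number of known plaintexts, c * |2*bias|^-2.

    c is a hypothetical proportionality constant, left at 1 here.
    """
    return c * abs(2.0 * bias) ** -2


for n in (16, 20, 40):
    eps = 2.0 ** (-n / 2)                      # bias close to 2^(-n/2)
    log2_n = math.log2(matsui_data_estimate(eps))
    print(f"n = {n}: bias 2^-{n // 2} needs about 2^{log2_n:.0f} texts")
```

For a bias of exactly $2^{-n/2}$ the estimate equals $2^{n-2}$, a quarter of the full codebook, which is why experiments in this regime must process nearly all $2^n$ plaintexts.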

We observed in [21] that the success probability of Matsui’s Algorithm 2 seems to have been somewhat overestimated, especially in cases where the adversary attempts to use a linear approximation with a low bias, to attain a high computational advantage over brute force, or both. Our experimental study [21] indicates that in these cases, deviations of a factor of two in the data complexity (corresponding to an error of 100%) can arise compared to the model of Selçuk. Similarly, in some cases, the maximum achievable advantage has been overestimated by up to 15 bits. Note that these are precisely the cases of the greatest practical interest, since attacks are usually trying to cover as many rounds as possible.

This, in fact, is the main bottleneck in many attacks and models published so far: As soon as the parameters reach the practically most interesting ranges (i.e., attacks pushed to as many rounds as possible), the data and time requirements become so high that it becomes practically infeasible to accurately and efficiently verify the success probability and complexity of the attack as predicted by the theoretical model. For example, in our study, it takes about 10 minutes to calculate biases for a 20-bit cipher on an AMD Opteron 6276 cluster. On the other hand, key evaluation takes about 8 days on the same cluster. The same type of evaluation would take about 23,000 years for a 40-bit cipher.
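The 23,000-year figure is consistent with scaling the 8-day measurement by $2^{40-20}$. The sketch below reproduces that arithmetic under the assumption, which is ours and not stated explicitly in the text, that the key evaluation time grows in proportion to $2^n$; the function name is likewise illustrative.

```python
def extrapolate_days(days_measured: float, n_measured: int, n_target: int) -> float:
    """Scale a measured run time, assuming cost proportional to 2^n (our assumption)."""
    return days_measured * 2 ** (n_target - n_measured)


# 8 days measured for the 20-bit cipher, extrapolated to the 40-bit cipher.
years_40_bit = extrapolate_days(8, 20, 40) / 365.25
print(f"about {years_40_bit:,.0f} years")
```

Under this assumption the result is on the order of $2.3 \times 10^4$ years, matching the figure quoted above.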

Clearly, more accurate estimates for the success probability and data complexity of linear attacks require a completely new approach: Hardware has already proved to be a successful cryptanalysis tool, especially in brute-force attacks, where several instances of a block cipher are run in parallel in real-time. This suggests that a hardware-aided approach might be useful for the problem at hand, too.

Extending the real-time performance of hardware platforms to more theoretical aspects of cryptanalysis is still a largely unexplored research area. Recently, experimental verifications of various theoretical attacks have been published [22], [23]. However, none of these works involves utilization of hardware for development of theoretical models. In our work, we develop parameterizable array architectures of block ciphers and implement them on reconfigurable hardware platforms such as FPGAs, which can allow us to evaluate several crucial parameters instrumental for linear cryptanalysis in practical time frames. In particular, the use of hardware can enable us to perform experiments with block lengths of up to n = 40 bits.

IV. DESIGN AND IMPLEMENTATION

The main advantage of hardware-accelerated cryptanalysis is the mass parallelism achieved by instantiation of several instances of the basic cryptanalytic core. However, this has to be accompanied by optimized design of the cryptanalytic core and the internal modules, both in terms of area and speed (i.e., time-area product). Therefore, we started our design process by investigating the basic modules generated by the Xilinx CORE Generator. While that approach provides a very quick turnaround time, the automatically generated modules failed to compete with hand-tailored designs in terms of the time-area product. As a result, we decided to manually implement all modules from scratch without CORE Generator.

In our design process, we apply a two-step approach:

In the first step, we evaluate several $(\alpha, \beta)$ masks. We start by constructing a wide set of masks that are good candidates. We then apply these masks to various numbers of rounds of a chosen block cipher using a set of 128 randomly chosen keys (which we have observed to be adequate for statistical purposes). As the block cipher, we have chosen SMALLPRESENT [24], a family of small-scale versions of the lightweight block cipher PRESENT [25] with various block lengths (n = 16 to n = 40 bits in 4-bit increments) and numbers of rounds (R = 8 to R = 12). For each key and mask combination, a counter $T_i$ is kept. We apply all possible plaintexts ($2^n$) to the cipher and increment the counter whenever the masked plaintext ($\alpha^T x$) matches the masked ciphertext ($\beta^T y = \beta^T f(x)$). The collected counter values, which correspond to probabilities, determine the masks that provide high bias values. In Section IV-A, this first step is explained in detail.
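A software analogue of this first step is sketched below for a toy 8-bit cipher built from the PRESENT S-box. Several parts are assumptions made for illustration: the bit permutation generalizes PRESENT's $P(i) = 16i \bmod 63$ to smaller block lengths, the round keys are arbitrary values rather than the output of the SMALLPRESENT key schedule, the round structure follows the modular ordering described in Section IV-A, and the masks are picked arbitrarily rather than selected as good candidates.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def sp_layer(state: int, n: int) -> int:
    """S-box layer followed by a PRESENT-style bit permutation on n bits."""
    s = n // 4                                   # number of 4-bit S-boxes
    sub = 0
    for i in range(s):
        sub |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    out = 0
    for j in range(n):
        dst = j * s % (n - 1) if j < n - 1 else n - 1
        out |= ((sub >> j) & 1) << dst
    return out


def evaluate_masks(n, round_keys, masks, report_rounds):
    """Counters T[(round, alpha, beta)] over all 2^n plaintexts for one key."""
    counters = {(r, a, b): 0 for r in report_rounds for a, b in masks}
    for x in range(1 << n):
        y = x ^ round_keys[0]                    # initial key addition
        for r in range(1, len(round_keys)):
            y = sp_layer(y, n) ^ round_keys[r]   # S, P, then key addition
            if r in report_rounds:
                for a, b in masks:
                    if parity(a & x) == parity(b & y):
                        counters[(r, a, b)] += 1
    return counters


n = 8
rng = random.Random(2012)
round_keys = [rng.randrange(1 << n) for _ in range(4)]   # arbitrary k0..k3
masks = [(0x21, 0x21), (0x01, 0x80), (0x00, 0x01)]        # arbitrary examples
counters = evaluate_masks(n, round_keys, masks, report_rounds=(2, 3))
for (r, a, b), t in sorted(counters.items()):
    print(f"round {r} alpha={a:#04x} beta={b:#04x} "
          f"T={t} bias={t / (1 << n) - 0.5:+.4f}")
```

The hardware performs the same counting, but with one plaintext per clock cycle and all masks and rounds evaluated in parallel.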

In the second step, we apply the attack scenario explained above. In our attack, we first choose a small set of masks (and corresponding number of rounds and cipher size) from the first step with the highest bias values. For each key candidate, $N$ randomly generated plaintexts are applied to the attacked cipher. The distribution of the plaintexts must be uniform. This is achieved via a 64-bit Tiny Mersenne Twister pseudorandom number generator with $2^{128}$ period. As explained before, a counter is kept for each key candidate, and in the end the advantage is determined via the ranking of the highest $2^l$ of the $2^m$ key candidates. The number of key candidates $2^m$ can be as high as $2^{16}$. Section IV-B explains the second step in detail.

A. Mask Evaluation Architecture

In the first step, which is the mask evaluation part, we try 128 selected keys. For fast execution, the keys to be tried are hard-coded in the design, which means we have 128 separate key-specific units. Each key-specific unit mainly consists of three different types of modules: An n-bit SMALLPRESENT cipher round module (Figure 2), a module to mask plaintext with each n-bit α-mask (Figure 3), and a counter module which handles the masking of respective SMALLPRESENT round output with each n-bit β-mask and increments the counter $T_i$ if the relation $\alpha^T x = \beta^T y$ holds (Figure 4).

Figure 2. One of the SMALLPRESENT round modules in mask evaluation architecture [Diagram: block R with n-bit data_in, n-bit subkey, and n-bit data_out.]

Figure 3. One of the α multiplication modules in mask evaluation architecture [Diagram: α-mult block with inputs $\alpha_i$ (n bits), x (n bits), and start; outputs $\alpha_i \cdot x$, $\alpha_i \cdot x$ (8 times delayed), x (delayed), start (delayed), and start (8 times delayed).]

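Figure 4. One of the counter modules in mask evaluation architecture [Diagram: β-mult block with inputs $\beta_j$ (n bits), y (n bits), start, and $\alpha_i \cdot x$; counter $T_i$ with increment enable, shift, $T_{in}$, and $T_{out}$; outputs y (delayed), start (delayed), and $\alpha_i \cdot x$ (delayed).]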

The n-bit SMALLPRESENT cipher round module simply handles one round operation of the PRESENT cipher: the R block realizes the substitution and permutation layers and is followed by the key addition. The round definition differs from the original PRESENT round definition: the key-addition-only last round is moved to the beginning to obtain a modular architecture. As a result of this change, the order of operations within the block also changes; the key addition now comes after the substitution and permutation operations.
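That this reordering computes the same function can be checked with a short sketch. The toy 8-bit substitution-permutation layer and the round key values below are illustrative assumptions; only the two orderings of key addition relative to the S/P layers reflect the text.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sp_layer(state: int, n: int = 8) -> int:
    """S-box layer followed by a PRESENT-style bit permutation (toy size)."""
    s = n // 4
    sub = 0
    for i in range(s):
        sub |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    out = 0
    for j in range(n):
        dst = j * s % (n - 1) if j < n - 1 else n - 1
        out |= ((sub >> j) & 1) << dst
    return out


def original_ordering(x: int, keys) -> int:
    """Each round: key addition, S, P. A key-addition-only step comes last."""
    for k in keys[:-1]:
        x = sp_layer(x ^ k)
    return x ^ keys[-1]


def modular_ordering(x: int, keys) -> int:
    """Key-addition-only step first, then identical modules R: S, P, key addition."""
    x ^= keys[0]
    for k in keys[1:]:
        x = sp_layer(x) ^ k
    return x


keys = [0x3A, 0xC5, 0x7E, 0x19]          # arbitrary example round keys
mismatches = sum(
    original_ordering(x, keys) != modular_ordering(x, keys) for x in range(256)
)
print("mismatching inputs:", mismatches)
```

Because every module R in the second form has the same structure, its intermediate outputs after each round can be tapped uniformly, which is convenient for evaluating several round counts at once.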

As can be seen in Figure 5 (where we show the overall mask evaluation unit for one key), at the input of our key-specific unit we have an n-bit plaintext register which starts from 0 and increments by 1 in each clock cycle to apply all possible $2^n$ plaintexts. The plaintext goes through the n-bit SMALLPRESENT cipher round modules, which output the encrypted values after the 8th, 9th, 10th, 11th, and 12th rounds. These round outputs are input to the counter modules, where we mask them with each n-bit β-mask in β-mult blocks (depicted as $\beta_j \times y = \beta_j y$ in Figure 4). The plaintext is in parallel sent to plaintext masking modules, to be masked by each n-bit α-mask in α-mult blocks (depicted as $\alpha_i \times x = \alpha_i x$ in Figure 3). All corresponding mask pairs are also compared in counter modules to check if the relation $\alpha^T x = \beta^T y$ holds. If so, the counter $T_i$ is incremented by 1 (via setting the increment enable in Figure 4 to 1). Note that we keep separate counters for each key and mask combination. These counters are connected to each other in a chain structure in order to simplify result extraction in the end ($T_{in}$ of Figure 4 is the input counter value coming from the previous module of the chain and $T_{out}$ of Figure 4 is the output counter value going to the next module of the chain).

Figure 5. Mask evaluation unit [Diagram: plaintext register Ptext with start and active signals; α-mult blocks for $\alpha_0$ to $\alpha_{63}$; round modules $R_1$ to $R_{12}$ with round keys $k_0$ to $k_{12}$; β-mult blocks for $\beta_0$ to $\beta_{63}$ with counters $T_i$ chained from $T_{in}$ to $T_{out}$ via shift.]
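The chained readout can be modeled in software as a shift register of counter values. The sketch below is a behavioral illustration only; the function name is ours, and feeding zeros into the first $T_{in}$ is an assumption about the chain input.

```python
def shift_out(counters):
    """Read all counters through the last module's T_out, one value per shift."""
    chain = list(counters)
    read = []
    for _ in range(len(chain)):
        read.append(chain[-1])          # value visible at T_out of the last module
        chain = [0] + chain[:-1]        # every module loads T_in from its predecessor
    return read


print(shift_out([3, 1, 4]))
```

The values arrive in reverse chain order, so a single output port suffices regardless of how many counters are instantiated.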

In the current design, we have 12 SMALLPRESENT modules (for 12 rounds of SMALLPRESENT), 64 α multiplication modules, and 64×5 counter modules, which are all connected to each other. Note that the number of α and counter blocks depends on the number of evaluated masks. Such a large circuit is not fast enough without further measures: to shorten the running time while finding a good set of masks, we need aggressive pipelining to speed up the circuit. Hence, we follow a modular approach so that we can apply proper and parameterizable pipelining between blocks. We have 13 stages of pipeline horizontally and 64 stages of pipeline vertically. Horizontal pipelines depend on the pipeline stage in each round of SMALLPRESENT, while the vertical pipelines depend on the number of masks applied.

As the mask evaluation part takes considerably less time than the key evaluation part, we were able to implement it on two Xilinx Virtex-6 XC6VLX240T boards. A different number of units was implemented on the FPGAs for each cipher size, which determined the required time to try all given keys for each cipher size. In Section V, Table I summarizes the performance of the architecture for each cipher size in terms of evaluation time for 128 keys.
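The relation between the number of units, the clock frequency, and the evaluation time can be approximated by a simple cycle count: each unit handles one key per pass and needs $2^n$ cycles per pass. This model is our reading of the reported numbers, not a formula given by the authors, and it ignores pipeline fill and result readout. The core counts and frequencies are taken from Table I.

```python
import math

# n -> (number of cores, frequency in MHz), as reported in Table I.
TABLE_I = {16: (66, 455), 20: (50, 434), 24: (32, 402), 28: (40, 328),
           32: (40, 325), 36: (32, 302), 40: (28, 300)}


def mask_eval_seconds(n: int, cores: int, freq_mhz: float, keys: int = 128) -> float:
    """Estimated time: ceil(keys / cores) passes of 2^n clock cycles each."""
    passes = math.ceil(keys / cores)
    return passes * (1 << n) / (freq_mhz * 1e6)


for n, (cores, freq) in TABLE_I.items():
    print(f"n = {n}: about {mask_eval_seconds(n, cores, freq):.6g} s")
```

Under these assumptions the estimates land close to the reported times, for example roughly 288 μsec for n = 16 and roughly 5 hours for n = 40.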

B. Key Evaluation Architecture

For the second step, key evaluation, initially a small set of masks with corresponding number of rounds and cipher size are chosen from the first step. The chosen set of masks should have the highest bias values. We then apply $N$ randomly generated plaintexts to the attacked cipher for each candidate. Note that the distribution of the applied random plaintexts should be uniform; we therefore use a 64-bit Tiny Mersenne Twister pseudorandom number generator with $2^{128}$ period. In addition to the pseudorandom number generator, we have an R-round SMALLPRESENT core (the Oracle) and several partial attack modules (Figure 6) running in parallel. The Oracle module provides the corresponding ciphertext for a given input. The input to the Oracle comes from the Pseudorandom Number Generator and its result is sent to the partial attack modules.

Figure 6. One of the partial attack modules in key evaluation architecture [Diagram: round block R and inverse-round blocks $R^{-1}$ with hard-coded key; inputs plaintext, ciphertext, and start; α-mult and β-mult blocks; counter $T_i$ with $T_{in}$, $T_{out}$, and shift; outputs plaintext (delayed), ciphertext (delayed), and start (delayed).]

The partial attack modules implement one-round encryption $R$ on the plaintext and two-round decryption $R^{-1}$ (the inverse of round encryption) on the ciphertext, using the m-bit partial keys. The partial encryption and decryption results ($x$, $y$ pairs) are masked with α in α-mult and with β in β-mult, respectively. For each $\alpha^T x = \beta^T y$ match, the counter inside the partial attack core, $T_i$, is incremented. These counters are also chain-connected for easy result extraction in a similar way to the mask evaluation part ($T_{in}$ of Figure 6 is the input counter value coming from the previous module of the chain and $T_{out}$ of Figure 6 is the output counter value going to the next module of the chain). Figure 7 shows the overall key evaluation unit in detail.

Figure 7. Key evaluation unit [Diagram: Pseudorandom Number Generator feeding the Oracle; a chain of partial attack modules for key 0 to key N, each with R, $R^{-1}$, α-mult, β-mult, and counter $T_i$, linked from $T_{in}$ to $T_{out}$ via shift.]
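The following Python sketch mimics the key evaluation data flow on a toy 8-bit cipher: an oracle encrypts random plaintexts and one counter per key guess is updated after a partial decryption. It is a simplified illustration under several assumptions: only the last round is undone (the common case of Figure 1, not the one-round encryption plus two-round decryption of the actual modules), the full last round key is guessed, the round keys and masks are arbitrary, the bit permutation is a PRESENT-style generalization, and Python's built-in Mersenne Twister replaces the Tiny Mersenne Twister.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]
N_BITS = 8                                   # toy block length (two S-boxes)
S_COUNT = N_BITS // 4


def perm_target(j: int) -> int:
    return j * S_COUNT % (N_BITS - 1) if j < N_BITS - 1 else N_BITS - 1


def sbox_layer(state: int, box) -> int:
    return sum(box[(state >> (4 * i)) & 0xF] << (4 * i) for i in range(S_COUNT))


def p_layer(state: int, inverse: bool = False) -> int:
    out = 0
    for j in range(N_BITS):
        src, dst = (perm_target(j), j) if inverse else (j, perm_target(j))
        out |= ((state >> src) & 1) << dst
    return out


def round_enc(state: int, key: int) -> int:      # module R: S, P, key addition
    return p_layer(sbox_layer(state, SBOX)) ^ key


def round_dec(state: int, key: int) -> int:      # module R^-1
    return sbox_layer(p_layer(state ^ key, inverse=True), INV_SBOX)


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def oracle(x: int, round_keys) -> int:
    y = x ^ round_keys[0]
    for k in round_keys[1:]:
        y = round_enc(y, k)
    return y


def partial_attack(pairs, alpha: int, beta: int):
    """One counter T_i per last-round key guess i."""
    counters = [0] * (1 << N_BITS)
    for guess in range(1 << N_BITS):
        for x, y in pairs:
            if parity(alpha & x) == parity(beta & round_dec(y, guess)):
                counters[guess] += 1
    return counters


rng = random.Random(1)
keys = [rng.randrange(1 << N_BITS) for _ in range(4)]        # arbitrary round keys
pairs = [(x, oracle(x, keys))
         for x in (rng.randrange(1 << N_BITS) for _ in range(200))]
T = partial_attack(pairs, alpha=0x21, beta=0x84)             # arbitrary masks
ranking = sorted(range(len(T)), key=lambda g: abs(2 * T[g] - len(pairs)), reverse=True)
print("position of the correct key:", 1 + ranking.index(keys[-1]))
```

With arbitrary masks the correct key is not expected to rank well; in the actual experiments the masks come from the mask evaluation step and the loop over key guesses is unrolled into parallel hardware modules.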

The m-bit partial keys are hard-coded into each core, as in the previous case. In addition to the delay in each round inside the cipher oracle and partial attack modules, pipeline delay stages are implemented between and inside the partial attack modules in order to achieve a high frequency. Although the partial attack modules are much less complex than the mask evaluation cores, the high number of partial keys to be evaluated ($2^m$) requires much more intensive parallelism, especially for cipher sizes n > 32 bits. Such aggressive pipelining, of course, increases the area consumption. Even though the platform used for the mask evaluation is still sufficient for small cipher sizes, we need larger platforms for larger cipher sizes to keep the execution time practical.

Therefore, we have implemented key evaluation on two different platforms. For small cipher sizes (≤ 24 bits) we still use the same Xilinx Virtex-6 XC6VLX240T boards. For larger sizes, we use the RIVYERA platform [26] with 128 Xilinx Spartan-3 XC3S5000 FPGAs. Table II in Section V summarizes the performance of our architecture for each cipher size in terms of evaluation time for $2^m = 2^{16}$ partial keys.

V. RESULTS AND PERFORMANCE EVALUATION

Timing results for the proposed evaluation architectures are presented in Tables I and II, where number of cores is the total number of mask and key evaluation modules implemented on FPGAs. Note the time reduction from 8 days down to 55 msec, and 23,000 years down to 54 minutes.

Table I. MASK EVALUATION PERFORMANCE

| Cipher Block Length | No. of Cores | Freq. (MHz) | Eval. Time |
|---|---|---|---|
| 16 bits | 66 (Virtex) | 455 | 288 μsec |
| 20 bits | 50 (Virtex) | 434 | 7 msec |
| 24 bits | 32 (Virtex) | 402 | 167 msec |
| 28 bits | 40 (Virtex) | 328 | 3 sec |
| 32 bits | 40 (Virtex) | 325 | 53 sec |
| 36 bits | 32 (Virtex) | 302 | 15 min |
| 40 bits | 28 (Virtex) | 300 | 5 hr |

Table II. KEY EVALUATION PERFORMANCE

| Cipher Block Length | No. of Cores | Freq. (MHz) | Eval. Time |
|---|---|---|---|
| 16 bits | 6144 (Virtex) | 302 | 2 msec |
| 20 bits | 3584 (Virtex) | 182 | 55 msec |
| 24 bits | 3584 (Virtex) | 185 | 167 msec |
| 28 bits | 4096 (Virtex) | 262 | 8 sec |
| 32 bits | 4096 (Virtex) | 206 | 3 min |
| 36 bits | 98304 (Spartan) | 168 | 4 min |
| 40 bits | 98304 (Spartan) | 172 | 54 min |

VI. CONCLUSION AND FUTURE WORK

In this paper, we proposed a hardware architecture for the accurate estimation of the success probability and data complexity of linear cryptanalysis in Matsui’s Algorithm 2. This architecture allows conducting experiments for larger block lengths of up to n = 40 bits. Previous theoretical studies and experiments on smaller block lengths [21] indicated that especially in the practically relevant cases where the attacker attempts to exploit a bias close to $2^{-n/2}$ or targets a high advantage over exhaustive search, predictions by existing theoretical models seem to have significantly overestimated the success probability or the achievable advantage. Our hardware architecture can be helpful in the development of new models for success probability and data complexity of linear attacks in further studies.

Acknowledgements: This work has been supported in part by the IAP Programme P6/26 BCRYPT of the Belgian State, by the European Commission under contract number ICT-2007-216676 ECRYPT NoE phase II, by KU Leuven-BOF (OT/08/027) and by the Research Council KU Leuven (GOA TENSE). Elmar Tischhauser is a research assistant of the Fonds Wetenschappelijk Onderzoek (FWO) Vlaanderen.

REFERENCES

[1] M. Matsui, “Linear Cryptanalysis Method for DES Cipher,” in EUROCRYPT, ser. Lecture Notes in Computer Science, T. Helleseth, Ed., vol. 765. Springer, 1993, pp. 386–397.

[2] ——, “The First Experimental Cryptanalysis of the Data Encryption Standard,” in CRYPTO, ser. Lecture Notes in Computer Science, Y. Desmedt, Ed., vol. 839. Springer, 1994, pp. 1–11.

[3] S. K. Langford and M. E. Hellman, “Differential-Linear Cryptanalysis,” in CRYPTO, ser. Lecture Notes in Computer Science, Y. Desmedt, Ed., vol. 839. Springer, 1994, pp. 17–25.

[4] A. Biryukov, C. D. Cannière, and M. Quisquater, “On Multiple Linear Approximations,” in CRYPTO, ser. Lecture Notes in Computer Science, M. K. Franklin, Ed., vol. 3152. Springer, 2004, pp. 1–22.

[5] M. Hermelin, J. Y. Cho, and K. Nyberg, “Multidimensional Extension of Matsui’s Algorithm 2,” in FSE, ser. Lecture Notes in Computer Science, O. Dunkelman, Ed., vol. 5665. Springer, 2009, pp. 209–227.

[6] A. Bogdanov and M. Wang, “Zero correlation linear cryptanalysis with reduced data complexity,” in FSE, ser. Lecture Notes in Computer Science, A. Canteaut, Ed., vol. 7549. Springer, 2012, pp. 29–48.

[7] A. Bogdanov, G. Leander, K. Nyberg, and M. Wang, “Integral and multidimensional linear distinguishers with correlation zero,” in ASIACRYPT’12, ser. Lecture Notes in Computer Science, X. Wang and K. Sako, Eds., vol. 7658. Springer, 2012.

[8] M. Matsui, “New Structure of Block Ciphers with Provable Security Against Differential and Linear Cryptanalysis,” in FSE, ser. Lecture Notes in Computer Science, D. Gollmann, Ed., vol. 1039. Springer, 1996, pp. 205–218.

[9] J. Daemen, L. R. Knudsen, and V. Rijmen, “Linear Frameworks for Block Ciphers,” Des. Codes Cryptography, vol. 22, no. 1, pp. 65–87, 2001.

[10] J. Daemen and V. Rijmen, “The Wide Trail Design Strategy,” in IMA Int. Conf., ser. Lecture Notes in Computer Science, B. Honary, Ed., vol. 2260. Springer, 2001, pp. 222–238.

[11] M. Matsui, “New Block Encryption Algorithm MISTY,” in FSE, ser. Lecture Notes in Computer Science, E. Biham, Ed., vol. 1267. Springer, 1997, pp. 54–68.

[12] J. Daemen and V. Rijmen, “The Design of Rijndael: AES – The Advanced Encryption Standard,” 2002.

[13] R. Anderson, E. Biham, and L. R. Knudsen, “A Proposal for the Advanced Encryption Standard,” NIST AES proposal, 1998.

[14] P. Junod, “On the Complexity of Matsui’s Attack,” in Selected Areas in Cryptography, ser. Lecture Notes in Computer Science, S. Vaudenay and A. M. Youssef, Eds., vol. 2259. Springer, 2001, pp. 199–211.

[15] ——, “On the Optimality of Linear, Differential, and Sequential Distinguishers,” in EUROCRYPT, ser. Lecture Notes in Computer Science, E. Biham, Ed., vol. 2656. Springer, 2003, pp. 17–32.

[16] P. Junod and S. Vaudenay, “Optimal Key Ranking Procedures in a Statistical Cryptanalysis,” in FSE, ser. Lecture Notes in Computer Science, T. Johansson, Ed., vol. 2887. Springer, 2003, pp. 235–246.

[17] A. A. Selçuk, “On Probability of Success in Linear and Differential Cryptanalysis,” J. Cryptology, vol. 21, no. 1, pp. 131–147, 2008.

[18] C. Blondeau, B. Gérard, and J.-P. Tillich, “Accurate Estimates of the Data Complexity and Success Probability for Various Cryptanalyses,” Des. Codes Cryptography, vol. 59, no. 1-3, pp. 3–34, 2011.

[19] S. Kerckhof, B. Collard, and F.-X. Standaert, “FPGA Implementation of a Statistical Saturation Attack against PRESENT,” in AFRICACRYPT, ser. Lecture Notes in Computer Science, A. Nitaj and D. Pointcheval, Eds., vol. 6737. Springer, 2011, pp. 100–116.

[20] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, and J.-D. Legat, “Efficient Uses of FPGAs for Implementations of DES and Its Experimental Linear Cryptanalysis,” IEEE Trans. Computers, vol. 52, no. 4, pp. 473–482, 2003.

[21] A. Bogdanov and E. Tischhauser, “On the Wrong Key Randomization and Key Equivalence Hypotheses in Matsui’s Algorithm 2,” submitted.

[22] I. Dinur, T. Güneysu, C. Paar, A. Shamir, and R. Zimmermann, “Experimentally Verifying a Complex Algebraic Attack on Grain-128 Cipher Using Dedicated Reconfigurable Hardware,” in SHARCS, 2012.

[23] A. Bogdanov, E. B. Kavun, C. Paar, C. Rechberger, and T. Yalcin, “Better than Brute-Force Optimized Hardware Architecture for Efficient Biclique Attacks on AES-128,” in SHARCS, 2012.

[24] G. Leander, “Small Scale Variants of the Block Cipher PRESENT,” IACR ePrint Report 2010/143, Tech. Rep. 143, 2010, http://eprint.iacr.org/2010/143.

[25] A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B. Robshaw, Y. Seurin, and C. Vikkelsoe, “PRESENT: An Ultra-Lightweight Block Cipher,” in CHES, ser. Lecture Notes in Computer Science, P. Paillier and I. Verbauwhede, Eds., vol. 4727. Springer, 2007, pp. 450–466.

[26] SciEngines, “RIVYERA S3-5000.” [Online]. Available: http://www.sciengines.com/products/computers-and-clusters/rivyera-s3-5000.html