Side channel attacks - EURECOMsoc.eurecom.fr/HWSec/lectures/side_channels/main.pdf · 2020. 4. 30. · Side channel attacks R. Pacalet [email protected] April 30,

Hardware Security

Side channel attacks

R. [email protected] 30, 2020

Outline

Introduction

Timing attacksP. KocherOptimizationsConclusion

Power attacksIntroductionSimple Power Analysis (SPA)Differential Power Analysis (DPA)Wrap up on DPAExercises on DPAConclusion

2/52 Institut Mines-Telecom R. Pacalet April 30, 2020

Introduction


Hardware leaks information

Eventually, security is always implemented in hardwareElectronic devices consume power, take time to compute and emit electromagneticradiations (not mentioning temperature, noise. . . )These “side-channels” are usually correlated with the processingIn security applications the side-channels can be used to retrieve embedded secretsA few hundreds of power traces can be sufficient to retrieve a secret key from atheoretically unbreakable systemUnlike in quantum cryptography the information leakage is usually undetectable

• True for time• Almost true for power


A bit of history1956: MI5 against Egyptian Embassy in London (click-sound of the Hagelinenciphering machine)1996: P. Kocher time-attacks RSA, DH, DSS (applied with success in 2003 againstOpenSSL 0.9.6)1999: P. Kocher power-attacks DES, AES, etc. (SPA, DPA). Successful against smartcards, FPGAs,. . .2000-: New attacks (SEMA, DEMA, TPA).Importance of hardware security increases (CHES)
















History repeats itself

2013, 18th of December: Daniel Genkin (Technion and Tel Aviv University), AdiShamir (Weizmann Institute of Science), Eran Tromer (Tel Aviv University) RSA KeyExtraction via Low-Bandwidth Acoustic Cryptanalysis.Target: a regular computer computing the GnuPG’s current implementation of RSA.Vibrations of electronic components (capacitors and coils) in the voltage regulationcircuit as they regulate the voltage accross large fluctuations in power consumption.http://www.cs.tau.ac.il/~tromer/acoustic/


http://www.cs.tau.ac.il/~tromer/acoustic/


The attack can extract full 4096-bit RSA decryption keys from laptop computers (ofvarious models), within an hour, using the sound generated by the computer duringthe decryption of some chosen ciphertexts. We experimentally demonstrate that suchattacks can be carried out, using either a plain mobile phone placed next to thecomputer, or a more sensitive microphone placed 4 meters away.









http://www.cs.tau.ac.il/~tromer/acoustic/img/gnupg-manykeys-downshifted.mp3 (the RSA Paso Doble)


http://soc.eurecom.fr/HWSec/misc/gnupg-manykeys-downshifted.mp3




Even if a memory location is only accessed duringout-of-order execution, it remains cached. Iteratingover the 256 pages of probe array shows one cache hit,exactly on the page that was accessed during the out-of-order execution.

https://meltdownattack.com/


https://meltdownattack.com/

Timing attacks


Timing attacks / P. Kocher


History

First published by Paul Kocher (CRYPTO’96)Implemented by Dhem, Quisquater, et al. (CARDIS’98)Used by Canvel, Hiltgen, Vaudenay, and Vuagnoux to attack OpenSSL (CRYPTO’03)


Example #1Exponentiation MD

D = dw−1dw−2 . . .d2d1d0, w-bits exponentdw−1: Most Significant Bit (MSB), d0: LSB

D = d0 +d1 ×2+d2 ×4+ . . .+dw−2 ×2w−2 +dw−1 ×2w−1

MD =Md0+d1×2+d2×4+...+dw−2×2w−2+dw−1×2w−1

=Md0 ×Md1×2 ×Md2×4 × . . .×Mdw−2×2w−2 ×Mdw−1×2w−1

=Md0 ×(Md1

)2 ×(Md2

)4 × . . .×(Mdw−2

)2w−2

×(Mdw−1

)2w−1

=Md0 ×(Md1 ×

(Md2

)2)2

× . . .×(Mdw−2

)2w−2

×(Mdw−1

)2w−1

=Md0 ×

Md1 ×Md2 ×

Md3 ×(

. . .×(Mdw−2 ×

(Mdw−1

)2)2

. . .

)222

2


Example #1Exponentiation, pseudo-codeMultiplication and square algorithm

Algorithm 1 Exponentiation1: A(w)← 12: for k ←w −1 . . .0 do . loop from MSB to LSB of D3: if dk = 1 then . bit #k of D (#w −1 is MSB)4: B(k)←A(k +1)×M . Multiplication5: else6: B(k)←A(k +1)7: end if8: A(k)←B(k)2 . Square9: end for

10: MD =B(0)


Example #1Modular exponentiation, pseudo-codeTA prone? Why? What can we expect from TA?

Algorithm 2 Modular exponentiation (modexp)1: A(w)← 12: for k ←w −1 . . .0 do . loop from MSB to LSB of D3: if dk = 1 then . bit #k of D (#w −1 is MSB)4: B(k)← (A(k +1)×M) mod N . Modular multiplication5: else6: B(k)←A(k +1)7: end if8: A(k)←B(k)2 mod N . Modular square9: end for

10: MD =B(0)


Example #2What if timing of modular product and square is data-dependant?What can we expect from TA?

Algorithm 3 Modular exponentiation (modexp)1: A(w)← 12: for k ←w −1 . . .0 do . loop from MSB to LSB of D3: if dk = 1 then . bit #k of D (#w −1 is MSB)4: B(k)← (A(k +1)×M) mod N . Modular multiplication5: else6: B(k)←A(k +1)7: end if8: A(k)←B(k)2 mod N . Modular square9: end for

10: MD =B(0)


Example #2: P. Kocher attack

A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N

end forif db = 1 then

B(b)← (A(b+1)×M) mod Nelse

B(b)←A(b+1)end ifA(b)←B(b)2 mod Nfor k ← b−1 . . .0 do

. . .A(k)←B(k)2 mod N

end forMD =B(0)

Victim computes n modexp with different knownplaintexts M[i], 1≤ i ≤ nFor each 1≤ i ≤ n attacker measure total modexp timet [i]= time

(M[i]D mod N

)Attacker already knows leading bits dw−1 . . .db+1 of D,w > b ≥ 0 (none if b =w −1)For each M[i], attacker can compute A(b+1)[i](thanks to knowledge of M[i], dw−1 . . .db+1)For each M[i], attacker measure or estimate time of(A(b+1)[i]×M[i]) mod NAssume for some M[i], (A(b+1)[i]×M[i]) mod Nextremely slow (fast) and attacker can distinguish“slow”, “average” and “fast” cases



A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

If t [i] large (slow) when (A(b+1)[i]×M[i]) mod N slow⇒ bit db probably 1

• B(b)[i]← (A(b+1)[i]×M[i]) mod N probably computed⇒ t ↑

Else db probably 0• B(b)[i]← (A(b+1)[i]×M[i]) mod N probably skipped⇒ t

The larger the difference, the higher the attacker’sconfidenceDo you understand difference with example #1?



A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

∀ 1≤ i ≤ n, M[i]: plaintext #i

T [i]: measured time for M[i]D

T [i]= e[i]+∑k=w−1k=0 t [i ,k ]

• e[i]: measurement error• t [i ,k ]: time of iteration k of M[i]D

• Attacker knows T [i], not e[i] or t [i ,k ]

Compute w −1−b first iterations• M[i],N ,dw−1 . . .db+1 knownàtm[i ,b]: attacker’s timing estimate for

A(b+1)[i]×M[i] mod N



A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

∀ i ∈ Islow , ∀ j ∉ Islow , àtm[i ,b]> àtm[j ,b]|Islow | = n/10

∀ i ∈ Ifast , ∀ j ∉ Ifast , àtm[i ,b]< àtm[j ,b]|Ifast | = n/10

Tslow =∑

i∈Islow (T [i])n/10

Tfast =∑

i∈Ifast(T [i])

n/10∆=Tslow −Tfast



A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

∆> τ⇒ db = 1∆< τ⇒ db = 0τ: thresholdContinue with next bit (b−1) until all bits of D knownAttack targets one bit at a timeAttack efficiency depends on:

• Number n of experiments• Variability of àtm[i ,b] (data dependency)• Noise e[i]• |Islow | and |Ifast | (10% in example)• τ


P. Kocher attack on RSAREF

Mult.:

E(tmult )≈ 1167.8×10−6s

σ(tmult )≈ 12.01×10−6s

Exp.:

E(texp)≈ 419901×10−6s

σ(texp)≈ 235×10−6s



RSAREF (functional) reference software library512 bits modular exponentiation, 256 bits exponent (to speed up experiments)With 250 timing measurements probability of correct decision at any step of attack:

X 0.885



RSAREF (functional) reference software library512 bits modular exponentiation, 256 bits exponent (to speed up experiments)With 250 timing measurements probability of correct decision at any step of attack:

X 0.885


OpenSSL BN library

Mult.:E(tmult )≈ 19929 ccσ(tmult )≈ 130 cc

Exp.:E(texp)≈ 482206 ccσ(texp)≈ 11453 cc


Timing attacks / Optimizations


Optimizations

A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

Cross-check on modular square timing

Once db known, verification on �ts[i ,b]• Attacker’s timing estimate for Bb[i]

2 mod N

In average, if total time T [i]• Large (small) when àts[i ,b] large (small)?� Yes: better confidence in decision� No: doubt about decision


Optimizations

A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

Detect wrong decisionsdb guessed incorrectly⇒ ( áA(k ≤ b)[i], áB(k ≤ b)[i]) 6= (A(k ≤ b)[i],B(k ≤ b)[i])Correlation with measured time not observable anymoreAttack improvement

• Keep list of decisions• Keep likelihood Tslow −Tfast• Likelihood-driven back-tracking• Hard decisions → soft decisions• Resembles channel decoding• More memory and CPU usage• Reduce number of experiments


Optimizations

A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

t [i ,k ]: timing of iteration k of M[i]D mod N�t [i ,k ]: attacker’s estimateVariance of timing residue

U[i]=k=w−1∑k=b+1

�t [i ,k ]∆[i]=T [i]−U[i]

= e[i]+k=w−1∑

k=0t [i ,k ]−

k=w−1∑k=b+1

�t [i ,k ]= e[i]+

k=b∑k=0

t [i ,k ]+k=w−1∑k=b+1

(t [i ,k ]−�t [i ,k ])


Optimizations

A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

If dw−1 . . .db+1 correct

át [i ,k > b]≈ t [i ,k > b]

∆[i]= e[i]+k=b∑k=0

t [i ,k ]+k=w−1∑k=b+1

(t [i ,k ]−�t [i ,k ])≈ e[i]+

k=b∑k=0

t [i ,k ]

Vari(∆)≈Vari(e[i])+Vari

(k=b∑k=0

t [i ,k ]

)≈Var(e)+ (b+1)×Var(t)

Var(∆) ↓ when b ↓


Optimizations

A(w)← 1for k ←w −1 . . .b+1 do

. . .A(k)←B(k)2 mod N




. . .A(k)←B(k)2 mod N

end forMD =B(0)

If dw−1 . . .da+1 correct but. . .. . . da . . .db+1 wrong for some a≥ b+1

át [i ,k > a]≈ t [i ,k > a], át [i ,a≥ k > b] 6= t [i ,a≥ k > b]

∆[i]= e[i]+k=b∑k=0

t [i ,k ]+k=w−1∑k=b+1

(t [i ,k ]−�t [i ,k ])≈ e[i]+

k=b∑k=0

t [i ,k ]+k=a∑

k=b+1(t [i ,k ]−�t [i ,k ])

≈ e[i]+k=a∑k=0

t [i ,k ]−k=a∑

k=b+1

�t [i ,k ]Var(∆)≈Var(e)+ (a+1)×Var(t)+ (a−b)×Var(t)

≈Var(e)+ (2×a−b+1)×Var(t)

Var(∆) ↑ when b ↓22/52 Institut Mines-Telecom R. Pacalet April 30, 2020

Timing attacks / Conclusion


Wrap up on TA (1/2)

Where does it come from?• Time: processing data takes time

How does it work?Acquisition phase

• Same secret• Sufficient number of experiments with different input messages• Build database of {input, time} pairs (works also with outputs, how?)

Analysis phase (usually off-line)• Attacker tries to retrieve part s of secret (e.g. 1 bit)• Attacker builds “timing models” TMg(M)

– Of one part of computation with input M (e.g. 1 modmul)– Under assumption that s = g

• Attacker estimates correlations between the TMg and the T• TMg with best correlation ⇒ g best candidate for s


Wrap up on TA (2/2)In our example where we attack one exponent bit at a time

• For each attacked bit b, 2 timing models:• TM0(M[i]): modmul time in iteration b of M[i]D mod N if db = 0• TM1(M[i]): modmul time in iteration b of M[i]D mod N if db = 1• Note: TM0(M[i])= 0

Correlation(TM1,T )>Correlation(TM0,T )⇒ b = 1Correlation(TM1,T )<Correlation(TM0,T )⇒ b = 0Tslow −Tfast is an estimator of correlation between TM1 and TNote: there are much better correlation estimators

• Pearson correlation coefficient• Kolmogorov-Smirnov test, . . .

Whatever the statistical tool, principle remains the same:• TMg with best correlation ⇒ g best candidate for s

Analysis usually off-line• But interactive, adaptive attacks also exist


Pearson correlation coefficient

A better statistical tool than partitionings: portion of the secret under attackn: number of time measurementsT [i] (1≤ i ≤ n): time measurementsM[i] (1≤ i ≤ n): input messagesTMg(M[i]): attacker’s timing model for M[i] and guess g on sPCCi(T ,TMg): estimator of correlation between T [i] and TMg(M[i])g with highest PCCi(T ,TMg)⇒ best candidate for s


Exercise on timing attacks

? Exercise #1: list the hypotheses for a timing attack to be practical


Homework

Imagine countermeasures, evaluate their cost and efficiencyStudy and understand P. Kocher paper and especially the blinding countermeasurehe proposes (section 10)Prepare the first lab

• Read directions• Look at provided software libraries• Imagine what you will do, why and how

Read 2018 exam problem: «La Blaisine and Cardinal de Richelieu»Prepare questions


Questions?


Power attacks


Power attacks / Introduction


Power in CMOS logic (1/2)Power consumption and cell output transition are correlated

S:

E:

S rising edge observed

resistor: I(VDD)through RP spying

I↑= Ishort + IL


Power in CMOS logic (2/2)Power consumption and cell output transition are correlated

S:

E:

S falling edge observed I↓= Ishortthrough RP spyingresistor: I(VDD)


Power analysis setup (1/2)

Deviceunderattack

Power supplyOffline analysis

Recorded traces


Power analysis setup (2/2)


Power attacks / Simple Power Analysis (SPA)


Simple power analysis, example #1

The power trace of a “naive” DES hardware implementation leaks a lot of information• CMOS structures consume power when switching• Hamming distance of register transitions• Hamming weights in some implementations• Clock spikes









Multiply by double and addalgorithmIterate on key bits from MSB toLSBC ←M ×K

0 0 01 1 1 1

Algorithm 4 The double and add algorithm1: A−1 ← 12: for k ← 0,w −1 do3: if K (k)= 1 then4: Bk ←Ak−1 +M . Add5: else6: Bk ←Ak−17: end if8: Ak ←Bk ×2 . Double9: end for

10: C =M ×K =Bw−1


Exercise #2: SPA on DES key schedule

Software implementation on a 8 bits CPULeft rotate by 1 position of 28-bits half-keyEight bits accumulator AOne bit carry flag C

C

C

mem[a]mem[a+3] mem[a+2] mem[a+1]

A[7..0]

A[7..0]

0

Shift right (LSR)

Rotate left (ROL)

28 bits half-key

1: CLC; . C ← 02: LDA a; . A←mem(a)3: ROL; . C||A←A||C4: STA a; . mem(a)←A5: LDA a+1; ROL; STA a+1;6: LDA a+2; ROL; STA a+2;7: LDA a+3; ROL; STA a+3;8: AND #$1F; . A← 000||A[4..0]9: LSR; . C||A←A[0]||0||A[7..1]

10: LSR; LSR; LSR; . A← 000||A[7..3]11: ORA a; . A←Aormem(a)12: STA a;


Power attacks / Differential Power Analysis (DPA)


P. Kocher attack on DES (1/2)

Last round of DESR (known) + guess g on Key

• ⇒ value L(g,R)

N pairs of power traces and cipher texts• T[i],C[i]

Split traces in subsets that shouldexhibit a different power consumption:

• S0 = {T[i] | L(g,R[i])(j)= 0}• S1 = {T[i] | L(g,R[i])(j)= 1}• score(g)=E(S0)−E(S1)

L R

SBox

Ciphertext

Key4

4

4

10

6 6

6

6


P. Kocher attack on DES (2/2)

The larger the difference, the more likelythe guessHence the name “Differential PowerAnalysis” (DPA)


Power attacks / Wrap up on DPA


Wrap up on DPA (1/3)

? Where does it come from?X Power: computing requires energy

Warning: ones do not consume more than zeros: CMOS logic has (almost) no staticpower consumptionTransitions consume power (dynamic power consumption)Guesses on the previous and present value of a node can be cross-checked with thepower trace



















? How does it work? Two phases:X Acquisition phase: same secret, different input messages, sufficient number of

experiments => a database of input (output) messages, power trace pairsX Analysis phase (usually off-line)

X Attacker builds “power model” PM of target implementationX For a given input (output) message PM gives estimate of processing power (all of it or only a

portion)X PM depends on guess g on the secret key: there are as many PMg models as guesses gX PMg model that best matches actual power traces is the one of most likely guess g for secret

key







key







key







key







key







key







key



Guesses on the present value only can be cross-checked too if rising and fallingtransitions consume differently (I↑− I↓ = ε) and previous value is uncorrelated

• If the guess is 0 (1) then we had a falling (rising) transition with probability 1/2 and notransition with probability 1/2

• On a large number of traces the average difference should be ε/2

Two types of attacks:• Hamming distances between two successive states of a node or set of nodes• Hamming weights of a node or set of nodes
































Power attacks / Exercises on DPA


Exercises on DPA (1/2)

? Exercise #3: most efficient power model?? Hamming weight (based on current state only)? Hamming distance (based on previous and current states)

? Exercise #4: differences with timing attacks?? Exercise #5: DPA easier than TA? Why?? Exercise #6: hypotheses?















? Exercise #7: countermeasures?? Exercise #8: what if energy depends only on key?? Exercise #9: what if energy depends only on input messages?? Exercise #10: what if energy depends neither on key nor on input messages?











Power attacks / Conclusion


Homework on Power Attacks

Read and understand every detail of original paper by P. KocherImagine attack against hardware DES implementation, describe in deep details youralgorithm:

• Ciphertexts are known• L0R0, . . . ,L15R15,R16L16 values stored successively in same 64 bits register• Attacker monitors current on power supply side

Find way to blind DES


Questions?


Documents

Side channel attacks - EURECOMsoc.eurecom.fr/HWSec/lectures/side_channels/main.pdf · 2020. 4. 30. · Side channel attacks R. Pacalet [email protected] April 30,