94
George Mason University Hardware Architectures of Secret-Key Block Ciphers and Hash Functions ECE 545 Lecture 8b

ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

George Mason University

Hardware Architectures of Secret-Key Block Ciphers

and Hash Functions

ECE 545 Lecture 8b

Page 2: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

2

Recommended reading

•  K. Gaj and P. Chodowiec, FPGA and ASIC Implementations of AES, Chapter 10 in C.K. Koc (Ed.), Cryptographic Engineering Section 10.4 Parameters of Hardware Implementations Section 10.5 Hardware Architectures of Symmetric Block Ciphers

Page 3: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

3

Recommended reading E. Homsirikamol, M. Rogawski, and K. Gaj, "Throughput vs. Area Trade-offs in High-Speed Architectures of Five Round 3 SHA-3 Candidates Implemented Using Xilinx and Altera FPGAs," in LNCS 6917, Cryptographic Hardware and Embedded Systems - CHES 2011, Nara, Japan, Sep. 28-Oct. 1, pp. 491-506. Sections 1-4.

Page 4: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Secret-key Ciphers

Page 5: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Cipher

message

ciphertext

cryptographic key

N bits

N bits

K bits

Page 6: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Current American Standards AES vs. Triple DES

Triple DES AES

3 DES

64 bits

input

64 bits

output

key

168 bits AES

128 bits

input

128 bits output

key

128, 192, and 256 bits

Page 7: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Initial transformation

Final transformation

#rounds times

Round Key[i] i:=i+1

Round Key[0]

i:=1

i<#rounds?

Cipher Round

Round Key[#rounds+1]

Typical Flow Diagram of a Secret-Key Block Cipher

Page 8: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

key scheduling

encryption/decryption

memory of internal keys

output

input/key

input interface

output interface

Control unit

control

Top level block diagram

Page 9: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

key expansion encryption/

decryption

memory of internal keys

output

input

input interface

output interface

Control unit

control

key setup

key scheduling

key

Page 10: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Primary parameters of hardware implementations for secret-key block ciphers

Latency Throughput

Encryption/ decryption

Time to encrypt/decrypt a single block

of data

Mi

Ci Number of bits

encrypted/decrypted in a unit of time

Encryption/ decryption

Mi Mi+1 Mi+2

Ci Ci+1 Ci+2

Throughput = Block_size · Number_of_blocks_processed_simultaneously Latency

Page 11: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Encryption time

Latency (Message_size –Block_size)

Time

Message size

Dependence of the encryption time on latency and throughput

Throughput

Page 12: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

register

combinational logic

one round

multiplexer

Basic iterative architecture

round key

Page 13: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

register

combinational logic

one round

multiplexer

Basic iterative architecture

Page 14: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

IN

OUT

P1

C1

P2

C2

P3

Basic architecture: Timing

#rounds · clock_period

CLK

Page 15: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Block vs. stream ciphers

Stream cipher

Internal state - IS Block cipher

K K

M1, M2, …, Mn m1, m2, …, mn

C1, C2, …, Cn c1, c2, …, cn

Ci=fK(Mi) ci = fK(mi, ISi) ISi+1=gK(mi, ISi)

Every block of ciphertext is a function of only one

corresponding block of plaintext

Every block of ciphertext is a function of the current block

of plaintext and the current internal state of the cipher

Page 16: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Typical stream cipher Sender Receiver

Pseudorandom Key Generator

mi

plaintext

ci

ciphertext

ki keystream

key initialization vector (seed)

Pseudorandom Key Generator

mi plaintext

ci

ciphertext

ki keystream

key initialization vector (seed)

Page 17: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

ECB (Electronic CodeBook) mode

Page 18: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Electronic CodeBook Mode – ECB Encryption

M1 M2 M3

E

Ci = EK(Mi) for i=1..N

MN-1 MN

E E E E . . .

C1 C2 C3 CN-1 CN

K K K K K

Page 19: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Electronic CodeBook Mode – ECB Decryption

C1 C2 C3

D

Ci = EK(Mi) for i=1..N

CN-1 CN

D D D D . . .

M1 M2 M3 MN-1 MN

K K K K K

Page 20: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Counter Mode

Page 21: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Counter Mode - CTR Encryption

m1 m2 m3

E

ci = mi ⊕ ki ki = EK(IV+i-1) for i=1..N

mN-1 mN

. . .

E E E E . . .

c1 c2 c3 cN-1 cN

IV IV+1 IV+2 IV+N-2 IV+N-1

k1 k2 k3 kN-1 kN

K K K K K

Page 22: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Counter Mode - CTR Decryption

c1 c2 c3

E

mi = ci ⊕ ki ki = EK(IV+i-1) for i=1..N

cN-1 cN

. . .

E E E E . . .

m1 m2 m3 mN-1 mN

IV IV+1 IV+2 IV+N-2 IV+N-1

k1 k2 k3 kN-1 kN

K K K K K

Page 23: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Counter Mode - CTR

E K

IN

OUT

counter

IV

1 L

ci

mi

E K

IN

OUT

counter

IV

1 L

ci

mi

1 L 1 L

IS1 = IV ci = EK(ISi) ⊕ mi ISi+1 = ISi+1

Page 24: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

CBC (Cipher Block Chaining) Mode

Page 25: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Cipher Block Chaining Mode - CBC Encryption

m1 m2 m3

E

IV

ci = EK(mi ⊕ ci-1) for i=1..N c0=IV

mN-1 mN . . .

E E E E . . .

c1 c2 c3 cN-1 cN

Page 26: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Cipher Block Chaining Mode - CBC Decryption

mi = DK(ci) ⊕ ci-1 for i=1..N c0=IV

m1 m2 m3 mN-1 mN

IV . . .

D D D D D . . .

c1 c2 c3 cN-1 cN

Page 27: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Primary factor in choosing the encryption/decryption unit architecture

Symmetric-key cipher mode of operation:

1. Non-feedback cipher modes

ECB, counter mode

2. Feedback cipher modes

CBC, CFB, OFB

Page 28: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Non-feedback Counter Mode - CTR

M0 M1 M2

E

Ci = Mi ⊕ AES(IV+i) for i=0..N

MN-1 MN

. . .

E E E E . . .

C1 C2 C3 CN-1 CN

IV IV+1 IV+2 IV+N-1 IV+N

Page 29: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Feedback cipher modes - CBC M1 M2 M3

E

IV

C1 = AES(Mi ⊕ IV) Ci = AES(Mi ⊕ Ci-1) for i=2..N

MN-1 MN . . .

E E E E . . .

C1 C2 C3 CN-1 CN

Page 30: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Feedback cipher modes CBC, CFB, OFB

Page 31: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

combinational logic

k rounds

register

multiplexer

round 1 round 2

round k . . . . .

k-rounds Loop Unrolling

Page 32: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Loop Unrolling: Timing

IN

OUT

P1

C1

P2

C2

P3

#rounds/k · extended_clock_period

CLK

Page 33: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

speed

area

k=2 k=3 k=4 k=5

loop-unrolling basic architecture

Loop Unrolling: Speed vs. Area

speed = speed basic 1 + τ

1 + τ / k

τ << 1

Page 34: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

combinational logic

MUX register

one round

Architectures suitable for feedback modes

round K . . . .

round 1

round 2

MUX

round #rounds

. . . .

round 1

round 2

Page 35: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Decreasing area by resource sharing

F F

D0 D1

D0’ D1’

F

D0 D1

D0’ D1’

multiplexer

Before After

register

Page 36: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Throughput

Area

basic architecture

Resource Sharing: Speed vs. Area

- basic architecture

- resource sharing

resource sharing

Page 37: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Non-Feedback Cipher Modes ECB, counter, OCB

Page 38: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Comparison for non-feedback cipher modes, e.g. Counter Mode - CTR

M0 M1 M2

E

Ci = Mi ⊕ AES(IV+i) for i=0..N

MN-1 MN

. . .

E E E E . . .

C1 C2 C3 CN-1 CN

IV IV+1 IV+2 IV+N-1 IV+N

Page 39: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

E

IV

E

C1

M1

Z1

Z1

E

C2

M2

Z2

Z2

E

CN-1

MN-1

ZN-1

ZN-1

E

CN

MN

ZN

MN

. . . L

R

length

g(L)

Zi=f(L, R)

E

0

E

T

ZN

τ bits

Control sum

OCB

Page 40: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Increasing speed by parallel processing

Encryption/ decryption

unit

Encryption/ decryption

unit

Encryption/ decryption

unit

Encryption/ decryption

unit

Encryption/ decryption

unit

Encryption/ decryption

unit

Page 41: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Increasing speed using pipelining

Cipher 1 Cipher 2

round 1 round 1

round 2

round 10

. . .

round 16

. . .

Speed = target_clock_period

block size

target clock period, e.g., 20 ns

Page 42: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Pipelined operation of the encryption unit

B1

clock cycle 1

B2

2

B1 B3

3

B2 B1

B4

4

B3 B2 B1

B5

5

B4 B3 B2

B6

6

B5 B4 B3

B7

7

B6 B5 B4

B8 B7 B6 B5

8

B13 B12 B11 B10

B14 B13 B12 B11

B15 B14 B13 B12

B16 B15 B14 B13

B9 B8 B7 B6

B10 B9 B8 B7

B11 B10 B9 B8

B12 B11 B10 B9

clock cycle 9 10 11 12 13 14 15 16

Page 43: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

. . . .

#rounds registers

round #rounds = one pipeline stage

round 1 = one pipeline stage

round 2 = one pipeline stage

Full outer-round pipelining

Total # of pipeline stages = #rounds

Page 44: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Full mixed inner- and outer-round pipelining

round #rounds =k pipeline stages

. . . .

round 1 = k pipeline stages

round 2 =k pipeline stages

. . . .

. . . .

. . . .

k registers

Total # of pipeline stages = #rounds·k

Page 45: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

k rounds

register1

register2

register k . . . .

pipeline stage 1 = round 1

pipeline stage 2 = round 2

pipeline stage k = round k

multiplexer

k-stage Outer-Round Pipelining

Page 46: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

IN

OUT

P1

C1

P2

C3

P3

Outer-Round Pipelining: Timing

#rounds · clock_period

CLK

P4 P5 P6

C2 C4

Page 47: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

speed

area

outer-round pipelining non-feedback modes

basic architecture

k=2

k=3

k=4

k=5

Outer-Round Pipelining: Speed vs. Area

outer-round pipelining feedback modes

Page 48: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

round #rounds = one pipeline stage

. . . .

round 1 = one pipeline stage

round 2 = one pipeline stage

K registers

round K = one pipeline stage

. . . .

round 1 = one pipeline stage

round 2 = one pipeline stage

MUX K registers

combinational logic

MUX register

one round, no pipelining

Outer-round Pipelining

Page 49: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

one round

register1

register2

register k . . . .

pipeline stage 1

pipeline stage 2

pipeline stage k

multiplexer

k-stage Inner-Round Pipelining

Page 50: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

IN

OUT

P1

C1

P2

C3

P3

Inner-Round Pipelining: Timing

#rounds · (k · reduced_clock_period)

CLK

P4 P5 P6

C2 C4

Page 51: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

speed

area

inner-round pipelining non-feedback modes

basic architecture k=2

k=3

k=4

k=5

inner-round pipelining feedback modes

Inner-Round Pipelining: Speed vs. Area

Page 52: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Mixed Inner- and Outer-round Pipelining

round #rounds =k pipeline stages

. . . .

round 1 = k pipeline stages

round 2 =k pipeline stages

. . . .

. . . .

. . . .

d) k registers

round K = k pipeline stages

. . . .

round 1 = k pipeline stages

round 2 = k pipeline stages

MUX

. . . .

. . . .

. . . .

c)

k registers

one round = k pipeline stages

MUX

. . . .

b) k registers

one round, no pipelining

MUX

a) register

combinational logic

Page 53: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

- basic architecture - outer-round pipelining

- inner-round pipelining - mixed inner and outer-round pipelining

Area

Throughput

basic architecture

inner-round pipelining

mixed inner and outer-round pipelining

outer-round pipelining

K=2 K=3

K=4

K=2

K=3

k=2

kopt

Comparison of the traditional and new design methodologies

Page 54: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Area [CLB slices]

Speed [Mbit/s]

Choosing optimum architecture for non-feedback cipher modes

basic architecture

inner-round pipelining

mixed inner and outer-round pipelining

Page 55: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Area

Latency

basic architecture

inner-round pipelining

mixed inner and outer-round pipelining

outer-round pipelining

K=2 K=4 K=3 K=5

K=2 K=3

k=2

kopt

Latency vs. area dependence for the new design methodology

- basic architecture - outer-round pipelining

- inner-round pipelining - mixed inner and outer-round pipelining

Page 56: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

r1 r2 r1

r1 r2 r1 r3 r4

op1 op2 op3 op4 op5

op1 op2 op3 op4 op5

TCLKmin

TCLKmin

Limits on the minimum clock period after pipelining (1)

1. Delay of a single round divided by k = number of internal pipeline stages

k=2

2. Delay of the longest indivisible operation

k=4

Page 57: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

r1 r2 r1

cntr1 cntr3 cntr1

rc Control Unit

op1 op3 op4 op5

TCLKmin

op2

Limits on the minimum clock period after pipelining (2)

3. Delays within the control unit

4. Maximum latency

5. Maximum input/output bandwidth

cntr2 cntr4

Page 58: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

DES"encryption/decryption"

core"

clock reset

encrypt/decrypt

data input data available data read

64

key input

key available key read

64

IV input IV available IV read

64

Key memory"

Key schedule"

data output

write full

64

56 key

4 4

read bank"write bank"

round number" round key"4 48

DES/3DES

key choice"2

Page 59: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

AES"encryption/decryption"

core"

clock reset

encrypt/decrypt

data input data available data read

128

key input key available

key read

64

IV input IV available IV read

128!

Key schedule!

Key memory"

data output

write full

128!

64! Key material!

4"4

read bank"write bank"

round number" round key!4 128!

DES/3DES

Cycle number! 6!

2!key size!

Page 60: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

speed

area

k=2 k=3 k=4 k=5

outer-round pipelining

inner-round pipelining

loop-unrolling

resource sharing

- basic architecture - loop unrolling

- inner-round pipelining - outer-round pipelining

- resource sharing

basic architecture

k=2

k=3

k=4

k=2

k=3

k=4

k=5

k=5

Performance of alternative architectures: in non-feedback cipher modes (ECB, counter)

Page 61: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

speed

area

k=2 k=3 k=4 k=5

outer-round pipelining inner-round pipelining

loop-unrolling

resource sharing

basic architecture

Performance of alternative architectures: in feedback cipher modes (CBC, CFB, OFB)

- basic architecture - loop unrolling

- inner-round pipelining - outer-round pipelining

- resource sharing

Page 62: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Hash Functions

Page 63: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Hash Function

arbitrary length

message

hash function

hash value h(m)

h

m

fixed length

It is computationally infeasible to find such

m and m’ that h(m)=h(m’)

Collision Resistance:

Page 64: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Message

Hash function

Public key cipher

Alice Signature

Alice’s private key

Bob

Hash function

Alice’s public key

Hash Functions in Digital Signature Schemes

Hash value 1

Hash value 2

Hash value

Public key cipher

yes no

Message Signature

Page 65: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

General scheme for constructing a secure hash function: Merkle–Damgård scheme

Message m

Padding, appending bit length, M

M1

IV H0 H1 H2 f f . . .

Ht

compression function

output transformation

h(m) g

M2 Mt . . .

Page 66: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Sponge Scheme

Page 67: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

R

H

+

IV

CLR

Wt Kt Step t

Basic iterative architecture of SHA-1 and SHA-2

OUT

Page 68: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Data Stream 1 . . . . . . . . Data Stream k

. . . . .

R

H

+

IV

CLR

Wt KtStep t

R

H

+

IV

CLR

Wt KtStep t

Architecture with Multiple Processing Units

R

H

+

IV

CLR

Wt KtStep t

R

H

+

IV

CLR

Wt KtStep t

Page 69: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Features of architecture with multiple processing units

" Pros —  Throughput increases by a factor of k

" Cons —  Latency the same as for the basic architecture

—  Area increases by a factor of k

—  Requires k independent data streams (messages)

Page 70: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Pipelined architecture

H

+

R 1

IV

K t W t

H

IV

R 2 step t, stage 1

step t, stage 2

Page 71: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Features of the pipelined architecture

" Pros —  Throughput increases by a factor close to k

—  Area increases by a factor smaller than k

" Cons —  Latency the same as for the basic architecture

—  Requires k independent data streams (messages)

Page 72: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

R

H

+

IV

CLR

. . . .

Wt Wt+1

Wt+k-1

Kt Kt+1

Kt+k-1

Step t Step t+1

Step t+k-1

. . . .

Unrolled architecture

Page 73: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

Features of the unrolled architecture

" Pros —  Reduces both latency and throughput

—  Requires only one data stream

" Cons —  Area may increase substantially compared

to the basic architecture

Page 74: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

74

•  datapath width = state size •  one clock cycle per one round

Starting Point: Basic Iterative Architecture

Currently, most common architecture used to implement SHA-1, SHA-2, and many other hash functions.

Throughput

Area A

Th x1

Page 75: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

75

•  datapath width = state size •  two clock cycles per one round

Horizontal Folding - /2(h)

Typically Throughput/Area ratio increases

Throughput

Area A

Th x1 /2(h)

Page 76: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

76

•  datapath width = state size/2 •  two clock cycles per one round/step

Vertical Folding - /2(v)

Throughput

Area A

Th x1

/2(v)

Typically Throughput/Area ratio decreases

Page 77: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

77

Vertical Folding with the State Kept in Memory BLAKE /4(h)/4(v)-m

•  datapath width = state size/4 •  16 clock cycles per one round

Throughput/Area ratio increases

Throughput

Area A

Th x1

/4(h)/4(v)-m

Page 78: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

78

•  datapath width = state size •  one clock cycle per two rounds

Unrolling - x2

Typically Throughput/Area ratio decreases

Throughput

Area A

Th

2A

x1

x2

Page 79: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

79

•  datapath width = state size •  one clock cycle per two rounds

Efficient Unrolling - x2

Throughput

Area A

Th

2A

x1 x2

x2-efficient

Sometimes Throughput/Area ratio increases

,

,,

Page 80: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

80

Multiple Packets Available for Parallel Processing

PACKETS

1500B576B576B 64B

64B

1500B1500B64B 64B

64B 40B576B 40B

576B 576B40B N1

N2

N3

NX

PACKETS

PACKETS

PACKETS

Typical sizes of packets: 40B – 1500B 1500 B = Maximum Transmission Unit (MTU) for Ethernet v2

Page 81: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

81

Parallel Processing Using Multi-Unit Architecture – MU2

Throughput

Area A

Th

2A

2Th

Typically Throughput/Area ratio stays the same

x1

MU2

Page 82: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

82

Unrolled Architecture with Pipelining - x2-PPL2

Throughput

Area A

Th

2A

2Th

x1

x2-PPL2

Typically Throughput/Area ratio stays almost the same

Page 83: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

83

Basic Architecture with Pipelining - x1-PPL2

Throughput

Area A

Th

2A

2Th

x1

x1-PPL2

Typically Throughput/Area ratio increases

Page 84: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

84

BLAKE-256 in Virtex 5

Page 85: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

85

Why Interface Matters?

•  Pin limit

Total number of i/o ports ≤ Total number of an FPGA i/o pins

•  Support for the maximum throughput

Time to load the next message block ≤ Time to process previous block

Page 86: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

86

Interface: Two possible solutions

Length of the message communicated at the beginning

+ easy to implement passive source circuit − area overhead for the counter of message bits

Dedicated end of message port

− more intelligent source circuit required + no need for internal message bit counter

msg_bitlen

zero_word

message end_of_msg SHA core

Page 87: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

87

SHA Core: Interface & Typical Configuration

•  SHA core is an active component; surrounding FIFOs are passive and widely available •  Input interface is separate from an output interface •  Processing a current block, reading the next block, and storing a result for the previous message can be all done in parallel

fifoin_empty  

fifoin_read  

idata  w   w  

odata  

fifoout_full  

fifoout_write  

fifoin_full  

fifoin_write  

fifoout_empty  

fifoout_read  

Input  FIFO   SHA  core  

clk   rst  

ext_idata  

w  

ext_odata  din   dout  

src_ready  

src_read  

dst_ready  

dst_write  

din   dout  

full   empty  

write   read  

Output  FIFO  

din   dout  

full   empty  

write   read  

w  

clk   rst  

clk   rst   clk   rst  

clk   rst  

clk   rst  

Page 88: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

88

SHA Core Interface

w  

SHA  core  

din   dout  

src_ready  

src_read  

dst_ready  

dst_write  

clk   rst  

clk   rst  

w  

Page 89: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

89

Communication Protocol for Unpadded Messages

msg_bitlen

zero_word −−−−−

message

w bits

.

.

.

seg_0_bitlen

zero_word

seg_0

w bits

seg_1_bitlen

seg_1

� � �

seg_n-1_bitlen

seg_n-1

a) b)

−−−−−

Page 90: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

90

Communication Protocol for Unpadded Messages Without Message Splitting

last = 1 | msg_len_bp

message

msg_len_bp – message length before padding [bits]

w bits

Page 91: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

91

Communication Protocol for Unpadded Messages With Message Splitting

.

.

.

last=0 | seg_0_len_bp

seg_0

w bits

last = 0 | seg_1_len_bp

seg_1

� � �

last = 1 | seg_n-1_len_bp

seg_n-1

seg_i_len_bp – segment i length before padding [bits]

* For all i < n-1 segment i length is assumed to be a multiple of the message block size, b [characteristic to each function], and thus also the word size, w. The last segment cannot consist of only padding bits. It must include at least one message bit.

Page 92: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

w  

SHA  core  

din   dout  

src_ready  

src_read  

dst_ready  

dst_write  

clk   rst  

clk   rst  

w  

io_clk  

io_clk  

Page 93: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

fifoin_empty  

fifoin_read  

idata  w   w  

odata  

fifoout_full  

fifoout_write  

fifoin_full  

fifoin_write  

fifoout_empty  

fifoout_read  

Input  FIFO   SHA  core  

clk   rst  

ext_idata  

w  

ext_odata  din   dout  

src_ready  

src_read  

dst_ready  

dst_write  

din   dout  

full   empty  

write   read  

Output  FIFO  

din   dout  

full   empty  

write   read  

w  

clk   rst  

io_clk   rst   io_clk   rst  

clk   rst  

clk   rst  

io_clk  

io_clk  

Page 94: ECE 545 Lecture 8b Hardware Architectures of Secret-Key ... · - outer-round pipelining - inner-round pipelining - mixed inner and outer-round pipelining Area Throughput basic architecture

fifoin_empty  

fifoin_read  

idata  w   w  

odata  

fifoout_full  

fifoout_write  

fifoin_full  

fifoin_write  

fifoout_empty  

fifoout_read  

Input  FIFO   SHA  core  

clk   rst  

ext_idata  

w  

ext_odata  din   dout  

src_ready  

src_read  

dst_ready  

dst_write  

din   dout  

full   empty  

write   read  

Output  FIFO  

din   dout  

full   empty  

write   read  

w  

clk   rst  

clk  or  io_clk   rst   clk  or  io_clk   rst  

clk   rst  

clk   rst  

io_clk  

io_clk