Comparison of Hardware Implementations of S-box and T-box Architectures of AES
Bhupathi Kakarlapudi and Nitin Alabur
ECE 746: Secure Telecommunication Systems
Instructor: Dr. Kris Gaj
2
Agenda
Introduction
Motivation
Overview of architectures
Implementations
Key Scheduling
Test vectors and tools used
Results
Conclusion
3
Introduction to AES
In 1997, NIST initiated a contest, known as AES, to develop a Federal Information Processing Standard.
The standard should be capable of protecting sensitive government information well into the next century.
After 5 years of extensive analysis, Rijndael was chosen as the winner of the contest and became an official standard in Nov. 2001.
AES is expected to be used by the U.S. Government and, on a voluntary basis, by the private sector.
4
Motivation
In software, AES T-box implementations of decryption units and of combined encryption/decryption units showed better throughput than S-box implementations.
This performance improvement was shown in hardware on Altera Flex devices by Viktor Fischer and Milos Drutarovsky.
Our idea is to show the same performance improvement of the T-box architecture in hardware on the Xilinx FPGA families Virtex 5 and Spartan 3E.
5
S-box vs T-box
The S-box architecture uses 8 x 8 look-up tables plus the remaining round operations for encryption/decryption.
The T-box architecture uses 8 x 32 look-up tables plus the remaining XOR operations for encryption/decryption.
The T-box architecture uses 4 times more memory than the S-box architecture.
(S-box: 16 tables of 8 x 8; T-box: 16 tables of 8 x 32)
6
S-box Architecture Overview
The structure of this architecture is the same as the general architecture proposed for AES.
Encryption starts with AddRoundKey and then performs the round operations:
SubBytes (uses 8 x 8 look-up tables), ShiftRows, MixColumns and AddRoundKey.
The last round does not include the MixColumns operation.
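The round structure above can be sketched in Python. This is a minimal software model, not the hardware design from the slides: the function names are illustrative, the S-box is generated from its mathematical definition instead of being stored as a ROM, and the state is assumed to be 16 bytes in column-major order (byte r + 4c is row r, column c). The sample state and round key are the round 1 values from FIPS 197, Appendix B.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r

def make_sbox():
    """Build the 8 x 8 S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):          # x^254 is the inverse of x
                inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):         # XOR of four 8-bit left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

def sub_bytes(st):
    return [SBOX[b] for b in st]

def shift_rows(st):
    # Row r is rotated left by r columns.
    return [st[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def mix_columns(st):
    out = []
    for c in range(4):
        a = st[4 * c: 4 * c + 4]
        for r in range(4):
            out.append(gmul(2, a[r]) ^ gmul(3, a[(r + 1) % 4])
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out

def add_round_key(st, key):
    return [s ^ k for s, k in zip(st, key)]

def sbox_round(st, key, last=False):
    """One encryption round; the last round skips MixColumns."""
    st = shift_rows(sub_bytes(st))
    if not last:
        st = mix_columns(st)
    return add_round_key(st, key)

state = list(bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808"))
k1 = list(bytes.fromhex("a0fafe1788542cb123a339392a6c7605"))
print(bytes(sbox_round(state, k1)).hex())
```

The printed value can be compared against the start-of-round-2 state listed in FIPS 197, Appendix B.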
7
S-box Enc/Dec
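[Figure: S-box encryption and decryption data flow]
a) Encryption: Plaintext in, Ciphertext out. Blocks: SubBytes, ShiftRows, MixColumn. Round keys: K0, then Ki; the MixColumn branch is taken for i < Nr and bypassed for i = Nr.
b) Decryption: Ciphertext in, Plaintext out. Blocks: InvMixColumn, InvShiftRows, InvSubBytes. Round keys: KNr (i = Nr), then Ki for i >= 0.
Nr: total number of rounds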
8
T-box architecture overview
This architecture allows an entire round to be computed using only look-up tables and XOR operations.
The pre-computed look-up tables represent the combined SubBytes and MixColumns transformations.
T-box tables are of size 8 x 32 bits.
Memory of T-box tables:
One T-box table: 256 x 32 bits (4 B) = 1 KB
Four T-box tables = 4 KB (fast implementations)
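The memory figures follow directly from the table dimensions. A quick arithmetic check in Python; the helper name is illustrative, and the count of 16 parallel tables is taken from the S-box vs T-box comparison slide:

```python
def table_bytes(entries, width_bits):
    """Storage of one look-up table in bytes."""
    return entries * width_bits // 8

sbox = table_bytes(256, 8)     # 8 x 8 table: 256 entries of 8 bits
tbox = table_bytes(256, 32)    # 8 x 32 table: 256 entries of 32 bits

print("one T-box table:", tbox, "bytes")          # 1 KB
print("four T-box tables:", 4 * tbox, "bytes")    # 4 KB
print("16 S-box tables:", 16 * sbox, "bytes")
print("16 T-box tables:", 16 * tbox, "bytes")
print("T-box / S-box memory ratio:", tbox // sbox)
```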
9
Description of T-box Tables
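State (128 bit):
\[
\begin{bmatrix}
S_0 & S_4 & S_8 & S_{12} \\
S_1 & S_5 & S_9 & S_{13} \\
S_2 & S_6 & S_{10} & S_{14} \\
S_3 & S_7 & S_{11} & S_{15}
\end{bmatrix}
\]
First row elements: S0, S4, S8, S12
Second row elements: S1, S5, S9, S13
MixColumn operation in AES (one column):
\[
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\cdot
\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}
=
\begin{bmatrix}
02 \cdot S_0 \oplus 03 \cdot S_1 \oplus 01 \cdot S_2 \oplus 01 \cdot S_3 \\
01 \cdot S_0 \oplus 02 \cdot S_1 \oplus 03 \cdot S_2 \oplus 01 \cdot S_3 \\
01 \cdot S_0 \oplus 01 \cdot S_1 \oplus 02 \cdot S_2 \oplus 03 \cdot S_3 \\
03 \cdot S_0 \oplus 01 \cdot S_1 \oplus 01 \cdot S_2 \oplus 02 \cdot S_3
\end{bmatrix}
\]
The four columns of products (the terms in S0, S1, S2 and S3) correspond to the tables T0, T1, T2 and T3.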
10
T-Box Tables
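Encryption tables (S is the S-box):
\[
T_0[a] = \begin{bmatrix} 02 \cdot S[a] \\ S[a] \\ S[a] \\ 03 \cdot S[a] \end{bmatrix}
\quad
T_1[a] = \begin{bmatrix} 03 \cdot S[a] \\ 02 \cdot S[a] \\ S[a] \\ S[a] \end{bmatrix}
\quad
T_2[a] = \begin{bmatrix} S[a] \\ 03 \cdot S[a] \\ 02 \cdot S[a] \\ S[a] \end{bmatrix}
\quad
T_3[a] = \begin{bmatrix} S[a] \\ S[a] \\ 03 \cdot S[a] \\ 02 \cdot S[a] \end{bmatrix}
\]
Decryption tables (S^{-1} is the inverse S-box):
\[
T_0^{-1}[a] = \begin{bmatrix} 0E \cdot S^{-1}[a] \\ 09 \cdot S^{-1}[a] \\ 0D \cdot S^{-1}[a] \\ 0B \cdot S^{-1}[a] \end{bmatrix}
\quad
T_1^{-1}[a] = \begin{bmatrix} 0B \cdot S^{-1}[a] \\ 0E \cdot S^{-1}[a] \\ 09 \cdot S^{-1}[a] \\ 0D \cdot S^{-1}[a] \end{bmatrix}
\quad
T_2^{-1}[a] = \begin{bmatrix} 0D \cdot S^{-1}[a] \\ 0B \cdot S^{-1}[a] \\ 0E \cdot S^{-1}[a] \\ 09 \cdot S^{-1}[a] \end{bmatrix}
\quad
T_3^{-1}[a] = \begin{bmatrix} 09 \cdot S^{-1}[a] \\ 0D \cdot S^{-1}[a] \\ 0B \cdot S^{-1}[a] \\ 0E \cdot S^{-1}[a] \end{bmatrix}
\]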
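As an illustration of how such tables can be pre-computed, here is a minimal Python sketch. It is a software model under stated assumptions, not the ROM contents used in the project: the names are illustrative, each table entry is kept as a tuple of four bytes (top to bottom), and the decryption tables are assumed to be built from the inverse S-box, as in the usual Rijndael table construction.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r

def make_sbox():
    """S-box from its definition: field inverse, then the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):
                inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x

def make_tables(box, first_column):
    """Table i holds column i of the circulant matrix, scaled by box[a]."""
    tables = []
    for i in range(4):
        # Column i is the first column rotated down by i positions.
        col = first_column[4 - i:] + first_column[:4 - i]
        tables.append([tuple(gmul(c, box[a]) for c in col)
                       for a in range(256)])
    return tables

T = make_tables(SBOX, [0x02, 0x01, 0x01, 0x03])          # T0..T3
T_INV = make_tables(INV_SBOX, [0x0E, 0x09, 0x0D, 0x0B])  # T0^-1..T3^-1

print([hex(b) for b in T[0][0x00]])
print([hex(b) for b in T_INV[1][SBOX[0x01]]])
```

Each of the eight tables has 256 entries of 4 bytes, which matches the 1 KB per table quoted earlier.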
11
Round Operation Computation
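Using four tables:
\[
\begin{bmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{bmatrix}
= T_0[a_{0,j}] \oplus T_1[a_{1,j+c_1}] \oplus T_2[a_{2,j+c_2}] \oplus T_3[a_{3,j+c_3}] \oplus
\begin{bmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{bmatrix}
\]
Using a single table and byte rotations:
\[
\begin{bmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{bmatrix}
= T_0[a_{0,j}] \oplus \mathrm{RotByte}\big(T_0[a_{1,j+c_1}] \oplus \mathrm{RotByte}\big(T_0[a_{2,j+c_2}] \oplus \mathrm{RotByte}(T_0[a_{3,j+c_3}])\big)\big) \oplus K_j
\]
j indicates the key word (column); the column indices j + c1, j + c2 and j + c3 are taken mod 4.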
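The two forms of the round computation can be checked against each other in software. The sketch below is only a model under stated assumptions: names are illustrative, the shift offsets are c1 = 1, c2 = 2, c3 = 3 (128-bit block), each table entry is packed into a 32-bit word with row 0 in the top byte, and RotByte is taken as the rotation that turns T0 into T1. The sample state and round key are the round 1 values from FIPS 197, Appendix B.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r

def make_sbox():
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):
                inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

# T0[a] = (02.S[a], S[a], S[a], 03.S[a]) packed top byte first.
T0 = [(gmul(2, s) << 24) | (s << 16) | (s << 8) | gmul(3, s) for s in SBOX]

def rot_byte(w):
    """Rotate the four bytes of a column down by one position."""
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF

T1 = [rot_byte(w) for w in T0]
T2 = [rot_byte(w) for w in T1]
T3 = [rot_byte(w) for w in T2]

def column_inputs(state, j):
    """Bytes a[0][j], a[1][j+1], a[2][j+2], a[3][j+3], columns mod 4."""
    return [state[r + 4 * ((j + r) % 4)] for r in range(4)]

def round_four_tables(state, key):
    out = b""
    for j in range(4):
        a = column_inputs(state, j)
        e = T0[a[0]] ^ T1[a[1]] ^ T2[a[2]] ^ T3[a[3]]
        k = int.from_bytes(key[4 * j: 4 * j + 4], "big")
        out += (e ^ k).to_bytes(4, "big")
    return out

def round_one_table(state, key):
    out = b""
    for j in range(4):
        a = column_inputs(state, j)
        e = T0[a[0]] ^ rot_byte(T0[a[1]] ^ rot_byte(T0[a[2]] ^ rot_byte(T0[a[3]])))
        k = int.from_bytes(key[4 * j: 4 * j + 4], "big")
        out += (e ^ k).to_bytes(4, "big")
    return out

state = bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808")
k1 = bytes.fromhex("a0fafe1788542cb123a339392a6c7605")
print(round_four_tables(state, k1).hex())
print(round_one_table(state, k1).hex())
```

The single-table form trades three tables for three rotations per column, which is the memory versus logic trade-off behind the 1 KB and 4 KB options.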
12
T-box Architecture
[Figure: T-box architecture block diagrams]
a) Encryption: 128-bit plaintext in, 128-bit ciphertext out. Blocks: T tables (8-bit inputs, 32-bit outputs), Enc XOR network, derived SubBytes, ShiftRows. Round keys: K[0], Ki, KNr.
b) Decryption: 128-bit ciphertext in, 128-bit plaintext out. Blocks: T-1 tables (8-bit inputs, 32-bit outputs), Dec XOR network, derived InvSubBytes, InvShiftRows. Round keys: K[Nr], Inv Ki, K0.
13
Modified Decryption in T-box
a) Standard decryption round: InvShiftRows, InvSubBytes, AddRoundKey, InvMixColumns
b) Modified decryption round: InvSubBytes, InvShiftRows, InvMixColumns, Inv AddRoundKey
Both diagrams are labelled with the round key KNr.
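The reordering relies on InvMixColumns being linear over XOR, so the key addition can be moved after it if the round key is first passed through InvMixColumns. A minimal Python sketch of that property follows; the function names are illustrative, the state is assumed to be 16 bytes in column-major order, and "inverse round key" is interpreted here as InvMixColumns applied to the round key, which is an assumption about the slide's terminology. The sample values are from FIPS 197, Appendix B.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r

def inv_mix_columns(st):
    """Multiply every column by the matrix with first row 0E 0B 0D 09."""
    out = []
    for c in range(4):
        a = st[4 * c: 4 * c + 4]
        for r in range(4):
            out.append(gmul(0x0E, a[r]) ^ gmul(0x0B, a[(r + 1) % 4])
                       ^ gmul(0x0D, a[(r + 2) % 4]) ^ gmul(0x09, a[(r + 3) % 4]))
    return out

def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]

state = list(bytes.fromhex("046681e5e0cb199a48f8d37a2806264c"))
key = list(bytes.fromhex("a0fafe1788542cb123a339392a6c7605"))

# Standard order: AddRoundKey first, then InvMixColumns.
standard = inv_mix_columns(xor(state, key))

# Modified order: InvMixColumns first, then add the transformed key.
inverse_key = inv_mix_columns(key)
modified = xor(inv_mix_columns(state), inverse_key)

print(standard == modified)
```

In this form the InvSubBytes and InvMixColumns steps sit next to each other, which is what lets the decryption round use the T-1 tables.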
14
S-box Basic Iterative Architecture
[Figure: S-box basic iterative architecture. Blocks: SubBytes & InvSubBytes, ShiftRows, MixColumns, ShiftRows, InvMixColumns and register R, grouped into an encryption circuit and a decryption circuit. Ports: data input, round key inputs, data output.]
Ref: Dr Gaj and Chodowiec Publication
15
S-box Basic Iterative Architecture(1)
This architecture can encrypt only one block of data at a time, and the number of clock cycles necessary to encrypt/decrypt a block is equal to the total number of cipher rounds.
The critical path is located in the decryption circuit and includes InvShiftRows, AddRoundKey, InvMixColumns, a 3-to-1 multiplexer and InvSubBytes.
This architecture takes 11, 13 and 15 clock cycles to process data for key sizes of 128, 192 and 256 bits.
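For an iterative design that handles one block at a time, throughput follows from the block size, the clock frequency and the cycles per block. A small Python sketch of that relation; the cycle counts 11, 13 and 15 are the ones quoted above (one more than Nr = 10, 12, 14), while the 100 MHz clock is a purely hypothetical figure chosen for illustration and not a result from this project.

```python
ROUNDS = {128: 10, 192: 12, 256: 14}   # Nr per key size (FIPS 197)

def cycles_per_block(key_bits):
    """Cycle counts quoted for this architecture: Nr + 1."""
    return ROUNDS[key_bits] + 1

def throughput_bps(clock_hz, key_bits, block_bits=128):
    """Bits per second for one block processed at a time."""
    return block_bits * clock_hz / cycles_per_block(key_bits)

CLOCK_HZ = 100e6   # hypothetical clock frequency, for illustration only

for key_bits in (128, 192, 256):
    gbps = throughput_bps(CLOCK_HZ, key_bits) / 1e9
    print(key_bits, cycles_per_block(key_bits), round(gbps, 3))
```

At a fixed clock, the larger key sizes lose throughput only through the extra rounds.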
16
T-box Iterative architecture
[Figure: T-box iterative architecture. An Enc unit (SubBytes, ShiftRows, Enc round) and a Dec unit (InvSubBytes, InvShiftRows, Dec round) share the data input and data output; the Enc unit takes the round key and the Dec unit takes the inverse round key.]
Ref: Dr Gaj and Chodowiec Publication
17
Key Scheduling
The key scheduling unit supports all three key sizes, i.e. 128, 192 and 256 bits.
It requires a key setup phase, during which round keys are computed and stored in internal memory.
This unit produces 64 bits of round key per clock cycle, independent of the size of the main key.
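A software model of the key expansion helps to see how much material the setup phase has to produce. The sketch below follows the FIPS 197 key expansion for all three key sizes; it is not the hardware unit from the slides, and the names are illustrative. The setup cycle estimate simply divides the number of 32-bit round-key words by two (64 bits per cycle) and ignores any loading or pipeline overhead, which is an assumption.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r

def make_sbox():
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):
                inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

def sub_word(w):
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")

def rot_word(w):
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF

def expand_key(key):
    """Return the round-key words w[0 .. 4*(Nr+1)-1] as 32-bit integers."""
    nk = len(key) // 4                 # 4, 6 or 8 words
    nr = nk + 6                        # 10, 12 or 14 rounds
    w = [int.from_bytes(key[4 * i: 4 * i + 4], "big") for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        t = w[i - 1]
        if i % nk == 0:
            t = sub_word(rot_word(t)) ^ (rcon << 24)
            rcon = gmul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            t = sub_word(t)
        w.append(w[i - nk] ^ t)
    return w

def setup_cycles(key_bits):
    """Estimated setup cycles at two 32-bit words (64 bits) per cycle."""
    words = 4 * (key_bits // 32 + 6 + 1)
    return words // 2

w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(w), hex(w[4]), hex(w[43]))
print([setup_cycles(k) for k in (128, 192, 256)])
```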
18
Key: Block Diagram
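[Figure: key scheduling unit block diagram. 64-bit input and 64-bit output, each as two 32-bit words. A register chain holds the words Ki-8 ... Ki-1; the words Ki-Nk and Ki+1-Nk are combined with the Rot, Sub and Rcon path to produce the output words Ki and Ki+1.]
Ref: Dr Gaj and Chodowiec Publication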
19
Interface
20
Interface - Virtex
[Figure: AES ENC/DEC unit interface]
Signals: CLK, RESET, DATA_IN, DATA_IN_WRITE, DATA_IN_READY, KEY_IN, KEY_IN_WRITE, KEY_IN_READY, ENC/DEC, DATA_OUT, FULL, WRITE
Three buses in the figure are labelled as 128 bits wide.
21
Interface - Spartan
22
Test Vectors
Test vectors are those provided by NIST in the FIPS 197 publication.
They contain intermediate state values.
Test vectors for encryption and decryption are available for different key sizes.
Separate decryption test vectors are available for decryption schemes using normal keys and inverse keys.
23
Design tools used
Aldec Active-HDL 7.2 used for functional simulation
Xilinx ISE Design Suite 10.1 used for synthesis and implementation
24
Results
25
Throughput (Gbps)
            S-box                 T-box
Key Size    Virtex    Spartan     Virtex    Spartan
128         1.53      0.426       1.18      0.376
192         1.35      0.403       1.02      0.338
256         1.01      0.355       0.907     0.319
26
Throughput
[Figure: Comparison: Throughput. Bar chart of throughput (Gbps) for the S-box and T-box designs across the implementations 128_Virtex, 192_Virtex, 256_Virtex, 128_Spartan, 192_Spartan and 256_Spartan.]
27
Area (CLB slices)
            S-box                 T-box
Key Size    Virtex    Spartan     Virtex    Spartan
128         633       3019        1696      11,687
192         641       2913        1693      11,687
256         622       3019        1686      11,687
28
Area
[Figure: Comparison: Area. Bar chart of area (CLB slices) for the S-box and T-box designs across the implementations 128_Virtex, 192_Virtex, 256_Virtex, 128_Spartan, 192_Spartan and 256_Spartan.]
29
Throughput/Area
            S-box                 T-box
Key Size    Virtex    Spartan     Virtex    Spartan
128         2415.910  376.96      693.113   32.15
192         2104.060  354.65      602.721   28.90
256         1618.846  317.82      538.038   27.27
30
Throughput/Area
[Figure: Comparison: Throughput/Area. Bar chart of the throughput/area ratio for the S-box and T-box designs across the implementations 128_Virtex, 192_Virtex, 256_Virtex, 128_Spartan, 192_Spartan and 256_Spartan.]
31
Problems encountered
We were unable to map the T-tables to the BRAMs.
By default, the tool implemented the tables as logic instead of BRAMs.
The T-box architectures may have higher latency due to the on-the-fly calculation of inverse round keys.
32
Conclusion
Our S-box implementations perform better than the T-box implementations.
The area of the T-box implementations is nearly four times that of the S-box implementations.
33
Conclusion (2)
The throughputs of the S-box implementations are 11%, 29% and 31% higher than those of the corresponding T-box implementations with key sizes of 128 bits, 192 bits and 256 bits.
The throughput per CLB slice of the S-box implementations is at least 10x that of the corresponding T-box implementations.
34
Scope for future work
Implement the T-box architectures such that BRAMs are used to store the T-table values.
Partial or complete loop unrolling can be implemented for the S-box architectures to further increase the throughput.
For the T-box implementations, the inverse round keys can be precomputed and stored in memory, which may reduce the minimum clock period.
35
Questions?