nikhil-gupta
The project report describes the basic details of a parallel implementation of the 128-bit AES (Rijndael) algorithm.
1. Introduction
Cryptography is concerned with preventing and detecting fraud and other malicious activities. Symmetric-key cryptography, also called secret-key cryptography, uses a secret key known only to the communicating users; it is characterized by the use of a single key to perform both the encryption and the decryption of data. On October 2, 2000, the National Institute of Standards and Technology (NIST) announced Rijndael as the new Advanced Encryption Standard (AES). The predecessor of AES was the Data Encryption Standard (DES), which came to be considered insecure because of its vulnerability to brute-force attacks. DES was the standard from 1977 until the mid-1990s. To overcome this situation, NIST created a new encryption standard. The selected method, called Rijndael, was proposed by Joan Daemen and Vincent Rijmen. NIST published the specification of this encryption standard as Federal Information Processing Standards (FIPS) Publication 197. Different versions of the AES algorithm exist (AES-128, AES-192, AES-256), depending on the size of the encryption key. Three architectural optimization approaches can be employed to speed up hardware implementations: pipelining, sub-pipelining and loop unrolling. Among these approaches, the sub-pipelined architecture can achieve the maximum speed-up and the optimum speed–area ratio in non-feedback modes. The Rijndael algorithm, the Advanced Encryption Standard (AES), provides symmetric-key cryptography that allows the encryption and decryption of blocks of data. As in any symmetric system, the secret key must be shared between the sender and the receiver in order for communication to be possible.
AES is widely applied in the financial field, for example for encryption in ATMs, magnetic-stripe cards and smart cards.
2. AES RIJNDAEL ALGORITHM
AES is an iterated block cipher with a fixed block size of 128 bits and a variable key length. The 128-bit (16-byte) block is held in the State, a rectangular 4x4 array of bytes. The key is similarly pictured as a rectangular array with four rows; the number of columns of the key, denoted Nk, is equal to the key length divided by 32. The input bytes are mapped onto the State in the order a[0,0], a[1,0], a[2,0], a[3,0], a[0,1], a[1,1], a[2,1], a[3,1], …, and the bytes of the key are mapped onto the key array in the order k[0,0], k[1,0], k[2,0], k[3,0], k[0,1], k[1,1], k[2,1], k[3,1], …. At the end of the operation, the output is extracted from the State by taking the State bytes in the same order. The number of rounds depends on the key length: a 128-bit key uses 10 rounds, a 192-bit key 12 rounds and a 256-bit key 14 rounds. Each encryption round uses four processes: Add Round Key, Sub Bytes, Shift Rows and Mix Columns; in the last round, the Mix Columns operation is omitted. Decryption uses the inverse operations: Add Round Key (which is its own inverse), Inverse Sub Bytes, Inverse Mix Columns and Inverse Shift Rows. In decryption, too, Inverse Mix Columns is omitted in the last round.
AES Type | Key Length (Nk words) | Block Size (Nb words) | Number of Rounds (Nr)
AES-128 | 4 | 4 | 10
AES-192 | 6 | 4 | 12
AES-256 | 8 | 4 | 14
Figure1(i) Flow chart for AES Algorithm
Figure1 (ii). General AES Architecture
2.1 Encryption
Figure2. AES Architecture- Gate level
For both encryption and decryption with a 128-bit key, the process begins with an AddRoundKey stage, followed by nine rounds that each include all four stages, followed by a tenth round of three stages. Only the AddRoundKey stage makes use of the key; for this reason, the cipher begins and ends with an AddRoundKey stage. The final round of encryption and of decryption consists of only three stages (MixColumns and InvMixColumns, respectively, are omitted). The basic processing unit of the AES algorithm is a byte. As a result, the plaintext, the cipher text and the cipher key are arranged and processed as arrays of bytes. For an input, an output or a cipher key denoted by a, the bytes in the resulting array are referenced as a[n], where n is in one of the following ranges:
Block length = 128 bits, 0 ≤ n < 16
Key length = 128 bits, 0 ≤ n < 16
Key length = 192 bits, 0 ≤ n < 24
Key length = 256 bits, 0 ≤ n < 32
All byte values in the AES algorithm are presented as the concatenation of their individual bit values (0 or 1) between braces in the order {b7, b6, b5, b4, b3, b2, b1, b0}. These bytes are interpreted as finite field elements using a polynomial representation:
$$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 = \sum_{i=0}^{7} b_i x^i$$
The 128-bit data block is divided into 16 bytes. These bytes are mapped to a 4x4 array
called the State.
2.1.1 The State
Internally, the AES algorithm’s operations are performed on a two-dimensional array of bytes
called the State. The State consists of four rows of bytes, each containing Nb bytes, where Nb is the block length divided by 32. In the State array, denoted by the symbol s, each individual byte has two indices: its row number r in the range 0 ≤ r < 4 and its column number c in the range 0 ≤ c < Nb. This allows an individual byte of the State to be referred to as either $s_{r,c}$ or s[r,c]. For this standard, Nb = 4, i.e., 0 ≤ c < 4.
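The column-major mapping between a 16-byte block and the State can be sketched in Python as follows. This is a minimal illustration rather than the report's Verilog; the function names and the list-of-rows representation of the State are assumptions made for the example.

```python
def bytes_to_state(block: bytes) -> list:
    """Fill the 4x4 State column by column: s[r][c] = in[r + 4*c]."""
    if len(block) != 16:
        raise ValueError("an AES block is 16 bytes (Nb = 4 columns of 4 bytes)")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list) -> bytes:
    """Read the State out in the same column-major order: out[r + 4*c] = s[r][c]."""
    return bytes(state[r][c] for c in range(4) for r in range(4))
```

For example, `bytes_to_state(bytes(range(16)))[0]` is `[0, 4, 8, 12]`: the first row collects the first byte of each column.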
Figure3. State Array Input and Output
2.1.2 SubBytes (substitution bytes)
The byte substitution transformation is called SubBytes. AES defines a 16x16 matrix of byte values, called the S-box, that contains a permutation of all 256 possible 8-bit values. Each individual byte of the State is mapped into a new byte in the following way: the leftmost 4 bits of the byte are used as a row value and the rightmost 4 bits are used as a column value. These row and column values serve as indexes into the S-box to select an 8-bit output value for the next process.
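Each S-box entry is obtained by taking the multiplicative inverse of the input byte in GF(2^8) (with {00} mapped to itself) and then applying an affine transformation over GF(2). Bit by bit, the affine transformation can be expressed as:
$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i, \qquad 0 \le i < 8,$$
where $c_i$ is the i-th bit of the constant {63}.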
Figure4. SubBytes () applies the S-box to each byte of the State.
Table1. S Box Table (in Hexadecimal Format)
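To illustrate where the table entries come from, the following Python sketch derives the S-box from its two defining steps (multiplicative inverse in GF(2^8), then the affine transformation) instead of storing it. The helper names are hypothetical. The inverse is computed as a^254 by square-and-multiply, which works because a^255 = {01} for every non-zero byte. This is a swapped-in shortcut for the extended Euclidean algorithm and trades speed for brevity.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B      # reduce by m(x)
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """a^254 = a^-1 for a != 0 (since a^255 = 1); yields {00} for a = {00}."""
    result, e = 1, 254
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def affine(b: int) -> int:
    """b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i (indices mod 8), c = {63}."""
    out = 0
    for i in range(8):
        bit = (b >> i) & 1
        for k in (4, 5, 6, 7):
            bit ^= (b >> ((i + k) % 8)) & 1
        bit ^= (0x63 >> i) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]
```

As a check, `SBOX[0x53]` is `0xed`, which is the row-5, column-3 lookup described above.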
2.1.3 ShiftRows() Transformation
In the ShiftRows() transformation, the bytes in the last three rows of the State are
cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted.
Specifically, the ShiftRows() transformation proceeds as follows:
$$s'_{r,c} = s_{r,\,(c + \mathrm{shift}(r,Nb)) \bmod Nb} \qquad \text{for } 0 \le r < 4 \text{ and } 0 \le c < Nb,$$
where the shift value shift(r, Nb) depends on the row number r as follows (with Nb = 4):
shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3
This has the effect of moving bytes to “lower” positions in the row (i.e., lower values of c
in a given row), while the “lowest” bytes wrap around into the “top” of the row (i.e., higher
values of c in a given row).
Figure5. ShiftRows() cyclically shifts the last three rows in the State
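The ShiftRows() equation above can be sketched in Python as follows, assuming the State is held as a list of four rows. This representation is chosen only for the illustration.

```python
def shift_rows(state: list) -> list:
    """s'[r][c] = s[r][(c + shift(r, 4)) % 4] with shift(r, 4) = r.

    Row 0 is unchanged; row r is rotated left by r positions, so byte
    s[r][c] moves to column (c - r) mod 4.
    """
    return [[state[r][(c + r) % 4] for c in range(4)] for r in range(4)]
```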
2.1.4 MixColumns() Transformation
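The MixColumns() transformation operates on the State column by column, treating each column as a four-term polynomial over GF(2^8). Each column s(x) is multiplied modulo $x^4 + 1$ by the fixed polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$:
$$s'(x) = a(x) \otimes s(x)$$
In matrix form, for 0 ≤ c < Nb:
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$$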
Figure6. MixColumns() operates on the State column-by-column.
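The matrix product can be evaluated using only XOR and multiplication by {02} (often called xtime), since {03}·s = ({02}·s) ⊕ s. The Python sketch below uses hypothetical helper names and represents the State as a list of rows; it is not the report's Verilog module. The column {db, 13, 53, 45} → {8e, 4d, a1, bc} used in the check is a commonly quoted MixColumns test vector.

```python
def xtime(b: int) -> int:
    """Multiply by {02} in GF(2^8): shift left, reduce by m(x) if bit 7 was set."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def mix_column(col: list) -> list:
    """One column times the fixed matrix [[02,03,01,01],[01,02,03,01],[01,01,02,03],[03,01,01,02]]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3,   # {02}s0 + {03}s1 + s2 + s3
        s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3,   # s0 + {02}s1 + {03}s2 + s3
        s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3),   # s0 + s1 + {02}s2 + {03}s3
        (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3),   # {03}s0 + s1 + s2 + {02}s3
    ]


def mix_columns(state: list) -> list:
    """Apply mix_column to each of the four columns of a row-major State."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]
```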
(a) Multiplication
In the polynomial representation, multiplication in GF(2^8) (denoted by •) corresponds to the multiplication of polynomials modulo an irreducible polynomial of degree 8. A polynomial is irreducible if its only divisors are one and itself.
For the AES algorithm, this irreducible polynomial is
$$m(x) = x^8 + x^4 + x^3 + x + 1$$
The modular reduction by m(x) ensures that the result is a binary polynomial of degree less than 8, and thus can be represented by a byte. The multiplication is associative. For any non-zero binary polynomial b(x) of degree less than 8, the multiplicative inverse of b(x), denoted $b^{-1}(x)$, can be found as follows: the extended Euclidean algorithm is used to compute polynomials a(x) and c(x) such that
$$b(x)\,a(x) + m(x)\,c(x) = 1$$
Hence, $a(x) \bullet b(x) \bmod m(x) = 1$, which means
$$b^{-1}(x) = a(x) \bmod m(x)$$
Moreover, for any a(x), b(x) and c(x) in the field, it holds that
$$a(x) \bullet (b(x) + c(x)) = a(x) \bullet b(x) + a(x) \bullet c(x)$$
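The extended Euclidean procedure can be sketched in Python, with polynomials over GF(2) stored as integers whose bit i is the coefficient of x^i. The helper names (clmul, poly_divmod, gf_inverse) are hypothetical. The loop keeps the invariant that each remainder equals t·b(x) modulo m(x). Because m(x) is irreducible, the last non-zero remainder is 1, and the matching t is $b^{-1}(x)$.

```python
M = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1


def clmul(a: int, b: int) -> int:
    """Product of two GF(2) polynomials without reduction (carry-less multiply)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_divmod(a: int, b: int) -> tuple:
    """Quotient and remainder of a(x) / b(x) over GF(2); b must be non-zero."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift          # subtracting is XOR over GF(2)
    return q, a


def gf_inverse(b: int) -> int:
    """b^-1(x) via the extended Euclidean algorithm on m(x) and b(x)."""
    if b == 0:
        raise ZeroDivisionError("{00} has no multiplicative inverse")
    r0, r1 = M, b                # remainders
    t0, t1 = 0, 1                # invariant: r_i = t_i * b(x) (mod m(x))
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, t0 ^ clmul(q, t1)
    # r0 is now gcd(m, b) = 1, so t0 * b(x) = 1 (mod m(x))
    return poly_divmod(t0, M)[1]
```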
(b) Multiplication by x
Multiplying the binary polynomial
$$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$$
by the polynomial x results in
$$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$$
The result x•b(x) is obtained by reducing the above result modulo m(x). If b7= 0, the
result is already in reduced form. If b7= 1, the reduction is accomplished by subtracting (i.e.
XORing) the polynomial m(x). It follows that multiplication by x can be implemented at the byte
level as a left shift and a subsequent conditional bitwise XOR.
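Multiplication by x as a left shift plus a conditional XOR can be sketched in Python as below. Repeating it gives a general GF(2^8) multiplier by shift-and-add. The function names are illustrative. The checked values {57}·{02} = {ae}, {57}·{13} = {fe} and {57}·{83} = {c1} are the worked examples used in FIPS-197.

```python
def xtime(b: int) -> int:
    """x * b(x) mod m(x): left shift, then XOR m(x) if b7 was 1."""
    b <<= 1
    if b & 0x100:          # b7 was set, so the product has an x^8 term
        b ^= 0x11B         # subtract (XOR) m(x) = x^8 + x^4 + x^3 + x + 1
    return b


def gf_mul(a: int, b: int) -> int:
    """a * b in GF(2^8): add (XOR) a * x^i for every set bit i of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)       # a * x^(i+1) for the next bit
        b >>= 1
    return result
```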
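Table of exponentials: E(rs) = {03}^rs, where the exponent rs is given by its high nibble r (row) and low nibble s (column); all values are hexadecimal.
r\s 0 1 2 3 4 5 6 7 8 9 a b c d e f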
0 01 03 05 0f 11 33 55 ff 1a 2e 72 96 a1 f8 13 35
1 5f e1 38 48 d8 73 95 a4 f7 02 06 0a 1e 22 66 aa
2 e5 34 5c e4 37 59 eb 26 6a be d9 70 90 ab e6 31
3 53 f5 04 0c 14 3c 44 cc 4f d1 68 b8 d3 6e b2 cd
4 4c d4 67 a9 e0 3b 4d d7 62 a6 f1 08 18 28 78 88
5 83 9e b9 d0 6b bd dc 7f 81 98 b3 ce 49 db 76 9a
6 b5 c4 57 f9 10 30 50 f0 0b 1d 27 69 bb d6 61 a3
7 fe 19 2b 7d 87 92 ad ec 2f 71 93 ae e9 20 60 a0
8 fb 16 3a 4e d2 6d b7 c2 5d e7 32 56 fa 15 3f 41
9 c3 5e e2 3d 47 c9 40 c0 5b ed 2c 74 9c bf da 75
a 9f ba d5 64 ac ef 2a 7e 82 9d bc df 7a 8e 89 80
b 9b b6 c1 58 e8 23 65 af ea 25 6f b1 c8 43 c5 54
c fc 1f 21 63 a5 f4 07 09 1b 2d 77 99 b0 cb 46 ca
d 45 cf 4a de 79 8b 86 91 a8 e3 3e 42 c6 51 f3 0e
e 12 36 5a ee 29 7b 8d 8c 8f 8a 85 94 a7 f2 0d 17
f 39 4b dd 7c 84 97 a2 fd 1c 24 6c b4 c7 52 f6 01
Table2. E-Table for Galois field Multiplication
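Table of logarithms: rs = {03}^L(rs), where the byte rs is given by its high nibble r (row) and low nibble s (column); all values are hexadecimal, and L(00) is undefined (--).
r\s 0 1 2 3 4 5 6 7 8 9 a b c d e f
0 -- 00 19 01 32 02 1a c6 4b c7 1b 68 33 ee df 03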
1 64 04 e0 0e 34 8d 81 ef 4c 71 08 c8 f8 69 1c c1
2 7d c2 1d b5 f9 b9 27 6a 4d e4 a6 72 9a c9 09 78
3 65 2f 8a 05 21 0f e1 24 12 f0 82 45 35 93 da 8e
4 96 8f db bd 36 d0 ce 94 13 5c d2 f1 40 46 83 38
5 66 dd fd 30 bf 06 8b 62 b3 25 e2 98 22 88 91 10
6 7e 6e 48 c3 a3 b6 1e 42 3a 6b 28 54 fa 85 3d ba
7 2b 79 0a 15 9b 9f 5e ca 4e d4 ac e5 f3 73 a7 57
8 af 58 a8 50 f4 ea d6 74 4f ae e9 d5 e7 e6 ad e8
9 2c d7 75 7a eb 16 0b f5 59 cb 5f b0 9c a9 51 a0
a 7f 0c f6 6f 17 c4 49 ec d8 43 1f 2d a4 76 7b b7
b cc bb 3e 5a fb 60 b1 86 3b 52 a1 6c aa 55 29 9d
c 97 b2 87 90 61 be dc fc bc 95 cf cd 37 3f 5b d1
d 53 39 84 3c 41 a2 6d 47 14 2a 9e 5d 56 f2 d3 ab
e 44 11 92 d9 23 20 2e 89 b4 7c b8 26 77 99 e3 a5
f 67 4a ed de c5 31 fe 18 0d 63 8c 80 c0 f7 70 07
Table3. L-Table for Galois field Multiplication
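In the MixColumns stage, each byte coming from the ShiftRows stage is multiplied by the corresponding MixColumns coefficient using these tables. The L-table gives the logarithms of both operands, the two logarithms are added modulo 255 ({ff}), and the E-table converts the sum back into the product. If either operand is {00}, the product is {00}, since L(00) is undefined. The products are XORed together, and the result is given as input to the next stage.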
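The table method can be sketched in Python by generating both tables from the generator {03}, as in the table headings, and then multiplying through them. The names build_tables and gf_mul_tables are hypothetical. The generated values can be compared against Table2 and Table3; for example, E(08) = {1a} and L(02) = {19}.

```python
def build_tables():
    """E[i] = {03}^i for 0 <= i <= 255, and L[E[i]] = i for 0 <= i < 255."""
    E = [0] * 256
    L = [0] * 256                                    # L[0] is a placeholder: {00} has no logarithm
    v = 0x01
    for i in range(255):
        E[i] = v
        L[v] = i
        xt = (v << 1) ^ (0x11B if v & 0x80 else 0)   # v * {02}
        v = xt ^ v                                   # v * {03} = v * {02} xor v
    E[255] = E[0]                                    # {03}^255 = {01}, the last E-table entry
    return E, L


E, L = build_tables()


def gf_mul_tables(a: int, b: int) -> int:
    """a * b = E[(L[a] + L[b]) mod 255] for non-zero operands."""
    if a == 0 or b == 0:
        return 0
    return E[(L[a] + L[b]) % 255]
```

In hardware, the two lookups and the modular addition replace the shift-and-XOR chain, which is the trade-off this design choice makes.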
2.1.5 AddRoundKey() Transformation
In the AddRoundKey() transformation, a Round Key is added to the State by a simple bitwise XOR operation. Each Round Key consists of Nb words from the key schedule. Those Nb words are each added into the columns of the State, such that
$$[s'_{0,c},\, s'_{1,c},\, s'_{2,c},\, s'_{3,c}] = [s_{0,c},\, s_{1,c},\, s_{2,c},\, s_{3,c}] \oplus [w_{round \cdot Nb + c}] \qquad \text{for } 0 \le c < Nb,$$
where $[w_i]$ are the key schedule words, and round is a value in the range 0 ≤ round ≤ Nr.
In the Cipher, the initial Round Key addition occurs when round = 0, prior to the first application
of the round function. The application of the AddRoundKey() transformation to the Nr rounds of
the Cipher occurs when 1 ≤ round ≤ Nr.
Figure7. AddRoundKey() XORs each column of the State with a word from the key schedule.
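A minimal Python sketch of AddRoundKey() follows. It assumes the State and the round key are both passed as 16 bytes in column-major order, so bytes 4c to 4c+3 hold column c of the State and word $w_{round \cdot Nb + c}$ of the key schedule. The function name is illustrative. The check uses the round-0 step of the encryption test case in Section 3.5, where the round key equals the cipher key.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR each State column with the matching key-schedule word (Nb = 4)."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))
```

Applying the function twice with the same round key restores the State, which is why AddRoundKey() is its own inverse.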
2.1.6 Key Expansion
The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine to
generate a key schedule. The Key Expansion generates a total of Nb(Nr+ 1) words: the
algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of
key data. The resulting key schedule consists of a linear array of 4-byte words, denoted $[w_i]$, within the range $0 \le i < Nb(Nr + 1)$.
KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
begin
    word temp
    i = 0
    while (i < Nk)
        w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
        i = i + 1
    end while
    i = Nk
    while (i < Nb * (Nr+1))
        temp = w[i-1]
        if (i mod Nk = 0)
            temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
        else if (Nk > 6 and i mod Nk = 4)
            temp = SubWord(temp)
        end if
        w[i] = w[i-Nk] xor temp
        i = i + 1
    end while
end
SubWord() is a function that takes a four-byte input word and applies the S-box to each of the four bytes to produce an output word. The function RotWord() takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and returns the word [a1, a2, a3, a0]. The round constant word array, Rcon[i], contains the values given by $[x^{i-1}, \{00\}, \{00\}, \{00\}]$, with $x^{i-1}$ being powers of x (x is denoted as {02}) in the field GF(2^8); note that i starts at 1. The first Nk words of the expanded key are filled with the Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round constant, Rcon[i/Nk]. This transformation consists of a cyclic shift of the bytes in a word (RotWord()), followed by the application of a table lookup to all four bytes of the word (SubWord()). Note that the Key Expansion routine for 256-bit Cipher Keys (Nk = 8) is slightly different from that for 128- and 192-bit Cipher Keys: if Nk = 8 and i-4 is a multiple of Nk, then SubWord() is applied to w[i-1] prior to the XOR.
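The KeyExpansion pseudocode can be turned into a runnable Python sketch as follows. The helper names are hypothetical, and the S-box is generated at start-up (brute-force inverse plus the affine transformation written as byte rotations) rather than stored, purely to keep the example self-contained. The checked words w[4] = a0fafe17 and w[43] = b6630ca6 come from the AES-128 key-expansion example in FIPS-197, Appendix A.1.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def make_sbox() -> list:
    """S-box = affine(inverse(x)); the inverse is found by brute-force search."""
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):            # XOR of inv rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        box.append(s ^ 0x63)
    return box


SBOX = make_sbox()


def key_expansion(key: bytes, nb: int = 4) -> list:
    """Return the key schedule w[0 .. Nb*(Nr+1) - 1] as a list of 4-byte words."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits")
    nk = len(key) // 4
    nr = nk + 6                          # Nr = 10, 12, 14 for Nk = 4, 6, 8
    w = [key[4 * i:4 * i + 4] for i in range(nk)]
    rcon = 0x01                          # first byte of Rcon[i/Nk], starting at x^0
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = temp[1:] + temp[:1]                    # RotWord
            temp = bytes(SBOX[b] for b in temp)           # SubWord
            temp = bytes([temp[0] ^ rcon]) + temp[1:]     # XOR Rcon[i/Nk]
            rcon = gf_mul(rcon, 0x02)                     # next power of x
        elif nk > 6 and i % nk == 4:
            temp = bytes(SBOX[b] for b in temp)           # extra SubWord, Nk = 8 only
        w.append(bytes(a ^ b for a, b in zip(w[i - nk], temp)))
    return w
```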
2.2 Decryption
The Cipher transformations can be inverted and then implemented in reverse order to
produce a straightforward Inverse Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher are InvShiftRows(), InvSubBytes(), InvMixColumns() and AddRoundKey().
Figure 8. Decryption Module – Gate Level
2.2.1 InvShiftRows() Transformation
InvShiftRows() is the inverse of the ShiftRows() transformation. The bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three rows are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on the row number. Specifically, the InvShiftRows() transformation proceeds as follows:
$$s'_{r,\,(c + \mathrm{shift}(r,Nb)) \bmod Nb} = s_{r,c} \qquad \text{for } 0 < r < 4 \text{ and } 0 \le c < Nb$$
Figure9. InvShiftRows()cyclically shifts the last three rows in the State
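Read literally, the InvShiftRows() equation writes byte $s_{r,c}$ to column (c + r) mod 4, which rotates each row right by r positions. A minimal Python sketch, again assuming a list-of-rows State (an illustrative choice):

```python
def inv_shift_rows(state: list) -> list:
    """s'[r][(c + r) % 4] = s[r][c]: row r is rotated right by r positions."""
    out = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            out[r][(c + r) % 4] = state[r][c]
    return out
```

Applied to the output of ShiftRows(), it restores the original State.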
2.2.2 InvSubBytes() Transformation
InvSubBytes() is the inverse of the byte substitution transformation, in which the inverse S-box is applied to each byte of the State. The inverse S-box is obtained by applying the inverse of the affine transformation followed by taking the multiplicative inverse in GF(2^8).
Table4. Inverse S-box: substitution values for the byte xy (in hexadecimal format).
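Instead of implementing the inverse affine transformation separately, the Python sketch below builds the inverse table by inverting the forward S-box (INV_SBOX[SBOX[x]] = x). This is a swapped-in construction chosen for brevity. It gives the same table because the S-box is a permutation of the 256 byte values. The helper names are hypothetical, and the forward S-box is regenerated here so the example is self-contained.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def make_sbox() -> list:
    """Forward S-box: multiplicative inverse, then the affine transformation."""
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):            # XOR of inv rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        box.append(s ^ 0x63)
    return box


SBOX = make_sbox()
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x                      # undo the substitution x -> y
```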
2.2.3 InvMixColumns() Transformation
InvMixColumns() is the inverse of the MixColumns() transformation. InvMixColumns() operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF(2^8) and multiplied modulo $x^4 + 1$ by a fixed polynomial $a^{-1}(x)$, given by
$$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$$
so that
$$s'(x) = a^{-1}(x) \otimes s(x)$$
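Written as a matrix-vector product, the coefficients of $a^{-1}(x)$ give rows {0e, 0b, 0d, 09} and their rotations. The Python sketch below (hypothetical names, with a general GF(2^8) multiplier) undoes the MixColumns test column {db, 13, 53, 45} → {8e, 4d, a1, bc}.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


# Rows of the matrix built from a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}
INV_MIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_column(col: list) -> list:
    """s'(x) = a^-1(x) * s(x) mod x^4 + 1, as a matrix-vector product over GF(2^8)."""
    out = []
    for row in INV_MIX:
        v = 0
        for m, s in zip(row, col):
            v ^= gf_mul(m, s)
        out.append(v)
    return out
```

The larger coefficients ({09}, {0b}, {0d}, {0e}) are why InvMixColumns is usually more expensive in hardware than MixColumns, whose coefficients are only {01}, {02} and {03}.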
2.2.4 Inverse AddRoundKey() Transformation
AddRoundKey() is its own inverse, since it only involves an application of the XOR operation.
2.3 Block Cipher Modes of Operation
For any given key, the underlying block cipher algorithm of a mode consists of two functions that are inverses of each other. These two functions are often called encryption and decryption, but here those terms are reserved for the processes of the confidentiality modes. Instead, as part of the choice of the block cipher algorithm, one of the two functions is designated as the forward cipher function, denoted $\mathrm{CIPH}_K$; the other function is then called the inverse cipher function, denoted $\mathrm{CIPH}^{-1}_K$. The inputs and outputs of both functions are called input blocks and output blocks. The input and output blocks of the block cipher algorithm have the same bit length, called the block size, denoted b.
2.3.1 The Electronic Codebook Mode
The Electronic Codebook (ECB) mode is a confidentiality mode that features, for a given
key, the assignment of a fixed cipher text block to each plain text block, analogous to the
assignment of code words in a codebook.
The Electronic Codebook (ECB) mode is defined as follows:
ECB Encryption: $C_j = \mathrm{CIPH}_K(P_j)$ for j = 1 … n.
ECB Decryption: $P_j = \mathrm{CIPH}^{-1}_K(C_j)$ for j = 1 … n.
In ECB encryption, the forward cipher function is applied directly and independently to each block of the plaintext. The resulting sequence of output blocks is the cipher text. In ECB decryption, the inverse cipher function is applied directly and independently to each block of the cipher text. The resulting sequence of output blocks is the plaintext.
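The ECB equations can be sketched in Python with the block cipher passed in as a function. To keep the example short, toy_encrypt and toy_decrypt are hypothetical, deliberately insecure stand-ins for $\mathrm{CIPH}_K$ and $\mathrm{CIPH}^{-1}_K$ (byte-wise addition of the key followed by a one-byte rotation). An AES block function with the same signature would take their place in practice.

```python
BLOCK = 16  # block size b = 128 bits


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical, NOT secure stand-in for CIPH_K: add key bytes mod 256, rotate left."""
    t = bytes((x + k) % 256 for x, k in zip(block, key))
    return t[1:] + t[:1]


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt (stand-in for CIPH_K^-1): rotate right, subtract key."""
    t = block[-1:] + block[:-1]
    return bytes((x - k) % 256 for x, k in zip(t, key))


def _blocks(data: bytes) -> list:
    if len(data) % BLOCK:
        raise ValueError("ECB needs a whole number of 16-byte blocks")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes, ciph=toy_encrypt) -> bytes:
    # C_j = CIPH_K(P_j): every block is handled on its own
    return b"".join(ciph(key, p) for p in _blocks(plaintext))


def ecb_decrypt(key: bytes, ciphertext: bytes, inv=toy_decrypt) -> bytes:
    # P_j = CIPH_K^-1(C_j)
    return b"".join(inv(key, c) for c in _blocks(ciphertext))
```

Encrypting two identical plaintext blocks yields two identical cipher text blocks, which is the codebook behaviour described above.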
Figure10. The ECB Mode
2.3.2 The Cipher Block Chaining Mode
The Cipher Block Chaining (CBC) mode is a confidentiality mode whose encryption process features the combining (“chaining”) of the plaintext blocks with the previous cipher text blocks. The CBC mode requires an Initialization Vector (IV) to combine with the first plaintext block. The IV need not be secret, but it must be unpredictable; also, the integrity of the IV should be protected. The CBC mode is defined as follows:
CBC Encryption: $C_1 = \mathrm{CIPH}_K(P_1 \oplus IV)$;
$C_j = \mathrm{CIPH}_K(P_j \oplus C_{j-1})$ for j = 2 … n.
CBC Decryption: $P_1 = \mathrm{CIPH}^{-1}_K(C_1) \oplus IV$;
$P_j = \mathrm{CIPH}^{-1}_K(C_j) \oplus C_{j-1}$ for j = 2 … n.
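The CBC chaining can be sketched in Python as follows. As in the ECB sketch, toy_encrypt and toy_decrypt are hypothetical and deliberately insecure stand-ins for $\mathrm{CIPH}_K$ and $\mathrm{CIPH}^{-1}_K$, used only to make the example runnable. Note that the encryption loop must wait for each $C_{j-1}$, while every decryption step needs only cipher text blocks that are already available.

```python
BLOCK = 16  # block size b = 128 bits


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical, NOT secure stand-in for CIPH_K: add key bytes mod 256, rotate left."""
    t = bytes((x + k) % 256 for x, k in zip(block, key))
    return t[1:] + t[:1]


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt (stand-in for CIPH_K^-1)."""
    t = block[-1:] + block[:-1]
    return bytes((x - k) % 256 for x, k in zip(t, key))


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes, ciph=toy_encrypt) -> bytes:
    if len(iv) != BLOCK or len(plaintext) % BLOCK:
        raise ValueError("need a 16-byte IV and whole 16-byte blocks")
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = ciph(key, _xor(plaintext[i:i + BLOCK], prev))   # C_j = CIPH_K(P_j xor C_{j-1})
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes, inv=toy_decrypt) -> bytes:
    if len(iv) != BLOCK or len(ciphertext) % BLOCK:
        raise ValueError("need a 16-byte IV and whole 16-byte blocks")
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        c = ciphertext[i:i + BLOCK]
        out.append(_xor(inv(key, c), prev))                     # P_j = CIPH_K^-1(C_j) xor C_{j-1}
        prev = c
    return b"".join(out)
```

Unlike ECB, two identical plaintext blocks now produce different cipher text blocks.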
Figure11. The CBC Mode
2.3.3 The Cipher Feedback Mode
The Cipher Feedback (CFB) mode is a confidentiality mode that features the feedback of
successive cipher text segments into the input blocks of the forward cipher to generate output
blocks that are exclusive-ORed with the plaintext to produce the cipher text, and vice versa.
The CFB mode requires an IV as the initial input block. The IV need not be secret, but it must be unpredictable. The CFB mode also requires an integer parameter, denoted s, such that 1 ≤ s ≤ b. In the specification of the CFB mode below, each plaintext segment ($P^\#_j$) and cipher text segment ($C^\#_j$) consists of s bits. The value of s is sometimes incorporated into the name of the mode, e.g. the 1-bit CFB mode, the 8-bit CFB mode, the 64-bit CFB mode, or the 128-bit CFB mode.
The CFB mode is defined as follows, where | denotes concatenation:
CFB Encryption: $I_1 = IV$;
$I_j = \mathrm{LSB}_{b-s}(I_{j-1}) \,|\, C^\#_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$C^\#_j = P^\#_j \oplus \mathrm{MSB}_s(O_j)$ for j = 1, 2 … n.
CFB Decryption: $I_1 = IV$;
$I_j = \mathrm{LSB}_{b-s}(I_{j-1}) \,|\, C^\#_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$P^\#_j = C^\#_j \oplus \mathrm{MSB}_s(O_j)$ for j = 1, 2 … n.
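The CFB equations can be sketched in Python as below. Two simplifying assumptions apply: the segment size is restricted to whole bytes (s = 8 × s_bytes), and, since CFB only ever uses the forward cipher, a hypothetical hash-based function (ciph_stand_in, a truncated SHA-256, which is not AES) plays the role of $\mathrm{CIPH}_K$.

```python
import hashlib

B = 16  # block size b in bytes


def ciph_stand_in(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (a hash, NOT AES); CFB needs only the forward direction."""
    return hashlib.sha256(key + block).digest()[:B]


def _cfb(key: bytes, iv: bytes, data: bytes, s_bytes: int, decrypt: bool, ciph) -> bytes:
    if len(iv) != B or not 1 <= s_bytes <= B or len(data) % s_bytes:
        raise ValueError("need a 16-byte IV and data made of whole s-bit segments")
    inp, out = iv, bytearray()                                  # I_1 = IV
    for j in range(0, len(data), s_bytes):
        o = ciph(key, inp)                                      # O_j = CIPH_K(I_j)
        seg = data[j:j + s_bytes]
        res = bytes(x ^ y for x, y in zip(seg, o[:s_bytes]))   # XOR with MSB_s(O_j)
        out += res
        c = seg if decrypt else res                             # the cipher text segment C#_j
        inp = inp[s_bytes:] + c                                 # I_{j+1} = LSB_{b-s}(I_j) | C#_j
    return bytes(out)


def cfb_encrypt(key, iv, plaintext, s_bytes=1, ciph=ciph_stand_in):
    return _cfb(key, iv, plaintext, s_bytes, False, ciph)


def cfb_decrypt(key, iv, ciphertext, s_bytes=1, ciph=ciph_stand_in):
    return _cfb(key, iv, ciphertext, s_bytes, True, ciph)
```

With s_bytes=1 this corresponds to the 8-bit CFB mode, and with s_bytes=16 to the 128-bit CFB mode.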
Figure12. The CFB Mode
2.3.4 The Output Feedback Mode
The Output Feedback (OFB) mode is a confidentiality mode that features the iteration of
the forward cipher on an IV to generate a sequence of output blocks that are exclusive-ORed
with the plaintext to produce the cipher text, and vice versa. The OFB mode requires that the IV is a nonce, i.e., the IV must be unique for each execution of the mode under the given key. The OFB mode is defined as follows, where the last block $P^*_n$ (or $C^*_n$) may be a partial block of u bits:
OFB Encryption: $I_1 = IV$;
$I_j = O_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$C_j = P_j \oplus O_j$ for j = 1, 2 … n-1;
$C^*_n = P^*_n \oplus \mathrm{MSB}_u(O_n)$.
OFB Decryption: $I_1 = IV$;
$I_j = O_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$P_j = C_j \oplus O_j$ for j = 1, 2 … n-1;
$P^*_n = C^*_n \oplus \mathrm{MSB}_u(O_n)$.
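Because OFB encryption and decryption apply the same XOR with the same output blocks, a single Python function covers both. In this sketch, ciph_stand_in is a hypothetical hash-based substitute for $\mathrm{CIPH}_K$ (not AES), used only because OFB never needs the inverse cipher. Truncating the last output block to the length of a short final chunk implements $\mathrm{MSB}_u(O_n)$.

```python
import hashlib

B = 16  # block size b in bytes


def ciph_stand_in(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (a hash, NOT AES)."""
    return hashlib.sha256(key + block).digest()[:B]


def ofb(key: bytes, iv: bytes, data: bytes, ciph=ciph_stand_in) -> bytes:
    """OFB encryption and decryption are the same operation: XOR with O_1, O_2, ..."""
    if len(iv) != B:
        raise ValueError("the IV must be one 16-byte block")
    out, o = bytearray(), iv
    for j in range(0, len(data), B):
        o = ciph(key, o)                                # O_j = CIPH_K(I_j), I_1 = IV, I_j = O_{j-1}
        chunk = data[j:j + B]                           # the last chunk may be u < b bits long
        out += bytes(x ^ y for x, y in zip(chunk, o))   # zip stops at len(chunk): MSB_u(O_n)
    return bytes(out)
```

The output blocks depend only on the key and the IV, which is why reusing an IV under the same key is forbidden.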
Figure13. The OFB Mode
2.3.5 The Counter Mode
The Counter (CTR) mode is a confidentiality mode that features the application of the
forward cipher to a set of input blocks, called counters, to produce a sequence of output blocks
that are exclusive-ORed with the plaintext to produce the cipher text, and vice versa. The
sequence of counters must have the property that each block in the sequence is different from
every other block. This condition is not restricted to a single message: across all of the
messages that are encrypted under the given key, all of the counters must be distinct. The
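counters for a given message are denoted $T_1, T_2, \ldots, T_n$.
Given a sequence of counters, $T_1, T_2, \ldots, T_n$, the CTR mode is defined as follows, where the last block $P^*_n$ (or $C^*_n$) may be a partial block of u bits:
CTR Encryption: $O_j = \mathrm{CIPH}_K(T_j)$ for j = 1, 2 … n;
$C_j = P_j \oplus O_j$ for j = 1, 2 … n-1;
$C^*_n = P^*_n \oplus \mathrm{MSB}_u(O_n)$.
CTR Decryption: $O_j = \mathrm{CIPH}_K(T_j)$ for j = 1, 2 … n;
$P_j = C_j \oplus O_j$ for j = 1, 2 … n-1;
$P^*_n = C^*_n \oplus \mathrm{MSB}_u(O_n)$.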
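A Python sketch of CTR mode follows. It assumes a simple counter scheme in which $T_j$ is the 128-bit integer $T_1 + (j-1) \bmod 2^{128}$, which is one common way to make the counters distinct within a message. As before, ciph_stand_in is a hypothetical hash-based substitute for $\mathrm{CIPH}_K$, not AES.

```python
import hashlib

B = 16  # block size b in bytes


def ciph_stand_in(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (a hash, NOT AES)."""
    return hashlib.sha256(key + block).digest()[:B]


def ctr(key: bytes, t1: int, data: bytes, ciph=ciph_stand_in) -> bytes:
    """CTR encryption = decryption: XOR data with CIPH_K(T_1), CIPH_K(T_2), ..."""
    out = bytearray()
    for j in range(0, len(data), B):
        t_j = ((t1 + j // B) % (1 << 128)).to_bytes(B, "big")   # counter block T_j
        o_j = ciph(key, t_j)                                      # O_j = CIPH_K(T_j)
        out += bytes(x ^ y for x, y in zip(data[j:j + B], o_j))  # MSB_u(O_n) for a short last block
    return bytes(out)
```

Because $O_j$ depends only on $T_j$, any block can be processed on its own. For example, the second cipher text block can be recomputed by calling `ctr` on that block alone with counter $T_1 + 1$.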
Figure14. The CTR Mode
2.4 Principle of AES Parallelism
Figure15. Parallel AES Algorithm
In the traditional implementation of AES, data blocks are processed serially, which limits efficiency and speed. The entire plaintext is divided into blocks of fixed length (128 bits), and each block is encrypted with the same key as a unit and turned into a cipher text block. In modes such as ECB and CTR, the individual plaintext blocks can be processed independently, so encryption supports a parallel architecture. In CBC mode, encryption is inherently sequential, because each block is chained to the previous cipher text block. CBC decryption, however, can be parallelized, since every cipher text block it needs is already available.
The sequential algorithm can be modified to take advantage of multiple processing units. According to the parallel computation paradigm, the independent parts of the algorithm must be identified and then prepared to work in separate threads. Initially, the AES algorithm is divided into parallelizable and non-parallelizable parts. The starting point of the parallelization process is a well-optimized sequential AES algorithm.
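When blocks are independent of one another (as in ECB, or the keystream blocks of CTR), they can be handed to separate workers and reassembled in order. The Python sketch below shows this structure with a thread pool. block_fn is a hypothetical hash-based stand-in for a single-block AES call. In CPython the global interpreter lock means this shows the decomposition rather than a real speed-up; in hardware each worker would correspond to its own round datapath.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

B = 16  # block size in bytes


def block_fn(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for encrypting one block (NOT AES)."""
    return hashlib.sha256(key + block).digest()[:B]


def split(data: bytes) -> list:
    if len(data) % B:
        raise ValueError("data must be a whole number of 16-byte blocks")
    return [data[i:i + B] for i in range(0, len(data), B)]


def encrypt_serial(key: bytes, data: bytes) -> bytes:
    return b"".join(block_fn(key, blk) for blk in split(data))


def encrypt_parallel(key: bytes, data: bytes, workers: int = 4) -> bytes:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so block j stays at position j
        return b"".join(pool.map(lambda blk: block_fn(key, blk), split(data)))
```

Since no block depends on another, the parallel and serial versions produce identical output.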
3. Implementation and Results
Verilog HDL is used as the hardware description language because, as a standard language, it can easily be exchanged among design environments. The implementation is pure Verilog code that could easily be implemented on other devices without changing the design. We mainly used three tools to implement the code: Notepad++, QuestaSim, and the Xilinx synthesis and simulation tools (ISE 14.7). The goal of the design implementation is speed optimization while keeping the other constraints as low as possible. We have implemented the CBC mode of the AES Rijndael algorithm.
3.1 Tool Details
The editor used for writing the design codes is Notepad++. Questasim 10.0 is used for
debugging and optimizing the design codes and simulating. Xilinx ISE 14.7 is used for
synthesizing the design to the Zed (Zynq™ Evaluation and Development) Board. The code
implementation results are based on Questasim 10.0 simulation results.
Figure16 (i) Zed Board Device Specifications
Figure16 (ii) Zed Board Device Specifications
3.2 Encryption Module
The encryption process (encryption()) of AES is presented below. The Top_PipelinedInt module is the top module.
[Block diagram: Plain Text and Key inputs; Pre Round (pre_round()), Inner Round (Inner_round()), Last Round (last_round()) and Key Expansion (Key_expansion()) blocks; Cipher Text output; all data paths are 128 bits wide.]
Figure17. Encryption Module
Figure18. Top Interfacing Implemented Module - RTL View.
Figure19. Top Module AES Process – RTL View
3.2.1 Encryption Pre Round
The Round Key is added to the State by a simple bitwise XOR operation. Because of the fully parallel architecture, the output of this stage is registered.
3.2.2 Encryption Inner Rounds
There are 10 rounds in the 128-bit AES algorithm. Every round includes four sub-modules: SubBytes(), ShiftRows(), MixColumns() and AddRoundKey(). The inner round covers 9 of these rounds; the remaining round is implemented as the Last Round.
If each round module were instantiated 10 times to implement the 10 rounds, the overall area requirement would increase tenfold, and the resource utilization on the Zed Board (XC7Z020) would exceed 100% for each of encryption and decryption. To overcome this problem, we reuse the same modules as many times as they are required. A state machine at the top level [middle_round()] uses the same module in each round, and the registered output is sent back as the input for the next round. As a result, the IOB utilization is reduced to 5% and the slice utilization to 43%.
The SubBytes transformation [SubBytes_transformation()] is a non-linear substitution that operates on each byte of the State using a substitution table (S-box).
The S-box table contains an output value for each of the 256 possible input bytes (0 to 255). These values are stored in 256 × 8-bit ROMs, each addressed by an 8-bit input.
Figure20. S-Box Round ROM Implementation
[Diagram: a 128-bit input from the Pre_Round stage drives the address (Add) ports of ROMs labelled ROM0, ROM1, …, ROM255; their Data outputs form a 128-bit output to the Mix Column stage.]
In the shift_rows() transformation, the last three rows of the State are cyclically shifted over different numbers of bytes. The output of this operation is registered, and it uses only one 4-input LUT and one slice.
The mixcolumn() transformation is based on Galois field multiplication. This operation is performed on the State column by column.
The AddRoundKey() transformation is the same as in the pre-round.
3.2.3 Encryption Last Round
The last round contains three operations, namely the SubBytes_transformation(), shift_rows() and AddRoundKey() transformations; the mixcolumn() transformation is excluded.
3.2.4 Key Expansion Module
The term key expansion describes the operation of generating all round keys from the original input key. The initial round key is the original key in the case of encryption, and the last group of the generated expanded keys in the case of decryption.
The Key Expansion module includes four sub-modules: the rotate_word() transformation, the Sub_Word() transformation, Round_constant XORing, and key_round_Module().
o Rotate_word() Transformation (rot_word()): The function rot_word() takes a word [a0, a1, a2, a3], performs a cyclic rotation and returns the word [a1, a2, a3, a0].
o Sub_Word() Transformation (key_sbox()): It is the same as the SubBytes() transformation module; the only difference is that it processes a single 32-bit word rather than the 128-bit State.
o Round_Constant() Transformation (key_rcon()): Predefined 32-bit round constants over GF(2^8) are fixed for each round. A 4-bit round number and the 32-bit output of the Sub_Word() transformation are taken as inputs. The constant corresponding to the round is fetched from ROM and XORed with the output of key_sbox().
o AddRound() Transformation (key_round()): key_round() XORs the previous round key words with the output of key_rcon().
3.3 Decryption Module
Figure21. Decryption Module
As decryption is the inverse of encryption, the operations are performed in the inverse order of encryption. The last round of encryption becomes the first round of the decryption process, and the last round key generated by KeyExpansion() is used in place of the cipher key.
[Block diagram: Cipher Text and Key inputs; Key Expansion (Key_expansion()), Last Round (last_round()), Pre Round (pre_round()) and Inner Round (Inner_round()) blocks; Plain Text output; all data paths are 128 bits wide.]
3.4 Design Implementation and Verification
3.4.1 ChipScope Virtual Input/Output (VIO) core:
Design synthesis, implementation and verification are done using the ChipScope™ Pro Virtual Input/Output (VIO) core.
The LogiCORE™ IP ChipScope™ Pro Virtual Input/Output (VIO) core is a
customizable core that can both monitor and drive internal FPGA signals in real time.
Two different kinds of inputs and two different kinds of outputs are available, both of
which are customizable in size to interface with the FPGA design. Communication with
the VIO core is conducted using a connection to the JTAG port via the ICON core.
Figure22. VIO Core Connection to ICON Core.
Four types of signals are available in the VIO core:
• Asynchronous inputs:
These are sampled using the JTAG clock signal that is driven from the JTAG cable. The
input values are read back periodically and displayed in the Analyzer.
• Synchronous inputs:
These are sampled using the design clock. The input values are read back periodically
and displayed in the Analyzer.
• Asynchronous outputs:
These are user-defined in the Analyzer and driven out of the core to the surrounding
design. A logical 1 or 0 value can be defined for individual asynchronous outputs.
• Synchronous outputs:
These are user-defined in the Analyzer, synchronized to the design clock and driven out
of the core to the surrounding design. A logical 1 or 0 can be defined for individual
synchronous outputs. Pulse trains of 16 clock cycles worth of ones and zeros can also be
defined for synchronous outputs.
3.5 Simulation:
Simulation results are based on tests performed in QuestaSim 10.0. The verification is done using Verilog HDL.
Test Case 1 – Reset Case:
When reset is asserted (active high), all signals are assigned to zero.
Test Case 2 – Encryption:
The AES module inputs are driven for encryption and the expected outputs are obtained. All the sequences below are in hexadecimal.
Input Text: 00112233445566778899aabbccddeeff
Cipher Key: 000102030405060708090a0b0c0d0e0f
Expected Cipher Text: 69c4e0d86a7b0430d8cdb78070b4c55a
Figure23. Encryption Process output Waveform
Test Case 3 – Decryption:
The AES module inputs are driven for decryption and the expected outputs are obtained. All the sequences below are in hexadecimal.
Input Cipher Text: 69c4e0d86a7b0430d8cdb78070b4c55a
Cipher Key: 000102030405060708090a0b0c0d0e0f
Expected Plain Text: 00112233445566778899aabbccddeeff
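To generate or cross-check expected values like these outside the simulator, a small software reference model is convenient. The following Python sketch of AES-128 block encryption and decryption is an illustration under stated assumptions: it is a straightforward byte-oriented model with hypothetical names, not the report's Verilog, and it is neither optimized nor hardened against side channels. The State is a flat list of 16 bytes in column-major order.

```python
def _xtime(a: int) -> int:
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def _mul(a: int, b: int) -> int:
    """GF(2^8) product by shift-and-add."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = _xtime(a), b >> 1
    return p


def _make_sbox() -> list:
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if _mul(x, y) == 1), 0)  # {00} -> {00}
        s = inv
        for k in range(1, 5):                       # affine transformation as rotations
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        box.append(s ^ 0x63)
    return box


SBOX = _make_sbox()
INV_SBOX = [0] * 256
for _i, _v in enumerate(SBOX):
    INV_SBOX[_v] = _i
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def _round_keys(key: bytes) -> list:
    """AES-128 key expansion (Nk = 4, Nr = 10): 11 round keys of 16 bytes each."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]    # SubWord(RotWord(w[i-1]))
            t[0] ^= rcon                            # XOR Rcon[i/Nk]
            rcon = _xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def _add_round_key(s, k):
    return [a ^ b for a, b in zip(s, k)]


def _shift_rows(s, inverse=False):
    d = -1 if inverse else 1                        # row r rotates left (or right) by r
    return [s[r + 4 * ((c + d * r) % 4)] for c in range(4) for r in range(4)]


def _mix_columns(s, matrix):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for row in matrix:
            v = 0
            for m, b in zip(row, col):
                v ^= _mul(m, b)
            out.append(v)
    return out


def encrypt_block(plaintext: bytes, key: bytes) -> bytes:
    rk = _round_keys(key)
    s = _add_round_key(list(plaintext), rk[0])      # pre-round
    for rnd in range(1, 11):
        s = _shift_rows([SBOX[b] for b in s])       # SubBytes, then ShiftRows
        if rnd < 10:
            s = _mix_columns(s, MIX)                # omitted in the last round
        s = _add_round_key(s, rk[rnd])
    return bytes(s)


def decrypt_block(ciphertext: bytes, key: bytes) -> bytes:
    rk = _round_keys(key)
    s = _add_round_key(list(ciphertext), rk[10])
    for rnd in range(9, -1, -1):
        s = [INV_SBOX[b] for b in _shift_rows(s, inverse=True)]  # InvShiftRows, InvSubBytes
        s = _add_round_key(s, rk[rnd])
        if rnd > 0:
            s = _mix_columns(s, INV_MIX)            # omitted in the last round
    return bytes(s)
```

Calling `encrypt_block` with the Test Case 2 inputs should return the expected cipher text listed above, and `decrypt_block` should invert it. Comparing both against the waveforms gives an independent check of the Verilog design.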
Figure 24. Decryption Process output Waveform
3.6 Synthesis Report:
The resources used by the overall implementation of the parallel AES algorithm are shown below:
Figure 25. Resource Utilization Summary
Timing Parameters:
Speed Grade: -1
Minimum period: 4.171ns (Maximum Frequency: 239.733MHz)
Minimum input arrival time before clock: 3.719ns
Maximum output required time after clock: 1.219ns
Maximum combinational path delay: 1.261ns
4. Conclusion and Future Scope
The parallel design of the AES encryption algorithm reduces the delay associated with each round of encryption, which allows the hardware to operate at a much higher clock frequency than a non-pipelined, non-parallel design. This increases the message encryption throughput and makes the hardware model suitable for time-critical encryption applications. In addition, the hardware implementation of the AES encryption algorithm provides better secrecy of the encryption key, much higher speed than a software implementation, and higher throughput by means of inherent hardware concurrency.
The design has been optimized for the time required to generate keys and decode the data. The design is implemented on the Zed Board, achieving a maximum clock frequency of 239.733 MHz (minimum clock period of 4.171 ns).
The work can be extended to increase security against more severe attacks, since the encryption time has been reduced. There is further scope to optimize resource utilization: the implementation can be improved to use the resources more efficiently and to increase the maximum clock frequency. The key length can be reduced, maintaining the same security, in order to optimize resource utilization. A few gaps have been covered, but much can still be done to achieve the security of data along with the optimization of resources.
References
1. National Institute of Standards and Technology, Advanced Encryption Standard, Federal Information Processing Standards Publication 197, November 2001.
2. M. Natheera Banu, “FPGA Based Hardware Implementation of Encryption Algorithm”, International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 3, Issue 4, April 2014.
3. M. Pitchaiah, Philemon Daniel, Praveen, “Implementation of Advanced Encryption Standard Algorithm”, International Journal of Scientific & Engineering Research, Volume 3, Issue 3, March 2012, ISSN 2229-5518.
4. Mahesh Walunjkar, Md. Manan Mujahid, Syed Anwar Ahmed, Ashish Jadhav, “An AES-Core Development by Using Verilog”, IJIRCCE, ISSN (Online): 2320-9801, Vol. 1, Issue 8, October 2013.
5. Deguang Le, Jinyi Chang, Xingdou Gou, Ankang Zhang, Conglan Lu, “Parallel AES Algorithm for Fast Data Encryption on GPU”, IEEE journal on AES 2010.
6. Xilinx User Guides on Zed Board and chipscope_vio.
7. Wikipedia, the free encyclopedia.