AES-128 Bit Project Report


1. Introduction

Cryptography is concerned with the prevention and detection of fraud and other malicious activities. Symmetric-key cryptography, also called secret-key cryptography, involves the use of a secret key known only to the communicating users. It is characterized by the use of a single key to perform both the encryption and the decryption of data. On October 2, 2000, the National Institute of Standards and Technology (NIST) announced Rijndael as the new Advanced Encryption Standard (AES). The predecessor of AES was the Data Encryption Standard (DES), which was considered insecure because its short 56-bit key made it vulnerable to brute-force attacks. DES had been a standard since 1977 and remained in place until the mid-1990s. To overcome this situation, NIST created a new encryption standard. The selected algorithm, called Rijndael, was proposed by Joan Daemen and Vincent Rijmen. NIST has published the specification of this encryption standard as Federal Information Processing Standards (FIPS) Publication 197.

Three versions of the AES algorithm exist today (AES-128, AES-192 and AES-256), depending on the size of the encryption key. Three architectural optimization approaches can be employed to speed up hardware implementations: pipelining, sub-pipelining and loop unrolling. Among these approaches, the sub-pipelined architecture can achieve the maximum speed-up and the optimum speed-to-area ratio in non-feedback modes. The Rijndael algorithm, standardized as AES, is a symmetric-key cipher for the encryption and decryption of blocks of data. As in any symmetric system, the secret key must be shared between the sender and the receiver in order for communication to be possible.

In the domestic financial field, the AES algorithm is widely applied, for example to encrypt data in ATMs, magnetic cards and smart cards.


2. AES RIJNDAEL ALGORITHM

AES is an iterated block cipher with a fixed block size of 128 bits and a variable key length. The State is a rectangular array of bytes; since the block size is 128 bits (16 bytes), the State array has dimensions 4×4. The key is similarly pictured as a rectangular array with four rows. The number of columns of the key, denoted Nk, is equal to the key length divided by 32. It is important to note that the input bytes are mapped onto the State bytes column by column, in the order $a_{0,0}, a_{1,0}, a_{2,0}, a_{3,0}, a_{0,1}, a_{1,1}, a_{2,1}, a_{3,1}, \ldots$, and the bytes of the key are mapped onto the key array in the order $k_{0,0}, k_{1,0}, k_{2,0}, k_{3,0}, k_{0,1}, k_{1,1}, k_{2,1}, k_{3,1}, \ldots$. At the end of the operation, the output is extracted from the State by taking the State bytes in the same order. The number of rounds is fixed by the key length: a 128-bit key uses 10 rounds, a 192-bit key 12 rounds and a 256-bit key 14 rounds. The encryption algorithm uses four transformations: AddRoundKey, SubBytes, ShiftRows and MixColumns; in the last round, the MixColumns operation is omitted. The decryption algorithm uses the inverse operations: Inverse AddRoundKey, Inverse SubBytes, Inverse MixColumns and Inverse ShiftRows. In decryption too, Inverse MixColumns is omitted in the last round.

AES Type    Key Length (Nk words)    Block Size (Nb words)    Number of Rounds (Nr)
AES-128     4                        4                        10
AES-192     6                        4                        12
AES-256     8                        4                        14
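To make the byte ordering concrete, the following Python sketch (an illustration only, not part of the Verilog design; the function names are hypothetical) maps a 16-byte input block onto the 4×4 State column by column and reads it back in the same order. It also checks the relation Nr = Nk + 6, which holds for all three rows of the table above.

```python
def bytes_to_state(block: bytes) -> list:
    """Map 16 input bytes onto a 4x4 State column by column:
    in[r + 4c] goes to s[r][c], so in[0..3] fill column 0, in[4..7] column 1, ..."""
    if len(block) != 16:
        raise ValueError("AES blocks are 16 bytes (Nb = 4 words)")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list) -> bytes:
    """Read the State back out in the same column-major order."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


def num_rounds(key_bits: int) -> int:
    """Nr for a given key length, using Nk = key_bits / 32 and Nr = Nk + 6."""
    nk = key_bits // 32
    if nk not in (4, 6, 8):
        raise ValueError("AES keys are 128, 192 or 256 bits")
    return nk + 6


state = bytes_to_state(bytes(range(16)))
print(state[0])                                    # [0, 4, 8, 12]
print(state_to_bytes(state) == bytes(range(16)))   # True
print([num_rounds(k) for k in (128, 192, 256)])    # [10, 12, 14]
```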

Figure 1(i). Flow Chart for AES Algorithm

Figure 1(ii). General AES Architecture

2.1 Encryption

Figure 2. AES Architecture – Gate Level

For both encryption and decryption in AES-128, the process begins with an AddRoundKey stage, followed by nine rounds that each include all four stages, followed by a tenth round of three stages. Only the AddRoundKey stage makes use of the key; for this reason, the cipher begins and ends with an AddRoundKey stage. The final round of encryption omits MixColumns and the final round of decryption omits InvMixColumns, which is why each consists of only three stages. The basic processing unit of the AES algorithm is the byte. As a result, the plaintext, the cipher text and the cipher key are arranged and processed as arrays of bytes. For an input, an output or a cipher key denoted by a, the bytes in the resulting array are referenced as $a_n$, where n is in one of the following ranges:

Block length = 128 bits, 0 ≤ n < 16
Key length = 128 bits, 0 ≤ n < 16
Key length = 192 bits, 0 ≤ n < 24
Key length = 256 bits, 0 ≤ n < 32

All byte values in the AES algorithm are presented as the concatenation of their individual bit values (0 or 1) between braces in the order {b7, b6, b5, b4, b3, b2, b1, b0}. These bytes are interpreted as finite field elements using a polynomial representation:

$$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 = \sum_{i=0}^{7} b_i x^i$$

The 128-bit data block is divided into 16 bytes. These bytes are mapped to a 4×4 array called the State.
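A small Python helper (hypothetical, for illustration only) makes the polynomial reading of a byte explicit: each set bit $b_i$ contributes the term $x^i$. For example, {63} = {01100011} corresponds to $x^6 + x^5 + x + 1$.

```python
def byte_to_poly(b: int) -> str:
    """Render a byte {b7 ... b0} as the polynomial b7*x^7 + ... + b1*x + b0."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value")
    terms = []
    for i in range(7, -1, -1):          # highest power first
        if (b >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) if terms else "0"


print(byte_to_poly(0x63))   # x^6 + x^5 + x + 1
print(byte_to_poly(0x80))   # x^7
```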

2.1.1 The State

Internally, the AES algorithm's operations are performed on a two-dimensional array of bytes called the State. The State consists of four rows of bytes, each containing Nb bytes, where Nb is the block length divided by 32. In the State array, denoted by the symbol s, each individual byte has two indices: its row number r in the range 0 ≤ r < 4 and its column number c in the range 0 ≤ c < Nb. This allows an individual byte of the State to be referred to as either $s_{r,c}$ or s[r,c]. For this standard, Nb = 4, i.e., 0 ≤ c < 4.

Figure 3. State Array Input and Output

2.1.2 SubBytes (substitution bytes)

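The byte substitution transformation is called SubBytes(). AES defines a 16×16 matrix of byte values, called the S-box, that contains a permutation of all 256 possible 8-bit values. Each individual byte of the State is mapped to a new byte in the following way: the leftmost 4 bits of the byte are used as a row value and the rightmost 4 bits are used as a column value. These row and column values serve as indexes into the S-box to select an 8-bit output value for the next process.

Each S-box entry is constructed by taking the multiplicative inverse of the input byte in GF(2^8) (the element {00} is mapped to itself) and then applying an affine transformation over GF(2). Bit by bit, the affine transformation can be expressed as:

$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i, \qquad 0 \le i < 8$$

where $c_i$ is the i-th bit of the constant byte c = {63}.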

Figure 4. SubBytes() applies the S-box to each byte of the State.

Table 1. S-Box Table (in Hexadecimal Format)
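Rather than copying Table 1 by hand, the S-box can be regenerated from the two steps described above: the multiplicative inverse in GF(2^8), followed by the affine transformation with c = {63}. The Python sketch below is an illustrative cross-check, not the ROM contents used in the design. It computes the inverse as $a^{254}$ (valid because $a^{255} = 1$ for every non-zero a), which substitutes for the extended Euclidean method described in the multiplication subsection, and it indexes the table by row (high nibble) and column (low nibble) as in the text.

```python
def xtime(a: int) -> int:
    """Multiply by x in GF(2^8), reducing by m(x) = x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def gf_inv(a: int) -> int:
    """a^254 = a^-1 for a != 0; square-and-multiply also yields 0 for a = 0."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a, e = gf_mul(a, a), e >> 1
    return r


def affine(b: int) -> int:
    """b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i, with c = {63}."""
    out = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i))
        out |= (bit & 1) << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]


def sub_byte(b: int) -> int:
    """Row = leftmost 4 bits, column = rightmost 4 bits, as in Table 1."""
    row, col = b >> 4, b & 0x0F
    return SBOX[16 * row + col]


print(hex(sub_byte(0x53)))   # 0xed
```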

2.1.3 ShiftRows() Transformation

In the ShiftRows() transformation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as follows:

$$s'_{r,c} = s_{r,(c + \mathrm{shift}(r,Nb)) \bmod Nb}, \qquad 0 < r < 4,\ 0 \le c < Nb$$

where the shift value shift(r, Nb) depends on the row number r as follows:

$$\mathrm{shift}(1,4) = 1, \quad \mathrm{shift}(2,4) = 2, \quad \mathrm{shift}(3,4) = 3$$

This has the effect of moving bytes to "lower" positions in the row (i.e., lower values of c in a given row), while the "lowest" bytes wrap around into the "top" of the row (i.e., higher values of c in a given row).

Figure 5. ShiftRows() cyclically shifts the last three rows in the State
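In a software model the whole transformation reduces to a per-row rotation. The sketch below (illustrative Python; holding the State as a list of four rows is an assumption of this example) rotates row r left by shift(r, 4) = r, which is exactly $s'_{r,c} = s_{r,(c+r) \bmod 4}$.

```python
def shift_rows(state: list) -> list:
    """Rotate row r of the State left by r bytes (row 0 is left unchanged)."""
    # row[r:] + row[:r] places row[(c + r) % 4] at position c
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
for row in shift_rows(state):
    print(row)
# [0, 1, 2, 3]
# [5, 6, 7, 4]
# [10, 11, 8, 9]
# [15, 12, 13, 14]
```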


2.1.4 MixColumns() Transformation

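The MixColumns() transformation operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF(2^8) and multiplied modulo $x^4 + 1$ with a fixed polynomial a(x), given by

$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$

Let $s'(x) = a(x) \otimes s(x)$, where ⊗ denotes multiplication modulo $x^4 + 1$. In matrix form, for each column c (0 ≤ c < 4):

$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$$

Figure 6. MixColumns() operates on the State column-by-column.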
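MixColumns multiplies each column by the coefficients {02}, {03}, {01} and {01} of a(x) = {03}x^3 + {01}x^2 + {01}x + {02}, so every product can be computed with xtime (multiplication by {02}) and XOR, using {03}·v = ({02}·v) ⊕ v. The Python sketch below is an illustrative model, not the report's Verilog; the column [db, 13, 53, 45] → [8e, 4d, a1, bc] used in the check is a commonly used MixColumns test vector.

```python
def xtime(a: int) -> int:
    """{02} * a in GF(2^8), reduced by m(x) = x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def mul3(a: int) -> int:
    """{03} * a = ({02} * a) xor a."""
    return xtime(a) ^ a


def mix_column(col: list) -> list:
    """Multiply one State column by the fixed MixColumns matrix."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


def mix_columns(state: list) -> list:
    """Apply mix_column to each of the 4 columns of a 4x4 State stored as rows."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```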

(a) Multiplication

In the polynomial representation, multiplication in GF(2^8) (denoted by •) corresponds to the multiplication of polynomials modulo an irreducible polynomial of degree 8. A polynomial is irreducible if its only divisors are one and itself. For the AES algorithm, this irreducible polynomial is

$$m(x) = x^8 + x^4 + x^3 + x + 1$$

The modular reduction by m(x) ensures that the result will be a binary polynomial of degree less than 8, and thus can be represented by a byte. The multiplication is associative, and {01} is the multiplicative identity. For any non-zero binary polynomial b(x) of degree less than 8, the multiplicative inverse of b(x), denoted $b^{-1}(x)$, can be found as follows: the extended Euclidean algorithm is used to compute polynomials a(x) and c(x) such that

$$b(x)a(x) + m(x)c(x) = 1$$

Hence, $a(x) \bullet b(x) \bmod m(x) = 1$, which means

$$b^{-1}(x) = a(x) \bmod m(x)$$

Moreover, for any a(x), b(x) and c(x) in the field, it holds that

$$a(x) \bullet (b(x) + c(x)) = a(x) \bullet b(x) + a(x) \bullet c(x)$$
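The extended Euclidean step can be sketched directly on binary polynomials stored as Python integers (bit i holds the coefficient of $x^i$). This is an illustrative model with hypothetical helper names: it maintains the invariant $r_i \equiv s_i \cdot b(x) \pmod{m(x)}$ and stops when the remainder reaches 1, at which point $s_i$ is $b^{-1}(x)$. For instance, the inverse of {53} is {ca}.

```python
M = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1, bit i = coefficient of x^i


def poly_mul(a: int, b: int) -> int:
    """Multiply two binary polynomials without reduction (carry-less product)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    return p


def poly_divmod(a: int, b: int) -> tuple:
    """Polynomial long division over GF(2): return (q, r) with a = q*b + r."""
    q, len_b = 0, b.bit_length()
    while a.bit_length() >= len_b:
        shift = a.bit_length() - len_b
        q ^= 1 << shift
        a ^= b << shift          # subtraction is XOR in GF(2)
    return q, a


def gf_inverse(b: int) -> int:
    """Extended Euclid: find a(x) with b(x)a(x) + m(x)c(x) = 1 and return a(x)."""
    if not 0 < b < 256:
        raise ValueError("only non-zero bytes have an inverse")
    r_prev, r = M, b             # remainders
    s_prev, s = 0, 1             # invariant: r_i = s_i * b(x) (mod m(x))
    while r != 1:                # gcd(b, m) = 1 because m(x) is irreducible
        q, rem = poly_divmod(r_prev, r)
        r_prev, r = r, rem
        s_prev, s = s, s_prev ^ poly_mul(q, s)
    return s                     # degree < 8, so already reduced mod m(x)


print(hex(gf_inverse(0x53)))   # 0xca
```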

(b) Multiplication by x

Multiplying the binary polynomial

$$b(x) = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$$

with the polynomial x results in

$$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$$

The result x • b(x) is obtained by reducing the above result modulo m(x). If $b_7 = 0$, the result is already in reduced form. If $b_7 = 1$, the reduction is accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that multiplication by x can be implemented at the byte level as a left shift followed by a conditional bitwise XOR with {1b}.
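The shift-and-conditional-XOR rule translates directly into a function commonly called xtime(); repeated application plus XOR gives multiplication by any byte. This Python sketch is illustrative; the sample values ({57} · {02} = {ae}, {57} · {13} = {fe}, {57} · {83} = {c1}) are the worked examples used in FIPS-197.

```python
def xtime(b: int) -> int:
    """x * b(x) mod m(x): left shift, then XOR with m(x) if b7 was set."""
    b <<= 1
    if b & 0x100:          # b7 was 1, so the product has an x^8 term
        b ^= 0x11B         # subtract (XOR) m(x) = x^8 + x^4 + x^3 + x + 1
    return b


def gf_mul(a: int, b: int) -> int:
    """a * b in GF(2^8): add (XOR) a * x^i for every set bit i of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


print(hex(xtime(0x57)), hex(gf_mul(0x57, 0x13)), hex(gf_mul(0x57, 0x83)))
# 0xae 0xfe 0xc1
```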

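Table of "exponentials": $E(rs) = \{03\}^{rs}$, where the row index r is the high nibble and the column index s the low nibble of the exponent rs (both in hexadecimal).

r\s 0 1 2 3 4 5 6 7 8 9 a b c d e f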

0 01 03 05 0f 11 33 55 ff 1a 2e 72 96 a1 f8 13 35

1 5f e1 38 48 d8 73 95 a4 f7 02 06 0a 1e 22 66 aa

2 e5 34 5c e4 37 59 eb 26 6a be d9 70 90 ab e6 31

3 53 f5 04 0c 14 3c 44 cc 4f d1 68 b8 d3 6e b2 cd

4 4c d4 67 a9 e0 3b 4d d7 62 a6 f1 08 18 28 78 88

5 83 9e b9 d0 6b bd dc 7f 81 98 b3 ce 49 db 76 9a

6 b5 c4 57 f9 10 30 50 f0 0b 1d 27 69 bb d6 61 a3

7 fe 19 2b 7d 87 92 ad ec 2f 71 93 ae e9 20 60 a0

8 fb 16 3a 4e d2 6d b7 c2 5d e7 32 56 fa 15 3f 41

9 c3 5e e2 3d 47 c9 40 c0 5b ed 2c 74 9c bf da 75

a 9f ba d5 64 ac ef 2a 7e 82 9d bc df 7a 8e 89 80

b 9b b6 c1 58 e8 23 65 af ea 25 6f b1 c8 43 c5 54

c fc 1f 21 63 a5 f4 07 09 1b 2d 77 99 b0 cb 46 ca

d 45 cf 4a de 79 8b 86 91 a8 e3 3e 42 c6 51 f3 0e

e 12 36 5a ee 29 7b 8d 8c 8f 8a 85 94 a7 f2 0d 17

f 39 4b dd 7c 84 97 a2 fd 1c 24 6c b4 c7 52 f6 01

Table 2. E-Table for Galois Field Multiplication

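Table of "logarithms": $rs = \{03\}^{L(rs)}$, where the row index r is the high nibble and the column index s the low nibble of the byte rs (both in hexadecimal). L(00) is undefined, since {00} is not a power of {03}.

r\s 0 1 2 3 4 5 6 7 8 9 a b c d e f

0 -- 00 19 01 32 02 1a c6 4b c7 1b 68 33 ee df 03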

1 64 04 e0 0e 34 8d 81 ef 4c 71 08 c8 f8 69 1c c1

2 7d c2 1d b5 f9 b9 27 6a 4d e4 a6 72 9a c9 09 78

3 65 2f 8a 05 21 0f e1 24 12 f0 82 45 35 93 da 8e

4 96 8f db bd 36 d0 ce 94 13 5c d2 f1 40 46 83 38

5 66 dd fd 30 bf 06 8b 62 b3 25 e2 98 22 88 91 10

6 7e 6e 48 c3 a3 b6 1e 42 3a 6b 28 54 fa 85 3d ba

7 2b 79 0a 15 9b 9f 5e ca 4e d4 ac e5 f3 73 a7 57

8 af 58 a8 50 f4 ea d6 74 4f ae e9 d5 e7 e6 ad e8

9 2c d7 75 7a eb 16 0b f5 59 cb 5f b0 9c a9 51 a0

a 7f 0c f6 6f 17 c4 49 ec d8 43 1f 2d a4 76 7b b7

b cc bb 3e 5a fb 60 b1 86 3b 52 a1 6c aa 55 29 9d

c 97 b2 87 90 61 be dc fc bc 95 cf cd 37 3f 5b d1

d 53 39 84 3c 41 a2 6d 47 14 2a 9e 5d 56 f2 d3 ab

e 44 11 92 d9 23 20 2e 89 b4 7c b8 26 77 99 e3 a5

f 67 4a ed de c5 31 fe 18 0d 63 8c 80 c0 f7 70 07

Table 3. L-Table for Galois Field Multiplication

In the implementation, each byte taken from the ShiftRows() output is multiplied by the corresponding MixColumns coefficient using these tables. The logarithms of the two operands are read from the L-table and added modulo 255 (ff), and the sum is used as the index into the E-table, whose entry is the product; if either operand is {00}, the product is {00}. The products are then combined by XOR, and the result is given as input to the next stage.
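The table-based multiplication can be reproduced in a few lines. The Python sketch below is illustrative (the names E, L and gf_mul_tables are hypothetical): it rebuilds Tables 2 and 3 by repeatedly multiplying by the generator {03} and then multiplies via $a \bullet b = E[(L[a] + L[b]) \bmod 255]$, handling {00} separately because it has no logarithm.

```python
def xtime(a: int) -> int:
    """{02} * a in GF(2^8), reduced by m(x) = x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


E = [0] * 256      # E[i] = {03}^i                      (Table 2)
L = [0] * 256      # L[v] = i such that {03}^i = v      (Table 3); L[0] unused
v = 0x01
for i in range(255):
    E[i] = v
    L[v] = i
    v = xtime(v) ^ v                 # v * {03} = (v * {02}) xor v
E[255] = E[0]                        # {03}^255 = {01}, the last entry of Table 2


def gf_mul_tables(a: int, b: int) -> int:
    """a * b = E[(L[a] + L[b]) mod 255]; {00} has no logarithm and is handled separately."""
    if a == 0 or b == 0:
        return 0
    return E[(L[a] + L[b]) % 255]


print(hex(gf_mul_tables(0x57, 0x83)))   # 0xc1
```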

2.1.5 AddRoundKey() Transformation

In the AddRoundKey() transformation, a Round Key is added to the State by a simple bitwise XOR operation. Each Round Key consists of Nb words from the key schedule. Those Nb words are each added into the columns of the State, such that

$$[s'_{0,c}, s'_{1,c}, s'_{2,c}, s'_{3,c}] = [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus [w_{\mathrm{round} \cdot Nb + c}], \qquad 0 \le c < Nb$$

where $[w_i]$ are the key schedule words and round is a value in the range 0 ≤ round ≤ Nr. In the Cipher, the initial Round Key addition occurs when round = 0, prior to the first application of the round function. The application of the AddRoundKey() transformation to the Nr rounds of the Cipher occurs when 1 ≤ round ≤ Nr.


Figure 7. AddRoundKey() XORs each column of the State with a word from the key schedule.
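As a minimal Python sketch (illustrative; the State stored as a list of rows and the key words as 32-bit integers are assumptions of this example), AddRoundKey XORs byte r of word $w_{\mathrm{round} \cdot Nb + c}$ into $s_{r,c}$, taking the most significant byte of each word as row 0. Applying it twice with the same words restores the State, which is why AddRoundKey() is its own inverse.

```python
def add_round_key(state: list, round_words: list) -> list:
    """XOR column c of the State with key-schedule word round_words[c]."""
    out = [row[:] for row in state]               # leave the input State unchanged
    for c, w in enumerate(round_words):           # Nb = 4 words per round
        for r in range(4):
            out[r][c] ^= (w >> (24 - 8 * r)) & 0xFF   # row 0 <- most significant byte
    return out


zero = [[0] * 4 for _ in range(4)]
words = [0x01020304, 0, 0, 0xA0B0C0D0]
print(add_round_key(zero, words))
# [[1, 0, 0, 160], [2, 0, 0, 176], [3, 0, 0, 192], [4, 0, 0, 208]]
```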

2.1.6 Key Expansion

The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine to generate a key schedule. The Key Expansion generates a total of Nb(Nr + 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words, denoted $[w_i]$, with 0 ≤ i < Nb(Nr + 1). For AES-128 (Nb = 4, Nr = 10), this gives 4 × 11 = 44 words.

KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
begin
    word temp
    i = 0
    while (i < Nk)
        w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
        i = i + 1
    end while
    i = Nk
    while (i < Nb * (Nr+1))
        temp = w[i-1]
        if (i mod Nk = 0)
            temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
        else if (Nk > 6 and i mod Nk = 4)
            temp = SubWord(temp)
        end if
        w[i] = w[i-Nk] xor temp
        i = i + 1
    end while
end

SubWord() is a function that takes a four-byte input word and applies the S-box to each of the four bytes to produce an output word. The function RotWord() takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and returns the word [a1, a2, a3, a0]. The round constant word array, Rcon[i], contains the values given by $[x^{i-1}, \{00\}, \{00\}, \{00\}]$, with $x^{i-1}$ being powers of x (x is denoted as {02}) in the field GF(2^8); note that i starts at 1, not 0. The first Nk words of the expanded key are filled with the Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round constant, Rcon[i/Nk]. This transformation consists of a cyclic shift of the bytes in a word (RotWord()), followed by the application of a table lookup to all four bytes of the word (SubWord()). It is important to note that the Key Expansion routine for 256-bit Cipher Keys (Nk = 8) is slightly different from that for 128-bit and 192-bit Cipher Keys: if Nk = 8 and i-4 is a multiple of Nk, then SubWord() is applied to w[i-1] prior to the XOR.
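The KeyExpansion pseudocode translates almost line for line into Python. The sketch below is an illustrative software model, not the report's Verilog key_expansion module: it regenerates the S-box instead of storing it, updates Rcon with xtime, and includes the extra SubWord step for Nk = 8. The checks use the AES-128 key expansion example from FIPS-197 Appendix A.1 (cipher key 2b7e1516 28aed2a6 abf71588 09cf4f3c gives w[4] = a0fafe17 and w[43] = b6630ca6).

```python
def xtime(a: int) -> int:
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def rotl8(b: int, k: int) -> int:
    return ((b << k) | (b >> (8 - k))) & 0xFF


def make_sbox() -> list:
    sbox = []
    for x in range(256):
        inv, a, e = 1, x, 254            # inv = x^254 = x^-1 (0 for x = 0)
        while e:
            if e & 1:
                inv = gf_mul(inv, a)
            a, e = gf_mul(a, a), e >> 1
        # affine step: b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ {63}
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = make_sbox()


def sub_word(w: int) -> int:
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w: int) -> int:
    """[a0, a1, a2, a3] -> [a1, a2, a3, a0]."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def key_expansion(key: bytes, nb: int = 4) -> list:
    """Return the Nb*(Nr+1) key-schedule words w[i] as 32-bit integers."""
    nk = len(key) // 4
    if len(key) % 4 or nk not in (4, 6, 8):
        raise ValueError("key must be 16, 24 or 32 bytes")
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01                                  # first byte of Rcon[1]
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = xtime(rcon)                   # next power of x
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w


w = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(w), hex(w[4]), hex(w[43]))   # 44 0xa0fafe17 0xb6630ca6
```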

2.2 Decryption

The Cipher transformations can be inverted and then applied in reverse order to produce a straightforward Inverse Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher are InvShiftRows(), InvSubBytes(), InvMixColumns() and AddRoundKey().

Figure 8. Decryption Module – Gate Level


2.2.1 InvShiftRows() Transformation

InvShiftRows() is the inverse of the ShiftRows() transformation. The bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three rows are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on the row number. Specifically, the InvShiftRows() transformation proceeds as follows:

$$s'_{r,(c + \mathrm{shift}(r,Nb)) \bmod Nb} = s_{r,c}, \qquad 0 < r < 4,\ 0 \le c < Nb$$

Figure 9. InvShiftRows() cyclically shifts the last three rows in the State
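InvShiftRows can be modelled in the same way as ShiftRows (illustrative Python, State as a list of rows): row r is rotated right by r positions, which is the same as a left rotation by Nb - shift(r, Nb) = 4 - r.

```python
def inv_shift_rows(state: list) -> list:
    """Rotate row r right by r bytes (left by 4 - r); row 0 is unchanged."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]


shifted = [[0, 1, 2, 3],
           [5, 6, 7, 4],
           [10, 11, 8, 9],
           [15, 12, 13, 14]]
for row in inv_shift_rows(shifted):
    print(row)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9, 10, 11]
# [12, 13, 14, 15]
```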

2.2.2 InvSubBytes() Transformation

InvSubBytes() is the inverse of the byte substitution transformation, in which the inverse S-box is applied to each byte of the State. The inverse S-box is obtained by applying the inverse of the affine transformation followed by taking the multiplicative inverse in GF(2^8).


Table 4. Inverse S-box: substitution values for the byte xy (in hexadecimal format).
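Following the construction just described, the Python sketch below (illustrative; helper names are hypothetical) builds the inverse S-box by first undoing the affine transformation, $b = \mathrm{rotl}(s,1) \oplus \mathrm{rotl}(s,3) \oplus \mathrm{rotl}(s,6) \oplus \{05\}$, and then taking the multiplicative inverse in GF(2^8). The expected values in the check are entries of Table 4, for example {63} → {00} and {ed} → {53}.

```python
def xtime(a: int) -> int:
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def rotl8(b: int, k: int) -> int:
    return ((b << k) | (b >> (8 - k))) & 0xFF


def inv_affine(s: int) -> int:
    """Undo the S-box affine step: b_i = s_(i+2) ^ s_(i+5) ^ s_(i+7) ^ d_i, d = {05}."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05


def gf_inv(a: int) -> int:
    """a^254 = a^-1 in GF(2^8); yields 0 for a = 0."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a, e = gf_mul(a, a), e >> 1
    return r


INV_SBOX = [gf_inv(inv_affine(s)) for s in range(256)]

print(hex(INV_SBOX[0xED]))   # 0x53
```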

2.2.3 InvMixColumns() Transformation

InvMixColumns() is the inverse of the MixColumns() transformation. InvMixColumns() operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF(2^8) and multiplied modulo $x^4 + 1$ with a fixed polynomial $a^{-1}(x)$, given by

$$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$$

Let $s'(x) = a^{-1}(x) \otimes s(x)$, where ⊗ denotes multiplication modulo $x^4 + 1$.
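Each output byte of InvMixColumns is a GF(2^8) combination of the column with a rotation of the coefficients {0e}, {0b}, {0d}, {09} of $a^{-1}(x)$. Unlike MixColumns, these coefficients need general multiplication, sketched here with a shift-and-add helper (illustrative Python, not the report's Verilog). The check inverts the MixColumns test column [db, 13, 53, 45] → [8e, 4d, a1, bc].

```python
def xtime(a: int) -> int:
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def inv_mix_column(col: list) -> list:
    """s'(x) = a^-1(x) * s(x) mod (x^4 + 1), a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}."""
    s0, s1, s2, s3 = col
    m = gf_mul
    return [
        m(s0, 0x0E) ^ m(s1, 0x0B) ^ m(s2, 0x0D) ^ m(s3, 0x09),
        m(s0, 0x09) ^ m(s1, 0x0E) ^ m(s2, 0x0B) ^ m(s3, 0x0D),
        m(s0, 0x0D) ^ m(s1, 0x09) ^ m(s2, 0x0E) ^ m(s3, 0x0B),
        m(s0, 0x0B) ^ m(s1, 0x0D) ^ m(s2, 0x09) ^ m(s3, 0x0E),
    ]


print([hex(v) for v in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
# ['0xdb', '0x13', '0x53', '0x45']
```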

2.2.4 Inverse AddRoundKey() Transformation

AddRoundKey() is its own inverse, since it only involves an application of the XOR operation.


2.3 Block Cipher Modes of Operation

For any given key, the underlying block cipher algorithm of a mode consists of two functions that are inverses of each other. These two functions are often called encryption and decryption, but in this report those terms are reserved for the processes of the confidentiality modes. Instead, as part of the choice of the block cipher algorithm, one of the two functions is designated as the forward cipher function, denoted $\mathrm{CIPH}_K$; the other function is then called the inverse cipher function, denoted $\mathrm{CIPH}^{-1}_K$. The inputs and outputs of both functions are called input blocks and output blocks. The input and output blocks of the block cipher algorithm have the same bit length, called the block size and denoted b (b = 128 for AES).

2.3.1 The Electronic Codebook Mode

The Electronic Codebook (ECB) mode is a confidentiality mode that features, for a given key, the assignment of a fixed cipher text block to each plaintext block, analogous to the assignment of code words in a codebook. The ECB mode is defined as follows:

ECB Encryption: $C_j = \mathrm{CIPH}_K(P_j)$ for j = 1 … n.

ECB Decryption: $P_j = \mathrm{CIPH}^{-1}_K(C_j)$ for j = 1 … n.

In ECB encryption, the forward cipher function is applied directly and independently to each block of the plaintext. The resulting sequence of output blocks is the cipher text. In ECB decryption, the inverse cipher function is applied directly and independently to each block of the cipher text. The resulting sequence of output blocks is the plaintext.

Figure 10. The ECB Mode
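The ECB definition can be sketched in Python as follows. To keep the example short and self-contained, toy_cipher and toy_inverse are hypothetical stand-ins for $\mathrm{CIPH}_K$ and $\mathrm{CIPH}^{-1}_K$ (a byte rotation plus XOR with the key); they are not AES and provide no security, and any 16-byte block cipher with the same call shape could be passed in instead. The example illustrates the codebook property: equal plaintext blocks give equal cipher text blocks.

```python
BLOCK = 16  # block size b = 128 bits


def toy_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (NOT AES): rotate left one byte, then XOR the key."""
    rotated = block[1:] + block[:1]
    return bytes(x ^ k for x, k in zip(rotated, key))


def toy_inverse(key: bytes, block: bytes) -> bytes:
    """Stand-in for CIPH^-1_K: undo the XOR, then rotate right one byte."""
    x = bytes(c ^ k for c, k in zip(block, key))
    return x[-1:] + x[:-1]


def split_blocks(data: bytes) -> list:
    if len(data) % BLOCK:
        raise ValueError("this sketch expects whole 16-byte blocks (no padding)")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes, cipher=toy_cipher) -> bytes:
    # C_j = CIPH_K(P_j), each block independently
    return b"".join(cipher(key, p) for p in split_blocks(plaintext))


def ecb_decrypt(key: bytes, ciphertext: bytes, inverse=toy_inverse) -> bytes:
    # P_j = CIPH^-1_K(C_j), each block independently
    return b"".join(inverse(key, c) for c in split_blocks(ciphertext))


key = bytes(range(16))
pt = b"A" * 32 + b"B" * 16
ct = ecb_encrypt(key, pt)
print(ct[:16] == ct[16:32], ecb_decrypt(key, ct) == pt)   # True True
```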


2.3.2 The Cipher Block Chaining Mode

The Cipher Block Chaining (CBC) mode is a confidentiality mode whose encryption process features the combining ("chaining") of the plaintext blocks with the previous cipher text blocks. The CBC mode requires an Initialization Vector (IV) to combine with the first plaintext block. The IV need not be secret, but it must be unpredictable; also, the integrity of the IV should be protected. The CBC mode is defined as follows:

CBC Encryption: $C_1 = \mathrm{CIPH}_K(P_1 \oplus IV)$;
$C_j = \mathrm{CIPH}_K(P_j \oplus C_{j-1})$ for j = 2 … n.

CBC Decryption: $P_1 = \mathrm{CIPH}^{-1}_K(C_1) \oplus IV$;
$P_j = \mathrm{CIPH}^{-1}_K(C_j) \oplus C_{j-1}$ for j = 2 … n.

Figure 11. The CBC Mode
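A minimal Python sketch of the CBC equations follows. As before, toy_cipher and toy_inverse are hypothetical stand-ins for $\mathrm{CIPH}_K$ and $\mathrm{CIPH}^{-1}_K$ (not AES), and the all-zero IV in the example is used only to make it deterministic; a real IV must be unpredictable. Note that each encryption step needs the previous cipher text block, whereas each decryption step needs only cipher text that is already available.

```python
BLOCK = 16


def toy_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (NOT AES): rotate left one byte, then XOR the key."""
    rotated = block[1:] + block[:1]
    return bytes(x ^ k for x, k in zip(rotated, key))


def toy_inverse(key: bytes, block: bytes) -> bytes:
    """Stand-in for CIPH^-1_K: undo the XOR, then rotate right one byte."""
    x = bytes(c ^ k for c, k in zip(block, key))
    return x[-1:] + x[:-1]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes, cipher=toy_cipher) -> bytes:
    if len(plaintext) % BLOCK or len(iv) != BLOCK:
        raise ValueError("whole 16-byte blocks and a 16-byte IV expected")
    out, prev = [], iv                                   # C_0 is taken to be the IV
    for i in range(0, len(plaintext), BLOCK):
        prev = cipher(key, xor(plaintext[i:i + BLOCK], prev))   # C_j = CIPH_K(P_j xor C_(j-1))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes, inverse=toy_inverse) -> bytes:
    if len(ciphertext) % BLOCK or len(iv) != BLOCK:
        raise ValueError("whole 16-byte blocks and a 16-byte IV expected")
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(inverse(key, block), prev))       # P_j = CIPH^-1_K(C_j) xor C_(j-1)
        prev = block
    return b"".join(out)


key, iv = bytes(range(16)), bytes(16)
ct = cbc_encrypt(key, iv, b"A" * 32)
print(ct[:16] == ct[16:32], cbc_decrypt(key, iv, ct) == b"A" * 32)   # False True
```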

2.3.3 The Cipher Feedback Mode

The Cipher Feedback (CFB) mode is a confidentiality mode that features the feedback of successive cipher text segments into the input blocks of the forward cipher to generate output blocks that are exclusive-ORed with the plaintext to produce the cipher text, and vice versa. The CFB mode requires an IV as the initial input block. The IV need not be secret, but it must be unpredictable. The CFB mode also requires an integer parameter, denoted s, such that 1 ≤ s ≤ b. In the specification of the CFB mode below, each plaintext segment ($P^{\#}_j$) and cipher text segment ($C^{\#}_j$) consists of s bits. The value of s is sometimes incorporated into the name of the mode, e.g., the 1-bit CFB mode, the 8-bit CFB mode, the 64-bit CFB mode, or the 128-bit CFB mode.

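The CFB mode is defined as follows:

CFB Encryption: $I_1 = IV$;
$I_j = \mathrm{LSB}_{b-s}(I_{j-1}) \,|\, C^{\#}_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$C^{\#}_j = P^{\#}_j \oplus \mathrm{MSB}_s(O_j)$ for j = 1, 2 … n.

CFB Decryption: $I_1 = IV$;
$I_j = \mathrm{LSB}_{b-s}(I_{j-1}) \,|\, C^{\#}_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$P^{\#}_j = C^{\#}_j \oplus \mathrm{MSB}_s(O_j)$ for j = 1, 2 … n.

Here | denotes concatenation, and $\mathrm{MSB}_s$ and $\mathrm{LSB}_{b-s}$ denote the s most significant bits and the b-s least significant bits of a block.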

Figure 12. The CFB Mode
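The CFB equations can be sketched in Python for segment sizes that are a whole number of bytes (s = 8, 16, …, 128), an assumption that keeps the shift register byte-aligned; the 1-bit variant would need bit-level shifting. toy_cipher is again a hypothetical stand-in for $\mathrm{CIPH}_K$ (not AES). Note that decryption also uses only the forward cipher.

```python
BLOCK = 16


def toy_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (NOT AES): rotate left one byte, then XOR the key."""
    rotated = block[1:] + block[:1]
    return bytes(x ^ k for x, k in zip(rotated, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _check(data: bytes, s_bytes: int) -> None:
    if not 1 <= s_bytes <= BLOCK or len(data) % s_bytes:
        raise ValueError("data must be a whole number of s-byte segments")


def cfb_encrypt(key, iv, plaintext, s_bytes=1, cipher=toy_cipher) -> bytes:
    _check(plaintext, s_bytes)
    reg, out = iv, bytearray()                              # I_1 = IV
    for i in range(0, len(plaintext), s_bytes):
        o = cipher(key, reg)                                # O_j = CIPH_K(I_j)
        seg = xor(plaintext[i:i + s_bytes], o[:s_bytes])    # C#_j = P#_j xor MSB_s(O_j)
        out += seg
        reg = reg[s_bytes:] + seg                           # I_(j+1) = LSB_(b-s)(I_j) | C#_j
    return bytes(out)


def cfb_decrypt(key, iv, ciphertext, s_bytes=1, cipher=toy_cipher) -> bytes:
    _check(ciphertext, s_bytes)
    reg, out = iv, bytearray()
    for i in range(0, len(ciphertext), s_bytes):
        o = cipher(key, reg)                                # forward cipher here too
        seg = ciphertext[i:i + s_bytes]
        out += xor(seg, o[:s_bytes])                        # P#_j = C#_j xor MSB_s(O_j)
        reg = reg[s_bytes:] + seg                           # feed back the cipher text segment
    return bytes(out)


key, iv = bytes(range(16)), bytes(range(16, 32))
msg = b"CFB mode example" * 2
print(cfb_decrypt(key, iv, cfb_encrypt(key, iv, msg, 1), 1) == msg)   # True
```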


2.3.4 The Output Feedback Mode

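The Output Feedback (OFB) mode is a confidentiality mode that features the iteration of the forward cipher on an IV to generate a sequence of output blocks that are exclusive-ORed with the plaintext to produce the cipher text, and vice versa. The OFB mode requires that the IV is a nonce, i.e., the IV must be unique for each execution of the mode under the given key. In the definition below, the final plaintext block $P^*_n$ and cipher text block $C^*_n$ may be partial blocks of u bits. The OFB mode is defined as follows:

OFB Encryption: $I_1 = IV$;
$I_j = O_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$C_j = P_j \oplus O_j$ for j = 1, 2 … n-1;
$C^*_n = P^*_n \oplus \mathrm{MSB}_u(O_n)$.

OFB Decryption: $I_1 = IV$;
$I_j = O_{j-1}$ for j = 2 … n;
$O_j = \mathrm{CIPH}_K(I_j)$ for j = 1, 2 … n;
$P_j = C_j \oplus O_j$ for j = 1, 2 … n-1;
$P^*_n = C^*_n \oplus \mathrm{MSB}_u(O_n)$.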

Figure 13. The OFB Mode
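Because OFB encryption and decryption differ only in which side of the XOR is known, one function covers both. The Python sketch below is illustrative: toy_cipher is a hypothetical stand-in for $\mathrm{CIPH}_K$ (not AES), and a short final block is handled by using only its first u bits of $O_n$ (here, a whole number of bytes).

```python
BLOCK = 16


def toy_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (NOT AES): rotate left one byte, then XOR the key."""
    rotated = block[1:] + block[:1]
    return bytes(x ^ k for x, k in zip(rotated, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ofb_xcrypt(key: bytes, iv: bytes, data: bytes, cipher=toy_cipher) -> bytes:
    """OFB encryption and decryption are the same operation."""
    out, o = bytearray(), iv                 # I_1 = IV
    for i in range(0, len(data), BLOCK):
        o = cipher(key, o)                   # O_j = CIPH_K(I_j), with I_j = O_(j-1)
        chunk = data[i:i + BLOCK]
        out += xor(chunk, o[:len(chunk)])    # a short final chunk uses MSB_u(O_n)
    return bytes(out)


key, iv = bytes(range(16)), bytes(range(16, 32))
msg = bytes(range(40))                       # two full blocks plus a partial block
ct = ofb_xcrypt(key, iv, msg)
print(len(ct), ofb_xcrypt(key, iv, ct) == msg)   # 40 True
```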


2.3.5 The Counter Mode

The Counter (CTR) mode is a confidentiality mode that features the application of the forward cipher to a set of input blocks, called counters, to produce a sequence of output blocks that are exclusive-ORed with the plaintext to produce the cipher text, and vice versa. The sequence of counters must have the property that each block in the sequence is different from every other block. This condition is not restricted to a single message: across all of the messages that are encrypted under the given key, all of the counters must be distinct. The counters for a given message are denoted $T_1, T_2, \ldots, T_n$. Given a sequence of counters $T_1, T_2, \ldots, T_n$, the CTR mode is defined as follows:

CTR Encryption: $O_j = \mathrm{CIPH}_K(T_j)$ for j = 1, 2 … n;
$C_j = P_j \oplus O_j$ for j = 1, 2 … n-1;
$C^*_n = P^*_n \oplus \mathrm{MSB}_u(O_n)$.

CTR Decryption: $O_j = \mathrm{CIPH}_K(T_j)$ for j = 1, 2 … n;
$P_j = C_j \oplus O_j$ for j = 1, 2 … n-1;
$P^*_n = C^*_n \oplus \mathrm{MSB}_u(O_n)$.

Figure 14. The CTR Mode
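A Python sketch of CTR mode follows. How the counter blocks $T_j$ are formed is not specified in the text; here, as an assumption, $T_j$ is a 128-bit big-endian integer incremented once per block, which keeps all counters of one message distinct. toy_cipher is a hypothetical stand-in for $\mathrm{CIPH}_K$ (not AES). Because each $O_j$ depends only on $T_j$, any block can be processed on its own, which the example shows by decrypting the second block independently.

```python
BLOCK = 16


def toy_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (NOT AES): rotate left one byte, then XOR the key."""
    rotated = block[1:] + block[:1]
    return bytes(x ^ k for x, k in zip(rotated, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_xcrypt(key: bytes, first_counter: int, data: bytes, cipher=toy_cipher) -> bytes:
    """Encryption and decryption are identical: XOR with O_j = CIPH_K(T_j)."""
    out = bytearray()
    for j, i in enumerate(range(0, len(data), BLOCK)):
        t = ((first_counter + j) % (1 << 128)).to_bytes(BLOCK, "big")   # counter block
        o = cipher(key, t)                                              # O = CIPH_K(T)
        chunk = data[i:i + BLOCK]
        out += xor(chunk, o[:len(chunk)])                               # MSB_u for a short last block
    return bytes(out)


key = bytes(range(16))
msg = bytes(range(40))
ct = ctr_xcrypt(key, 1000, msg)
print(ctr_xcrypt(key, 1000, ct) == msg)                  # True
print(ctr_xcrypt(key, 1001, ct[16:32]) == msg[16:32])    # True: block 2 on its own
```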


2.4 Principle of AES Parallelism

Figure 15. Parallel AES Algorithm

In a traditional implementation of AES, the data blocks are processed serially; therefore, the efficiency and speed are poor. The entire plaintext is divided into blocks of fixed length (128 bits), each of which is encrypted as a unit with the same key and turned into a cipher text block. Modes in which the blocks do not depend on each other, such as ECB and CTR, support a parallel architecture for encryption because the individual plaintext blocks can be processed independently. In CBC mode, decryption can be parallelized in the same way, whereas encryption is sequential, because each block is chained to the previous cipher text block.

The decryption of the cipher text can also be done in a similar way. The sequential algorithm can be modified to take advantage of multiple processing units. According to the parallel computation paradigm, the independent parts of the algorithm must be identified and then prepared to work in separate threads. Initially, the AES algorithm is divided into parallelizable and non-parallelizable parts. The input to the parallelization process is the well-optimized sequential AES algorithm.
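The idea of splitting a message into independent 128-bit blocks and handing them to separate workers can be sketched with Python's standard concurrent.futures module. This is an illustrative software analogy of the parallel structure, not the hardware design: toy_cipher is a hypothetical stand-in for the AES forward cipher, the per-block independence assumed here holds for ECB (and CTR) encryption, and in CPython the global interpreter lock means pure-Python threads show the structure rather than a real speed-up. Executor.map returns results in input order, so the cipher text blocks come out in the correct sequence.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16


def toy_cipher(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for CIPH_K (NOT AES): rotate left one byte, then XOR the key."""
    rotated = block[1:] + block[:1]
    return bytes(x ^ k for x, k in zip(rotated, key))


def encrypt_serial(key: bytes, plaintext: bytes) -> bytes:
    return b"".join(toy_cipher(key, plaintext[i:i + BLOCK])
                    for i in range(0, len(plaintext), BLOCK))


def encrypt_parallel(key: bytes, plaintext: bytes, workers: int = 4) -> bytes:
    blocks = [plaintext[i:i + BLOCK] for i in range(0, len(plaintext), BLOCK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each block is independent; map() yields results in the original block order
        return b"".join(pool.map(lambda p: toy_cipher(key, p), blocks))


key = bytes(range(16))
msg = bytes(range(256))                       # 16 blocks
print(encrypt_parallel(key, msg) == encrypt_serial(key, msg))   # True
```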


3. Implementation and Results

Verilog HDL is used as the hardware description language because of its flexibility in moving between tool environments. The implementation is pure Verilog code that could easily be implemented on other devices without changing the design. We have mainly used three tools to implement the code: Notepad++, QuestaSim, and the Xilinx synthesis and simulation tools (ISE 14.7). The goal of the design implementation is speed optimization while keeping the other constraints to a minimum. We have implemented the CBC mode of the AES Rijndael algorithm.

3.1 Tool Details

The editor used for writing the design code is Notepad++. QuestaSim 10.0 is used for simulating, debugging and optimizing the design code. Xilinx ISE 14.7 is used for synthesizing the design for the Zed (Zynq™ Evaluation and Development) Board. The implementation results reported here are based on QuestaSim 10.0 simulation results.

Figure 16(i). Zed Board Device Specifications

Figure 16(ii). Zed Board Device Specifications

3.2 Encryption Module

The encryption process (encryption()) of AES is presented below. The Top_PipelinedInt module is the top module.

[Block diagram: Plain Text and Key inputs, Pre Round (pre_round()), Inner Round (Inner_round()), Last Round (last_round()), Key Expansion (Key_expansion()) and Cipher Text output, connected by 128-bit data paths.]

Figure 17. Encryption Module

Figure 18. Top Interfacing Implemented Module – RTL View

Figure 19. Top Module AES Process – RTL View


3.2.1 Encryption Pre Round

The round key is added to the State by a simple bitwise XOR operation. Because of the fully parallel architecture, the output of this stage is registered.

3.2.2 Encryption Inner Rounds

There are 10 rounds in the 128-bit AES algorithm. Every round includes four sub-modules: SubBytes(), the ShiftRows transformation, the MixColumns transformation and AddRoundKey(). The inner round covers 9 rounds; the remaining round is implemented as the last round.

To implement the 10 rounds, instantiating each module 10 times would increase the overall area requirement tenfold, and on the Zed Board (XC7Z020) the resource utilization would exceed 100% for each of encryption and decryption. To overcome this problem, we reuse the same modules as many times as they are required. A state machine at the top level [middle_round()] uses the same module in each round, and its registered output is sent to the next state as input. Because of this approach, IOB utilization is reduced to 5% and slice utilization to 43%.

The SubBytes transformation [SubBytes_transformation()] is a non-linear substitution that operates on each byte of the State using a substitution table (S-box table). The S-box table contains the 256 possible input values [0 to 255] and their corresponding output values. These values are stored in 256 × 8-bit ROMs, each addressed by an 8-bit input.

Figure 20. S-Box Round ROM Implementation

[Diagram: ROM blocks labelled ROM0, ROM1 … ROM255, each with an address input and a data output; a 128-bit input from the Pre_Round stage and a 128-bit output to the Mix Column stage.]

In the shift_rows transformation(), the last three rows of the State are cyclically shifted over different numbers of bytes. The operation output is registered; therefore it uses only one 4-input LUT and one slice.

The mixcolumn() transformation is based on Galois field multiplication. This operation is performed on the State column by column.

The AddRoundKey() transformation is the same as in the pre-round.

3.2.3 Encryption Last Round

The last round contains three operations, namely SubBytes_transformation(), shift_rows transformation() and AddRoundKey() transformation; the mixcolumn() transformation is excluded.

3.2.4 Key Expansion Module

The term key expansion describes the operation of generating all round keys from the original input key. The initial round key is the original key in the case of encryption, and the last group of the generated expanded keys in the case of decryption. The Key Expansion module includes four sub-modules: the rotate_word() transformation, the Sub_Word() transformation, Round_constant XORing() and key_round_Module().

o Rotate_word() Transformation (rot_word()): The function rot_word() takes a word [a0, a1, a2, a3], performs a cyclic rotation and returns the word [a1, a2, a3, a0].

o Sub_Word() Transformation (key_sbox()): It is the same as the SubBytes() transformation module; the only difference is that it processes a single 32-bit word instead of the 128-bit State.

o Round_Constant() Transformation (key_rcon()): Predefined 32-bit round constants (powers of x in GF(2^8)) are fixed for each round. A 4-bit round number and the 32-bit output of the Sub_Word() transformation are taken as inputs. The value corresponding to the round number is fetched from ROM and XORed with the output of key_sbox().

o AddRound() Transformation (key_round()): key_round() XORs the previous round key words with the output of key_rcon().
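The contents of the key_rcon() ROM follow from the definition Rcon[i] = [x^(i-1), {00}, {00}, {00}]: each constant is the previous one multiplied by x. The Python sketch below is illustrative; rcon_rom() and key_rcon() mirror the report's module names but are hypothetical models, not its Verilog. It generates the ten 32-bit constants needed for AES-128 and models key_rcon() as a ROM lookup XORed with the SubWord output (the value 8a84eb01 in the check is SubWord(RotWord(w[3])) from the FIPS-197 Appendix A.1 key expansion example).

```python
def rcon_rom() -> dict:
    """Rcon[i] = [x^(i-1), {00}, {00}, {00}] as 32-bit words, for rounds i = 1..10."""
    rom, v = {}, 0x01
    for i in range(1, 11):
        rom[i] = v << 24             # constant sits in the most significant byte
        v <<= 1                      # multiply by x ...
        if v & 0x100:
            v ^= 0x11B               # ... and reduce modulo m(x)
    return rom


RCON = rcon_rom()


def key_rcon(round_no: int, sub_word_out: int) -> int:
    """Hypothetical model of key_rcon(): ROM lookup by round number, XORed with SubWord output."""
    if round_no not in RCON:
        raise ValueError("AES-128 uses round numbers 1..10")
    return sub_word_out ^ RCON[round_no]


print([hex(RCON[i] >> 24) for i in range(1, 11)])
# ['0x1', '0x2', '0x4', '0x8', '0x10', '0x20', '0x40', '0x80', '0x1b', '0x36']
```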


3.3 Decryption Module

Figure 21. Decryption Module

As decryption is the inverse of encryption, its operations are performed in the inverse order of encryption. The last round of encryption becomes the first round of the decryption process, and the expanded key generated by KeyExpansion() is fed in instead of the cipher key.

[Block diagram: Cipher Text and Key inputs, Key Expansion (Key_expansion()), Last Round (last_round()), Inner Round (Inner_round()), Pre Round (pre_round()) and Plain Text output, connected by 128-bit data paths.]

3.4 Design Implementation and Verification

3.4.1 ChipScope Virtual Input/Output (VIO) Core:

Design synthesis, implementation and verification are done using the ChipScope™ Pro Virtual Input/Output (VIO) core.

The LogiCORE™ IP ChipScope™ Pro Virtual Input/Output (VIO) core is a customizable core that can both monitor and drive internal FPGA signals in real time. Two kinds of inputs and two kinds of outputs are available, each of which can be customized in size to interface with the FPGA design. Communication with the VIO core is conducted using a connection to the JTAG port via the ICON core.

Figure 22. VIO Core Connection to ICON Core.

Four types of signals are available in the VIO core:

• Asynchronous inputs: These are sampled using the JTAG clock signal that is driven from the JTAG cable. The input values are read back periodically and displayed in the Analyzer.

• Synchronous inputs: These are sampled using the design clock. The input values are read back periodically and displayed in the Analyzer.

• Asynchronous outputs: These are user-defined in the Analyzer and driven out of the core to the surrounding design. A logical 1 or 0 value can be defined for individual asynchronous outputs.

• Synchronous outputs: These are user-defined in the Analyzer, synchronized to the design clock and driven out of the core to the surrounding design. A logical 1 or 0 can be defined for individual synchronous outputs. Pulse trains of 16 clock cycles' worth of ones and zeros can also be defined for synchronous outputs.


3.5 Simulation:

Simulation results are based on tests performed in QuestaSim 10.0. The verification is done using Verilog HDL.

Test Case 1 – Reset Case:

When reset is asserted (active high), all signals are assigned to zero.

Test Case 2 – Encryption:

The AES module inputs are driven for encryption and the expected outputs are obtained. All the sequences below are in hexadecimal.

Input Text: 00112233445566778899aabbccddeeff

Cipher Key: 000102030405060708090a0b0c0d0e0f

Expected Cipher Text: 69c4e0d86a7b0430d8cdb78070b4c55a

Figure 23. Encryption Process Output Waveform
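When checking the simulation waveform, a software reference model can regenerate the expected cipher text independently. The Python sketch below is an illustrative golden model written for this purpose; it is not derived from the report's Verilog. It follows the structure described in Section 3.2: a pre-round AddRoundKey, nine inner rounds and a last round without MixColumns. The State is kept as a flat list of 16 bytes in the column-major input order, so byte (r, c) is at index r + 4c.

```python
def xtime(a: int) -> int:
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def rotl8(b: int, k: int) -> int:
    return ((b << k) | (b >> (8 - k))) & 0xFF


def make_sbox() -> list:
    sbox = []
    for x in range(256):
        inv, a, e = 1, x, 254                     # inv = x^254 = x^-1 (0 for x = 0)
        while e:
            if e & 1:
                inv = gf_mul(inv, a)
            a, e = gf_mul(a, a), e >> 1
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = make_sbox()


def expand_key_128(key: bytes) -> list:
    """Return 11 round keys of 16 bytes each (words w[4r] .. w[4r+3])."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # SubWord(RotWord(t))
            t[0] ^= rcon                          # XOR Rcon[i/4]
            rcon = xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # new byte (r, c) = old byte (r, (c + r) mod 4); flat index = r + 4c
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
                a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
                a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
                gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2)]
    return out


def add_round_key(s, k):
    return [a ^ b for a, b in zip(s, k)]


def aes128_encrypt(plaintext: bytes, key: bytes) -> bytes:
    keys = expand_key_128(key)
    s = add_round_key(list(plaintext), keys[0])               # pre-round
    for rnd in range(1, 10):                                  # nine inner rounds
        s = add_round_key(mix_columns(shift_rows(sub_bytes(s))), keys[rnd])
    s = add_round_key(shift_rows(sub_bytes(s)), keys[10])     # last round: no MixColumns
    return bytes(s)


ct = aes128_encrypt(bytes.fromhex("00112233445566778899aabbccddeeff"),
                    bytes.fromhex("000102030405060708090a0b0c0d0e0f"))
print(ct.hex())   # 69c4e0d86a7b0430d8cdb78070b4c55a
```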

Test Case 3 – Decryption:

The AES module inputs are driven for decryption and the expected outputs are obtained. All the sequences below are in hexadecimal.

Input Cipher Text: 69c4e0d86a7b0430d8cdb78070b4c55a

Cipher Key: 000102030405060708090a0b0c0d0e0f

Expected Plain Text: 00112233445566778899aabbccddeeff


Figure 24. Decryption Process Output Waveform

3.6 Synthesis Report:

The resources used by the overall implementation of the parallel AES algorithm are shown below:

Figure 25. Resource Utilization Summary

Timing Parameters:

Speed Grade: -1

Minimum period: 4.171ns (Maximum Frequency: 239.733MHz)

Minimum input arrival time before clock: 3.719ns

Maximum output required time after clock: 1.219ns

Maximum combinational path delay: 1.261ns


4. Conclusion and Future Scope

The parallel design of the AES encryption algorithm reduces the delay associated with each round of encryption, which allows the hardware to operate at a much higher clock frequency than a non-pipelined, non-parallel design. This increases the message encryption throughput and makes the hardware model suitable for time-critical encryption applications. In addition, a hardware implementation of the AES encryption algorithm offers better protection of the encryption key, much faster speed than a software implementation, and higher throughput by means of inherent hardware concurrency.

The design has been optimized for the time required to generate keys and decrypt the data. The design is implemented on the Zed Board, achieving a clock frequency of 239.733 MHz with a minimum clock period of 4.171 ns.

The work can be extended to increase security against more severe attacks, since the encryption time has been reduced. There is further scope to optimize the utilization of resources: the implementation can be improved to use resources more efficiently and to increase the maximum clock frequency. Other key-length configurations could also be evaluated to optimize resource utilization while maintaining the required level of security. A few gaps have been addressed, but much remains to be done to achieve data security together with the optimization of resources.


References

1. National Institute of Standards and Technology, "Advanced Encryption Standard (AES)", Federal Information Processing Standards Publication 197, November 2001.

2. M. Natheera Banu, "FPGA Based Hardware Implementation of Encryption Algorithm", International Journal of Engineering and Advanced Technology (IJEAT), ISSN 2249-8958, Vol. 3, Issue 4, April 2014.

3. M. Pitchaiah, Philemon Daniel, Praveen, "Implementation of Advanced Encryption Standard Algorithm", International Journal of Scientific & Engineering Research, ISSN 2229-5518, Vol. 3, Issue 3, March 2012.

4. Mahesh Walunjkar, Md. Manan Mujahid, Syed Anwar Ahmed, Ashish Jadhav, "An AES-Core Development by Using Verilog", IJIRCCE, ISSN (Online) 2320-9801, Vol. 1, Issue 8, October 2013.

5. Deguang Le, Jinyi Chang, Xingdou Gou, Ankang Zhang, Conglan Lu, "Parallel AES Algorithm for Fast Data Encryption on GPU", IEEE journal on AES, 2010.

6. Xilinx User Guides on the Zed Board and ChipScope VIO.

7. Wikipedia, the free encyclopedia.