
DESIGN OF A POWER AWARE ENCRYPTION ACCELERATOR

A Project

Presented to the faculty of the Department of Electrical and Electronic Engineering

California State University, Sacramento

Submitted in partial satisfaction of

the requirements for the degree of

MASTER OF SCIENCE

in

Electrical and Electronic Engineering

by

Muhammad Haider Pervaiz

SPRING

2016

DESIGN OF A POWER AWARE ENCRYPTION ACCELERATOR

A Project

by

Muhammad Haider Pervaiz

Approved by:

__________________________________, Committee Chair

Dr. Behnam Arad

__________________________________, Second Reader

Dr. Ted Krovetz

__________________________

Date

Student: Muhammad Haider Pervaiz

I certify that this student has met the requirements for format contained in the University format

manual, and that this project is suitable for shelving in the Library and credit is to be awarded for

the project.

__________________________, Graduate Coordinator

Dr. Preetham Kumar

___________________

Date

Department of Electrical and Electronic Engineering

Abstract

of

DESIGN OF A POWER AWARE ENCRYPTION ACCELERATOR

by

Muhammad Haider Pervaiz

Today we live in an age where data security in digital communication has become an important requirement. The need for privacy and data protection has led major companies to act, as in WhatsApp's recent addition of end-to-end encryption for its more than one billion users. Messages can be encrypted in software or in hardware; the former is more flexible but less efficient than the latter. Moreover, hardware solutions are the most advisable for portable devices [1]. In this project, a hardware accelerator for the AEGIS128L encryption algorithm is presented for mobile devices. The accelerator was designed with power efficiency as one of the primary goals, since mobile devices operate on battery supply. Power saving techniques such as parallel design, clock gating, power gating and multi-threshold voltage cells were used to achieve this goal. Other important factors considered were speed and area.

The project encompassed a power aware hardware implementation of AEGIS128L: modeling it in the SystemVerilog hardware description language (HDL), verifying it using a layered test bench, synthesizing it using a 90nm cell library and finally performing power estimation. For the power estimation, we used the gate-level netlist generated during synthesis together with the switching activity of that netlist during simulation to obtain an accurate estimate of power usage. Synopsys Electronic Design Automation (EDA) tools, namely the VCS simulator, the Design Compiler synthesis tool and Power Compiler, were used in this work.

The power consumption of the proposed design improved considerably over the project phases. In the normal operating mode, the proposed design required 7.6% less power than a non-power aware design. The power saving in sleep mode was 68%. The design supports a data rate of 1.6 gigabytes per second.

_______________________, Committee Chair

Dr. Behnam Arad

_______________________

Date

ACKNOWLEDGEMENTS

I begin by thanking Almighty Allah, who gave me the strength, courage and ability, and excellent support in the form of family, teachers, friends and colleagues, that helped me succeed.

I am thankful to my parents for their endless support and encouragement throughout. I would like to thank my brother and sister for sharing their thoughts whenever I needed them. I would also like to thank my uncle for guiding me through the university procedures.

I am grateful to Prof. Arad for giving me the opportunity to work with him and for being an excellent teacher and mentor. I thank Prof. Krovetz for agreeing to be my second reader and for giving valued input throughout the project. I also thank Prof. Kumar and Prof. Faroughi for guiding me.

I would like to thank my colleagues Tasnif, Hardik, Rashid, Tejas, Dhaval and Zeeshan for helping me understand the power aware design flows. I would also like to thank my friends who accompanied me in my leisure time and helped me relax. Special thanks to Muzaffar for being a good friend and support.

Lastly, I would like to thank CSU, Sacramento, the Electrical and Electronic Engineering Department and Synopsys for providing me with the facilities and tools required to complete this work.

TABLE OF CONTENTS

Acknowledgements
List of Tables
List of Figures

Chapter

1. INTRODUCTION
1.1 Overview
1.2 Power Aware Design Flow for AEGIS128L
1.3 Report Structure

2. THE AEGIS128L ENCRYPTION ALGORITHM
2.1 AEGIS Family
2.2 AES128 VS AEGIS128L
2.3 AEGIS128L State Update Function
2.3.1 AES Round Function
2.4 AEGIS128L Stages
2.4.1 Initialization Stage
2.4.2 Processing the Authenticated Data
2.4.3 Encryption Stage
2.4.4 Finalization Stage
2.5 AEGIS128L Usage Recommendations
2.6 Decryption

3. ARCHITECTURAL DESIGN OF AEGIS128L
3.1 Introduction to Hardware Design of AEGIS128L
3.2 High Level Design of AEGIS128L
3.2.1 Block Level Design
3.2.2 Encryption Cycle
3.3 Parallel Design Over Pipeline
3.4 Power Domains
3.5 Clock Gating Map
3.6 SOC Level Power Saving Awareness
3.6.1 Power Gating at the SOC Level
3.6.2 Clock Frequency Division at the SOC Level

4. MODELING OF AEGIS128L IN SYSTEMVERILOG
4.1 SystemVerilog Features Used
4.1.1 Package
4.1.2 Interface
4.1.3 User Defined Types
4.1.4 Functions
4.1.5 Enhanced Blocks
4.2 ENCRYPTION MODULE
4.2.1 Overview
4.2.2 Initialization Sub-Module
4.2.3 Controller Sub-Module
4.2.4 Datapath Sub-Module
4.3 FIFO RAM Module
4.4 FIFO Controller Module

5. AEGIS128L VERIFICATION
5.1 Overview
5.2 AEGIS128L Verification Framework
5.2.1 Connection with DUT
5.2.2 Inter-process communication
5.2.3 Program block
5.2.4 Validation
5.2.5 Coverage
5.3 Layered Testbench
5.3.1 Class Transactor
5.3.2 Class Generator
5.3.3 Class Agent
5.3.4 Class Driver
5.3.5 Class Scoreboard
5.3.6 Class Monitor
5.3.7 Class Checker
5.4 Regular Test Bench for SAIF
5.4.1 Scenarios
5.4.2 SAIF File Generation
5.4.3 Gate Level Simulation
5.5 Multi-Voltage Aware Simulation

6. AEGIS128L SYNTHESIS
6.1 Synthesis Script
6.2 UPF Script
6.3 Checks
6.4 Clock Rate

7. POWER ESTIMATION AND ANALYSIS
7.1 Power Estimation
7.1.1 Power Types
7.1.2 Calculating Power
7.1.3 Report_power
7.1.4 Manual Setting
7.1.5 Using SAIF
7.2 Power Analysis

8. CONCLUSION

Appendix A. AEGIS128L HARDWARE MODEL SOURCE FILES
Appendix B. AEGIS128L VERIFICATION ENVIRONMENT
Appendix C. SYNTHESIS, UPF AND PERL SCRIPTS
Appendix D. POWER AND SIMULATION RESULTS

References

LIST OF TABLES

1. Basic Specifications of AEGIS128L
2. Comparison of AEGIS family’s algorithms, based on [4]
3. IO description
4. Clock cycle behavior
5. Total cycles one message takes for encryption
6. Encryption system states
7. Functional Coverage Commands, based on [10]
8. Modes of operation
9. SAIF generation commands [17]
10. Removing timing information from gate-level cell [16]
11. Power Improvements

LIST OF FIGURES

1. Power aware design flow for AEGIS128L, based on [4]
2. AEGIS128L state update function, based on [4]
3. AES round function
4. AEGIS128L stages
5. AEGIS128L Initialization stage summary [4]
6. Black box of AEGIS128L
7. Block level diagram
8. Pipelined design limitations, no buffers shown
9. Parallel architecture, buffers not shown
10. AEGIS128L Power Domains Summarized by Power Compiler
11. Clock gated domains
12. Initialization block
13. Block diagram of datapath module
14. Block diagram of state update function
15. FIFO controller
16. Mealy Finite State Machine for FIFO controller
17. AEGIS128L verification framework, based on [13]
18. Connection of test-bench with DUT, based on [13]
19. Functional Coverage results for AEGIS128L
20. VCS NLP Flow [15]
21. Synthesis Process, based on [18]
22. SAIF file snippet

CHAPTER 1: INTRODUCTION

1.1 Overview:

Fifty years of Moore’s Law have helped the semiconductor industry grow at a phenomenal rate. In 1971, Intel’s first processor (the Intel 4004) had a transistor count of 2300. Today, its latest processor, the Xeon Broadwell-E5, has 7.2 billion transistors [2]. On one hand, such a huge transistor count provides high computational performance and huge storage capabilities; on the other hand, it presents challenges such as high power consumption and data security. With Moore’s Law driven devices changing our daily lives through breakthroughs in modern cities, transportation, healthcare, education [3] and more, it becomes extremely important to protect user data from falling into the hands of phishing attackers, hackers and other untrusted parties. In addition, it is worth noting that since 2013 more tablets have been sold than laptops, and well over 4.7 billion mobile phone users exist worldwide today [5]. Clearly, with the growing number of mobile network devices, providing power efficient security solutions has become equally important.

One of the most effective ways to protect data is encryption. Encryption can be implemented in software or hardware. The hardware implementation is faster and more secure than the software implementation. It is also a more suitable option for portable devices [1] and is thus used in this project. Message protection typically requires protecting both confidentiality and authenticity. An encryption algorithm can treat these two requirements either separately or in an integrated fashion. The advantage of the latter is that it is more efficient, as it saves computation cost [4], which makes it an obvious choice for an encryption algorithm to be used in mobile devices. Furthermore, there are at least three ways in which an integrated encryption algorithm can be developed: the first is by using a block cipher in a special mode, the second is by using a stream cipher with the key stream divided into two parts (one for encryption and the other for authentication), and the third is by designing a dedicated authenticated encryption algorithm [4]. In this project, AEGIS128L, an integrated algorithm of the third type, was used for the following reasons:

1. It is very fast and can support modern day 4G network speed demands. It is 8 times faster than AES in CBC mode [4].

2. It is suitable for parallel hardware implementation, as parallel AES round functions are used at each step [4], and so it can be operated at a low frequency to save power.

3. It has a lower computation cost and so is a relatively better option than other algorithms for power aware hardware design.

4. Encryption and decryption share the same algorithm [4] and so can share the same hardware as well.

5. Authentication is achieved almost for free [4].

6. It does not need to encrypt the packet header and so is suitable for network communication [4].

7. It is a symmetric system and uses a 128-bit key, which offers high security.

8. It is a relatively new algorithm.

9. It provides robust security if the following three conditions are met [4]:

a. The nonce is not reused.

b. A 128-bit authentication tag is used.

c. A forgery attack cannot be made successful by repeating the attack.

Today, network communication is based on either Internet Protocol version 4 (IPv4) or version 6 (IPv6). An IPv6 packet has a fixed 40-byte header and can carry a payload of up to 65,535 bytes. The implementation of AEGIS128L presented in this project was modeled for mobile network devices and supports the IPv6 packet size. More details of the algorithm are covered in the next chapter.

As described above, an emphasis on low power design is inevitable in hardware design for portable devices. Thus, at every step of this project, decisions were made with power consumption as the most important factor. From the selection of the algorithm to architectural design, HDL modeling and synthesis, all the power saving techniques applicable in the front end stage of the Application Specific Integrated Circuit (ASIC) design flow were considered. The power saving methods considered in this project are discussed in detail in [6] and are listed below:

1. Selection of an algorithm that requires fewer transitions.

2. Architectural design with parallelism

3. In design modeling:-

a. Controlled counters

b. Gray coded state machines

c. Resource sharing

d. Avoiding glitches

4. Clock gating

5. Frequency division

6. Multi Voltage (considered but not used)

7. High threshold library cells

8. Power gating
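
Item 3b in the list above can be illustrated with a small calculation. The sketch below is written in Python rather than the project's SystemVerilog and is not taken from the project's source. It compares the number of state register bit toggles for plain binary and Gray coded state encodings, assuming a hypothetical 8-state machine that steps through its states in a fixed cyclic order. All function names are illustrative.

```python
def binary_to_gray(n: int) -> int:
    """Return the reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_toggles(a: int, b: int) -> int:
    """Count the bits that differ between two state encodings."""
    return bin(a ^ b).count("1")


def toggles_per_cycle(codes: list[int]) -> int:
    """Total bit toggles when stepping through codes once, wrapping to the start."""
    return sum(
        bit_toggles(codes[i], codes[(i + 1) % len(codes)])
        for i in range(len(codes))
    )


# Hypothetical 8-state machine visited in sequential, cyclic order.
binary_codes = list(range(8))
gray_codes = [binary_to_gray(s) for s in binary_codes]

print("binary toggles per cycle:", toggles_per_cycle(binary_codes))
print("gray toggles per cycle:  ", toggles_per_cycle(gray_codes))
```

For this sequence the Gray coded machine toggles exactly one bit per transition, 8 toggles per full cycle against 14 for plain binary. A real state machine with branching transitions will not always achieve one toggle per transition, so the saving depends on the state graph.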

1.2 Power Aware Design Flow for AEGIS128L:

Figure 1 shows an overview of the power aware design flow followed in this project. It was inspired by [4].

Figure 1 – Power aware design flow for AEGIS128L, based on [4]

The problem statement describes the high-level goal of the project and defines the specifications to be met. The objective of this project was to design an accelerator for an encryption algorithm for mobile devices. Since mobile devices are portable, power efficiency naturally became a primary specification. Similarly, speed and area were critical criteria for portable network devices. Furthermore, the encryption itself had to be robust. We selected an algorithm to fulfill these requirements.

The next step was to design a high-level hardware model for the selected algorithm such that power consumption would be as low as possible. RTL modeling followed, in which the design was coded in SystemVerilog to realize the devised architecture for the encryption algorithm in hardware. The RTL modeling style was kept power aware. Once the power aware RTL source code was ready, clock gating enable signals were added to the RTL code. The synthesis tool then inserted clock gating cells for these enable signals. This type of clock gating is called fine grain clock gating.
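
The effect of enable based clock gating can be sketched with a simple behavioral model. The Python code below is an illustration only and does not reflect the project's RTL or the cells inserted by the synthesis tool. It compares a register that is clocked every cycle and holds its value when not enabled with one whose clock is gated by the enable signal. The function name and the input traces are hypothetical.

```python
def run_register(data, enable, clock_gated):
    """Simulate one register over a trace of data and enable values.

    Returns the register value after each cycle and the number of clock
    edges that actually reached the register.
    """
    q = 0
    clock_edges = 0
    outputs = []
    for d, en in zip(data, enable):
        if clock_gated:
            # The gating cell only passes the clock while enable is high.
            if en:
                clock_edges += 1
                q = d
        else:
            # The register is clocked every cycle; it reloads its own
            # value when enable is low.
            clock_edges += 1
            q = d if en else q
        outputs.append(q)
    return outputs, clock_edges


# Hypothetical trace: the register is enabled in 3 of 10 cycles.
data = [5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
enable = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]

ungated_out, ungated_edges = run_register(data, enable, clock_gated=False)
gated_out, gated_edges = run_register(data, enable, clock_gated=True)

print("same behaviour:", ungated_out == gated_out)
print("clock edges without gating:", ungated_edges)
print("clock edges with gating:   ", gated_edges)
```

Both versions produce the same register values, but the gated register only sees a clock edge in the cycles where it is enabled, which is where the dynamic power saving on the clock network and the register comes from.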

In the synthesis phase, a power aware technology cell library was provided to the synthesis tool. The library supported clock gating and multi-threshold voltage (multi-Vt) cells. With the multi-Vt cell library available, the synthesis tool chose high-Vt cells according to the timing constraints of the different paths. High-Vt cells have low leakage but are slower, and so were placed automatically in the non-critical timing paths.

The synthesis tool was also provided with a Unified Power Format (UPF) file. The UPF file used in this project conforms to the IEEE 1801-2009 standard. It describes the power intent of the design to the synthesis tool, which uses it to produce a power gated netlist. Power gating makes it possible to turn off power to the portions of the design that are in sleep mode. This saves both the leakage and the dynamic power that would otherwise be spent on idle domains.

Once we had the gate-level netlist, it was important to verify that everything had been synthesized as intended. For this purpose, either a formal verification tool can be run or a gate-level simulation can be performed. We opted for the latter, as it also helped in generating the accurate switching activity files that were used to estimate power.

1.3 Report Structure:

The report is structured as follows:

Chapter 1: This chapter introduces the project and describes its importance and goals. It discusses the rationale for the selection of the encryption algorithm and the power aware design methodology.

Chapter 2: This chapter summarizes the AEGIS128L encryption algorithm. It starts by describing the rationale for choosing AEGIS128L over the other versions in the family, AEGIS256 and AEGIS128. It then gives a basic introduction to the AEGIS128L algorithm and moves on to explain how it works.

Chapter 3: This chapter describes the architectural design of AEGIS128L. It explains why a parallel architecture is preferred over a pipelined architecture. It gives a high-level view of the design of AEGIS128L. All modules, their interconnections and the controls for clock and power gating are also explained.

Chapter 4: This chapter provides details of the hardware modeling of the algorithm using the SystemVerilog HDL. It gives details of each module used and discusses how power aware design is incorporated in the RTL code.

Chapter 5: This chapter covers the verification done using a program based layered test bench and SystemVerilog object oriented programming concepts. Functional coverage was used to get a sense of how much of the design was exercised. Apart from the functional verification, this chapter also explains the regular test benches run on the synthesized gate-level netlist to generate switching activity (SAIF) files for different scenarios, which give the accurate front end power estimation discussed later in Chapter 7.

Chapter 6: This chapter explains the synthesis script and the UPF script used in the power aware synthesis process. Synopsys Design Compiler coupled with Power Compiler was the tool used for the synthesis process. The regular, clock gating and multi-threshold voltage cells of a 90nm technology library were used for the synthesis. The power intent was formulated according to the IEEE 1801-2009 Unified Power Format standard.

Chapter 7: This chapter discusses the use of the SAIF files generated in Chapter 5. It describes the front end power estimation methods and presents an analysis of the power results. The step by step improvement in power consumption is shown for different modes of operation: sleep, normal and overdrive.

Chapter 8: This chapter presents the conclusion, summarizing the work, the results obtained and possibilities for future work.

Appendices: Appendix A contains the AEGIS128L model source files. Appendix B contains the AEGIS128L verification files. Appendix C contains the synthesis, UPF and Perl scripts used in the project. Appendix D contains the power estimation reports along with other simulation results.

CHAPTER 2: AEGIS128L ENCRYPTION ALGORITHM

In this chapter, the AEGIS128L encryption algorithm is described. AEGIS128L is a symmetric cipher designed by Hongjun Wu¹ and Bart Preneel². It is a dedicated authenticated encryption algorithm, which means that the encryption of the message and the production of the authentication tag are integrated in the same algorithm. This feature makes AEGIS128L stand out as an obvious choice for a high speed hardware implementation, as it requires less computation. Table 1 shows some basic specifications of the algorithm.

Table 1 – Basic Specifications of AEGIS128L

Parameter                  Value
Key Size                   128 bits
Message Block Size         256 bits
Nonce Size                 128 bits
Authentication Tag Size    128 bits

2.1 AEGIS Family:

The AEGIS family has three versions of the algorithm: AEGIS128L, AEGIS128 and AEGIS256. From the hardware design perspective, area, power, speed and security were the four main criteria analyzed, and this analysis led to the selection of AEGIS128L. A brief summary of the analysis is shown in Table 2.

----------------------------
¹ Division of Mathematical Sciences, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798.
² Dept. Elektrotechniek-ESAT/COSIC, KU Leuven and iMinds, Ghent.

Table 2 – Comparison of AEGIS family’s algorithms, based on [4]

Criteria | AEGIS128 | AEGIS128L | AEGIS256
Area | Least | High | Middle
Power | Medium | Best | Worst
Speed | Middle | Fastest | Slowest
Security | Good (key size: 128 bits) | Good (key size: 128 bits) | Highest, but more than required (key size: 256 bits)

Leakage power is higher in AEGIS128L because of its higher gate count, but its dynamic power is lower: AEGIS128L can do the same job in the same amount of time as the other two versions at a lower clock frequency. Moreover, leakage power can be controlled by power gating when the system is in sleep mode. Overall, therefore, AEGIS128L offers better power efficiency.

2.2 AES128 VS AEGIS128L:

AEGIS128L uses the AES round function as a primitive. The main difference between the AES128 and AEGIS128L encryption algorithms is that AEGIS128L provides good security in fewer rounds per message block because of its overall mathematical scheme. The AES128 encryption algorithm needs ten rounds, while the AEGIS encryption algorithm uses only one per message block (excluding the initialization and post-encryption rounds per message) [4]. With all the stages considered, it takes four rounds. Therefore, AEGIS is much faster than AES at the same clock rate and thus consumes less power.

On the latest Intel Haswell microprocessors, the speed of AEGIS-128L is more than twice

that of AES-GCM [4]. Other features of AEGIS128L were already discussed in Chapter 1.

2.3 AEGIS128L State Update Function:

AEGIS128L uses a 128-byte state, which stores the randomized data produced by the algorithm. The state update function updates the 128-byte state $S_i$ with two 16-byte message blocks $m_a$ and $m_b$ to produce $S_{i+1}$ [4]. It consists of eight AES round function blocks (the regular round, not the final AES round, which omits MixColumns). Figure 2 shows how $S_{i+1}$ (next state) is generated from $S_i$ (present state). In the figure, ‘R’ represents the AES round function and ‘w’ is a temporary 16-byte word.

Figure 2 – AEGIS128L state update function, based on [4]
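The wiring of the state update function can be sketched in software as follows. This is a minimal illustration, not the project's RTL: the word ordering follows the published AEGIS-128L specification [4] and should be checked against Figure 2, and the names `state_update_128l`, `xor_bytes` and the `aes_round` parameter are hypothetical. The round function is passed in as a parameter so that the sketch stays independent of any particular AES implementation.

```python
from typing import Callable, List

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def state_update_128l(
    state: List[bytes],                        # eight 16-byte words S[0..7]
    ma: bytes,                                 # first 16-byte message block
    mb: bytes,                                 # second 16-byte message block
    aes_round: Callable[[bytes, bytes], bytes],
) -> List[bytes]:
    """Return the next state; all eight rounds are independent of each other."""
    s = state
    return [
        aes_round(s[7], xor_bytes(s[0], ma)),  # message block ma enters here
        aes_round(s[0], s[1]),
        aes_round(s[1], s[2]),
        aes_round(s[2], s[3]),
        aes_round(s[3], xor_bytes(s[4], mb)),  # message block mb enters here
        aes_round(s[4], s[5]),
        aes_round(s[5], s[6]),
        aes_round(s[6], s[7]),
    ]
```

Because none of the eight round computations depends on another within the same step, all of them can be evaluated in the same clock cycle, which is what makes the algorithm attractive for a parallel hardware datapath.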

2.3.1 AES Round Function:

The AES round function is a primitive that is used by different encryption algorithms. It is based on a substitution-permutation network and was designed by Joan Daemen and Vincent Rijmen [7]. Since it is commonly available, we have leveraged its SystemVerilog code from [11]. Figure 3 shows how the AES round function operates on its two 128-bit inputs.

Figure 3 – AES round function

The AES round function for a 128-bit state consists of four steps. SubBytes is a non-linear substitution step in which each byte is replaced with another according to a lookup table. ShiftRows is a transposition step in which the last three rows of the state are shifted cyclically by a certain number of positions. MixColumns is a mixing operation that operates on the columns of the state, combining the four bytes in each column. AddRoundKey performs a bitwise XOR of the result of the above three steps with the key, which is the other input to the round function. This summary is taken from [7].
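The four steps above can be sketched as a small software reference model. This is an illustrative sketch only, not the SystemVerilog code leveraged from [11]: the function names are hypothetical, the S-box is computed from its mathematical definition instead of being stored as a table, and the state is assumed to be in the usual AES column-major byte order.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox() -> list:
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):          # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):         # XOR with four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def aes_round(state: bytes, round_key: bytes) -> bytes:
    """One regular AES round on a 16-byte state (column-major order)."""
    sub = [SBOX[b] for b in state]                                # SubBytes
    shifted = [sub[(i + 4 * (i % 4)) % 16] for i in range(16)]    # ShiftRows
    mixed = []
    for c in range(4):                                            # MixColumns
        a0, a1, a2, a3 = shifted[4 * c: 4 * c + 4]
        mixed += [
            gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
            a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
            a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
            gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
        ]
    return bytes(m ^ k for m, k in zip(mixed, round_key))         # AddRoundKey
```

A model of this kind can serve as a software cross-check for the hardware round block, for example by comparing its output against the round-by-round example vectors published in the AES standard.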

2.4 AEGIS128L Stages:

AEGIS128L has four stages, which are depicted in Figure 4. The following sections describe how each of these stages functions.

Figure 4 – AEGIS128L stages

2.4.1 Initialization Stage:

In this stage, for each new message, the key and the nonce are loaded into the state, and the cipher runs for 10 steps using the key and the nonce as input messages. Figure 5, which is based on [4], summarizes this stage.

Figure 5 – AEGIS128L Initialization stage summary [4]

The abbreviations used in Figure 5 are described below:

K128: 128-bit key

IV128: 128-bit nonce

Const0: First 16 bytes of the constant

Const1: Last 16 bytes of the constant

Constant: A 32-byte constant, represented in hexadecimal as follows:

Constant = 000101020305080d1522375990e97962_db3d18556dc22ff12011314273b528dd

Operation: The operation used in Figure 5 is bitwise XOR.
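The 32 bytes of the constant listed above follow the Fibonacci sequence reduced modulo 256, so Const0 and Const1 can be regenerated instead of being typed in by hand. The short sketch below shows this; the names `aegis_constant`, `CONST0` and `CONST1` are chosen here for illustration only.

```python
def aegis_constant() -> bytes:
    """Return the 32-byte constant: Fibonacci numbers modulo 256."""
    fib = [0, 1]
    while len(fib) < 32:
        fib.append((fib[-1] + fib[-2]) % 256)
    return bytes(fib)

CONSTANT = aegis_constant()
CONST0 = CONSTANT[:16]   # first 16 bytes of the constant
CONST1 = CONSTANT[16:]   # last 16 bytes of the constant
```

Generating the value this way is a convenient check against transcription errors when the constant is copied into RTL or a test bench.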

2.4.2 Processing the Authenticated Data:

Once initialization is done, the associated data updates the state. The following equation is applied ‘x’ times, where ‘x’ is the length of the associated data in bits divided by 256 [4].

$S_{i+1} = \mathrm{StateUpdate128L}(S_i, \mathrm{Associated\_data}_i, \mathrm{Associated\_data}_{i+1})$ [4]

It is important to note here that the associated data is not encrypted but is instead used to update the state. The associated data can be sent without undergoing encryption.

2.4.3 Encryption Stage:

Once initialization for a message is completed, the ciphertext for each message block is generated in a single step. The message block is combined with the state, and the state is then advanced with the state update function, as follows:

$C_i = P_i \oplus S_{i,1} \oplus S_{i,6} \oplus (S_{i,2} \,\&\, S_{i,3})$ [4]

$C_{i+1} = P_{i+1} \oplus S_{i,2} \oplus S_{i,5} \oplus (S_{i,6} \,\&\, S_{i,7})$ [4]

$S_{i+1} = \mathrm{StateUpdate128L}(S_i, P_i, P_{i+1})$ [4]

where $P_i$ and $P_{i+1}$ are 128-bit message blocks, and $C_i$ and $C_{i+1}$ are the respective ciphertext blocks.
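The two ciphertext equations, read with XOR between the terms as in [4], can be sketched as follows. This is an illustration under stated assumptions: each 128-bit word is modeled as a Python integer, and the function name `encrypt_block` is hypothetical. The state update that follows each block is left out so that the sketch shows only the keystream logic.

```python
MASK128 = (1 << 128) - 1  # keeps results within 128 bits

def encrypt_block(state, p0: int, p1: int):
    """Encrypt one 256-bit block given as two 128-bit integers.

    state is a sequence of eight 128-bit integers S[0..7].
    Returns the two 128-bit ciphertext words (c0, c1).
    """
    c0 = p0 ^ state[1] ^ state[6] ^ (state[2] & state[3])
    c1 = p1 ^ state[2] ^ state[5] ^ (state[6] & state[7])
    return c0 & MASK128, c1 & MASK128
```

Since the keystream terms depend only on the state, applying the same function to the ciphertext words with the same state returns the plaintext words, which is why the same datapath can serve for decryption.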

2.4.4 Finalization Stage:

In the finalization stage, seven steps are used to generate the authentication tag. The following equations from [4] summarize this stage.

$\mathrm{tmp} = S_{i,2} \oplus (\text{64-bit associated data length} \,\|\, \text{64-bit message length})$, where $S_i$ is the state after the last message block

The state is then updated seven times as follows:

$S_{i+1} = \mathrm{StateUpdate128L}(S_i, \mathrm{tmp}, \mathrm{tmp})$

Once the seven steps are completed, the authentication tag is generated using the following equation, where $S_f$ is the final state:

$\mathrm{Tag} = \bigoplus_{j=0}^{6} S_{f,j}$ (bitwise XOR of the first seven 128-bit words of the final state)
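The finalization flow can be sketched as below. Several points are assumptions rather than statements from the text: following [4] as understood here, tmp is formed from the third state word and the tag is the XOR of state words 0 to 6 (matching the summation limits above); the way the two 64-bit lengths are packed into one integer is illustrative only, since [4] defines the exact byte order; and the `update` parameter is a hypothetical stand-in for the state update function.

```python
def finalize(state, ad_bits: int, msg_bits: int, update):
    """Run the seven finalization steps and return the 128-bit tag.

    state  : sequence of eight 128-bit integers
    update : callable (state, ma, mb) -> new state (hypothetical stand-in)
    """
    # Assumption: lengths packed as (ad_bits << 64) | msg_bits for illustration.
    tmp = state[2] ^ ((ad_bits << 64) | msg_bits)
    for _ in range(7):                 # seven state updates with tmp as both inputs
        state = update(state, tmp, tmp)
    tag = 0
    for word in state[:7]:             # XOR of state words 0..6
        tag ^= word
    return tag
```

In hardware, the same datapath used for encryption performs these seven updates, with the controller selecting tmp as the input instead of a message block.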

2.5 AEGIS128L Usage Recommendations:

For robust security, AEGIS128L must be used such that the Nonce is never re-used. Moreover, the 128-bit authentication tag should be used. It is also worth mentioning that AEGIS128L can support messages of up to $2^{64}$ bits. However, since the IPv6 requirement is only $2^{16}$ octets of payload, the implementation in this project supports only up to $2^{16}$ bytes of data payload per message. For the fastest data rate, it is recommended that the System on Chip (SOC) send only 64KB messages. More detail on this is covered in Chapter 4.

2.6 Decryption:

In this project, the decryption process for AEGIS128L was not implemented. To decrypt with AEGIS128L, the exact values of the key size, the nonce size and the tag size must be known [4]. The process of decryption is similar to that of encryption. First, the initialization and associated data stages are applied. Then the state update function is used to generate the plaintext from the ciphertext. Finally, the finalization stage is applied. Therefore, the same hardware that encrypts the plaintext can be used for decrypting the ciphertext.

CHAPTER 3: ARCHITECTURAL DESIGN OF AEGIS128L

In this chapter, a hardware accelerator for the AEGIS128L encryption algorithm is proposed. The accelerator is designed to support the speed and the power efficiency required by today's mobile network devices.

3.1 Introduction to Hardware Design of AEGIS128L:

The black box diagram of AEGIS128L is shown in Figure 6. The description of each port can be found in Table 3. The two 128-bit input ports are re-used for different data at different stages of the encryption cycle. This re-use saves more than 255 extra pins without affecting the system’s performance. The input pin ‘start’ begins the encryption process and can only be asserted when the system’s output pin ‘ready’ is asserted. More detail on this is provided in later sections.

Figure 6 – Black box of AEGIS128L

Table 3 – IO description

Port Name | Input or Output | Description
start | Input | This signal starts the encryption process. It should only go active high when the ready output is active high. It stays on until all of the message blocks are received and then turns off.
FINon | Input | This signal indicates that the message has ended and that the finalization stage should start now.
inbus128_1 | Input, 128-bit bus | This 128-bit input bus carries the KEY in the first two cycles after the start. It then carries associated data for two cycles. After that it carries the message blocks, and in the finalization stage it has zero value.
inbus128_2 | Input, 128-bit bus | This 128-bit input bus carries the NONCE in the first two cycles after the start. It then carries associated data for two cycles. After that it carries the message blocks, and in the finalization stage it has zero value.
Rst | Input | This is the asynchronous reset input of the system.
Clk | Input | This is the clock input of the system. The outer network layer of the SOC drives this input, so clocks of different rates can be applied depending on whether the regular or the overdrive mode is in use.
Full | Output | This output indicates that the FIFO is full and cannot store any more data.
ready | Output | This output tells the outer system whether the system can take another message to encrypt.
tagout | Output | This output indicates that the authentication tag is present on the data output port.
done | Output | This output indicates that the encryption of the message is completed.
dataout128_1 | Output, 128-bit bus | This output bus carries the encrypted data and the authentication tag.
dataout128_2 | Output, 128-bit bus | This output bus carries the encrypted data.

3.2 High Level Design of AEGIS128L:

The AEGIS128L hardware design consists of three main blocks: the FIFO (first in, first out), the FIFO controller and the encryption block. The encryption block is further divided into three sub-blocks, which are discussed in the next chapter.

3.2.1 Block Level Design:

In the AEGIS128L hardware design, the FIFO controller manages the flow of data into the system. It generates the control signals for both the FIFO_RAM and the encryption block. Some of the clock-gating and power-gating control signals are also generated by the FIFO controller. The details of the high-level architecture can be found in the next section. Figure 7 shows the block-level diagram for AEGIS128L.

Figure 7 – Block level diagram

3.2.2 Encryption Cycle:

This section explains how the encryption cycle works in this implementation. For that purpose, a clock cycle table is used that captures the key behavior of the design in each clock cycle. For message sizes of 256 to 524288 bits, the clock cycle behavior of the design can be found in Table 4.

Table 4 – Clock cycle behavior

Cycle | Input | Output | Behavior of the design
0 | Start=0 | Ready=1 | System is ready for encryption of new message
1-2 | Start=1 | | System starts, Key and Nonce enter the system
3-4 | | | Associated data enter the system
5-x | | | Message blocks enter the system, ‘x’ depends on message size
x+1 | FINon=1 | | Message ended, begin finalization step
x+2 | Start=0, FINon=0 | | Cycle ends
x+3 until ready=1 | | | Next message cannot arrive until ready becomes one.
 | | Ready=1 | Next message can come now, process repeats

3.3 Parallel Design over Pipeline:

The AEGIS128L algorithm is designed in such a way that it lends itself to parallel data processing more than to structural or pipelined data processing. AEGIS128L uses eight parallel AES round function blocks to update the state, which makes pipelining very costly. Figure 8 illustrates the problem. Note that the buffers for the pipeline are not shown in the diagram.

As the figure shows, adding three pipeline stages does not increase the throughput, because a non-linear pipeline is formed. A non-linear pipeline has a feedback path, and in this case a new output can only arrive every third clock cycle, because the present output is required to generate the next output. Hence, this pipelined design only increases the clock rate; it improves neither the execution time nor the throughput. A concept similar to the carry-lookahead adder could be used to speed up the process, but that would take a lot of logic. For a mobile device especially, real estate is a big concern.

A more efficient solution is a parallel architecture, as it can improve the execution time without increasing the clock rate. This kind of architecture should therefore be more power efficient, meet the speed requirement and use moderate real estate. Figure 9 shows the parallel architecture concept.

Figure 8 – Pipelined design limitations, no buffers shown

Figure 9 – Parallel architecture, buffers not shown

3.4 Power Domains:

Power gating turns off the power supply to unused portions of the system, reducing both leakage and dynamic power, and so lowers the power dissipation substantially. CMOS switches turn off the power when the power-gating enable signal arrives. This project uses three power domains: an always-on domain, a domain that switches the initialization block on and off, and a domain that switches the whole encryption block on and off. Note that the whole design can be powered off by the System on Chip (SOC), but that is outside the scope of this project. Figure 10 shows the power domains for AEGIS128L. PTOP is a wrapper added to take care of glue logic in the form of buffers that can be added in the backend stage of the ASIC design process. The always-on domain (PAon) covers the instances z1 and z2 (FIFO and FIFO controller). The power domain that switches the whole encryption block on or off is labeled POF1, while POF2 is the power domain that switches the initialization block on or off. The initialization block is used only in the first clock cycle of every message encryption cycle.

Figure 10 – AEGIS128L Power Domains Summarized by Power Compiler

3.5 Clock Gating Map:

Clock gating is one of the most effective ways to control wasted dynamic power. It uses a control signal to gate the clock to the registers when no new data has to be captured. This control signal has to be coded in the RTL, and later, in the synthesis script, the clock gating option must be enabled for the synthesis tool.

In this implementation, two blocks from Figure 7 contain most of the registers; hence, clock gating was enabled for them to save dynamic power. The power saving is significant and is discussed in Chapter 7. Figure 11 highlights the two clock-gated blocks and their clock-gating enable signals.

Figure 11 – Clock gated domains

3.6 SOC Level Power Saving Awareness:

In this project, an encryption block was designed for the SOCs used in mobile devices. Although SOC-level discussion is outside the scope of this project, it should be noted that the SOC must be able to save power on this encryption block using the following two methods.

3.6.1 Power Gating at the SOC Level:

The SOC must be able to power gate the whole encryption block when it knows that encryption will not be needed for a relatively long period. This saves power by putting the block into deep sleep mode.

3.6.2 Clock Frequency Division at the SOC Level:

Another way the SOC can save power on the encryption block is clock frequency division. In this approach, the SOC runs the encryption block at a lower frequency when the reduced speed can be tolerated. The SOC can choose to run encryption at the maximum possible speed whenever required.

CHAPTER 4: MODELING OF AEGIS128L IN SYSTEMVERILOG

4.1 SystemVerilog Features Used:

SystemVerilog is a combined hardware description language and hardware verification language based on extensions to Verilog [8]. The enhanced features in the language help make RTL modeling of the design systematic and efficient. The features that were used in the design phase of this project are briefly described below.

4.1.1 Package:

Packages provide a means of keeping code shared by different modules in one place [9]. We used a package to share user-defined data types and synthesizable functions. We used conditional compilation so that the package is imported only the first time it is encountered [10].

4.1.2 Interface:

Interface is used to connect the top RTL module with the layered test bench. This gives

us the ability to modify any IO in one place instead of going into two different files, which is

error prone.

4.1.3 User defined types:

User-defined data types help in creating meaningful data types that suit the context of the signal. In this project, we used user-defined types to define 128-bit and 1024-bit buses in a way that also allows each byte to be accessed separately. Multi-dimensional arrays were also utilized in the RTL.

4.1.4 Functions:

SystemVerilog considerably enhances Verilog functions. These enhancements can be found in [10]. We placed functions inside the package and called them from the RTL where needed. It is important to mention here that functions should be defined as automatic for synthesizable RTL.

4.1.5 Enhanced blocks:

Verilog had only the ‘always’ block, which was used for both combinatorial and sequential code. The distinction between combinatorial, latch-based and flip-flop-based design was made by how the always block was used, which was error prone. SystemVerilog introduced specialized blocks: ‘always_comb’ for combinatorial, ‘always_ff’ for flip-flop-based and ‘always_latch’ for latch-based designs. The RTL code in this project utilizes this feature.

4.2 Encryption Module:

4.2.1 Overview:

The encryption module consists of three sub-modules shown in Figure 7. This module takes a 256-bit data block (via two 128-bit input ports) in each cycle and, depending upon the control signals, performs initialization for the new message, takes in associated data to further update the internal state, encrypts the data and produces the authentication tag. The total number of clock cycles that the encryption module takes to complete the encryption of the input packet (header and payload) is calculated below.

Table 5 – Total cycles one message takes for encryption

Stage | Total cycles
Initialization | 10
Header/associated data processing | 2
Message encryption | x = (size of payload in bits, y)/256
Finalization process | 7
Delay before new message | 1
Total number of cycles for y bit long message | 20+x

As the length of the message increases, the overall data rate increases, because the fixed overhead of 20 cycles per message is spread over more message blocks. AEGIS128L is designed to support message lengths from 256 bits to 524288 bits, and this range limits the data rate. To illustrate, assume that AEGIS128L runs with a 20 ns clock period and find the slowest and fastest data rates possible.

In the worst case, a 256-bit message takes 21 cycles to encrypt, which gives a data rate of 76 MB/s. The calculation is shown below.

Total cycles = 21, Time period = 20 ns, Execution time = Total cycles x Time period = 420 ns

Total bits = 256, Total bytes = 256/8 = 32

Data rate = Total bytes of data / Total execution time = 32 / (420 x $10^{-9}$ s) = 76 MB/s

In the best case, a 524288-bit message takes 2068 cycles to encrypt, which gives a data rate of 1.58 GB/s. The calculation is shown below.

Total cycles = 2068, Time period = 20 ns, Execution time = Total cycles x Time period = 41360 ns

Total bits = 524288, Total bytes = 524288/8 = 65536

Data rate = Total bytes of data / Total execution time = 65536 / (41360 x $10^{-9}$ s) = 1.58 GB/s

Therefore, the most efficient way for the SOC to use the encryption block is to send only 64KB (65536-byte) messages, which keeps the encryption block at its highest data rate.
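The data-rate arithmetic above can be reproduced for any supported message size with a few lines of code. This is a small sketch using the cycle counts of Table 5 and the assumed 20 ns clock period; the function names are chosen here for illustration.

```python
# Fixed per-message overhead from Table 5:
# initialization + associated data + finalization + delay before new message
OVERHEAD_CYCLES = 10 + 2 + 7 + 1

def total_cycles(message_bits: int) -> int:
    """Total clock cycles needed to encrypt one message of the given size."""
    if message_bits % 256 or not 256 <= message_bits <= 524288:
        raise ValueError("message size must be a multiple of 256 in [256, 524288]")
    return OVERHEAD_CYCLES + message_bits // 256

def data_rate_bytes_per_s(message_bits: int, clock_period_ns: float = 20.0) -> float:
    """Data rate in bytes per second for the given clock period."""
    exec_time_s = total_cycles(message_bits) * clock_period_ns * 1e-9
    return (message_bits // 8) / exec_time_s

for bits in (256, 524288):
    rate = data_rate_bytes_per_s(bits)
    print(f"{bits:>6} bits: {total_cycles(bits):>4} cycles, {rate / 1e6:.0f} MB/s")
```

Evaluating intermediate sizes with the same functions shows how quickly the fixed 20-cycle overhead stops dominating as messages grow.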

4.2.2 Initialization sub-module:

This module is combinatorial logic that is used only in the first cycle of each message encryption. For the remaining cycles (up to 2067), it would otherwise dissipate both leakage power and dynamic power; dynamic power is dissipated because the signals at its input ports still go through transitions. This is why this logic was placed in a power-gated domain.

This block takes the Key and the Nonce and initializes the internal 1024-bit state using the constant described in Chapter 2. This step is essential to the security of AEGIS128L. The Nonce can be public but must always be a new value; the network layer (which drives the encryption block and is under SOC control) must not repeat a Nonce, as that could weaken the security of the system. Figure 12 shows the block diagram of this module.

Figure 12 – Initialization block

From the power-aware modeling perspective, this module did not offer much. Initially, a portion of this module was designed as sequential logic to save dynamic power by clock gating. Once it was identified as a power-gated domain, it was converted to purely combinatorial logic to reduce the area. The source code for the initialization module can be found in Appendix A.

4.2.3 Controller sub-module:

This module generates two multiplexer control signals for the datapath module and the power-gate enable signal for the initialization block, counts the length of the message, generates a factor that the datapath block uses as an input for calculating the authentication tag, and generates two system state output bits. The ‘Tagout’ output indicates that the authentication tag can now be captured, and the ‘done’ output indicates that the system has finished the encryption. This block uses four control inputs, namely start, ADon, MSGon and FINon. These control signals identify which of the four processing states the system is in, or whether it is idle; Table 6 summarizes these states.

Table 6 – Encryption system states

start | ADon | MSGon | FINon | SYSTEM STATE
0 | 0 | 0 | 0 | System is not used. Long wait can lead to power gate mode.
1 | 0 | 0 | 0 | System starts, initialization mode
1 | 1 | 0 | 0 | System moves to associated data processing mode
1 | 1 | 1 | 0 | System is now encrypting message
1 | 1 | 1 | 1 | System is now generating authentication tag
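The decoding in Table 6 can be expressed as a simple lookup. The sketch below is illustrative only; the state names and the `decode_state` function are hypothetical and do not come from the RTL in Appendix A.

```python
# (start, ADon, MSGon, FINon) -> system state, as listed in Table 6
STATES = {
    (0, 0, 0, 0): "idle",
    (1, 0, 0, 0): "initialization",
    (1, 1, 0, 0): "associated_data",
    (1, 1, 1, 0): "encryption",
    (1, 1, 1, 1): "finalization",
}

def decode_state(start: int, adon: int, msgon: int, finon: int) -> str:
    """Return the system state for a control-input combination."""
    return STATES.get((start, adon, msgon, finon), "invalid")
```

Each step through the table sets exactly one additional control bit, so only one control input toggles per stage transition, which fits the low-switching-activity goal of the design.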

In the RTL code, at least two power-aware coding features were used in this module. The first was the use of controlled counters instead of free-running counters. Controlled counters reduce switching activity and so minimize dynamic power. The second was the elimination of unneeded flip-flops. As discussed earlier, AEGIS128L supports messages of up to $2^{64}$ bits, but this design limits them to $2^{19}$ bits for practical purposes. Therefore, the extra flip-flops were omitted and logic zero was used in their place. Clock gating could also be applied to the control signals to save power, but it was avoided to prevent timing issues, since the control signals enable multiplexers and power domains.

4.2.4 Datapath sub-module:

This module is the heart of the system. It contains the datapath, which includes the state update function. It also has multiplexers that make sure the right inputs and the right outputs are used. The state update block is followed by a 1024-bit register bank, which is clock gated and saves a lot of dynamic power. Figure 13 shows the block diagram of this module.

Figure 13 – Block diagram of datapath module

Note that the buffer at the output of the state update block is clock gated. The state update function was discussed in Chapter 2; Figure 14 presents the block diagram of its hardware implementation. It uses eight AES round function blocks. It introduces randomness into the messages and needs just one cycle per message block for encryption. This feature makes AEGIS128L stand out from other encryption algorithms.

Figure 14 – Block diagram of state update function

4.3 FIFO RAM Module:

This module models a simple RAM of 16 entries by 257 bits, operated as a FIFO. As mentioned earlier, the encryption module needs the Key and Nonce inputs for the first ten cycles, but the system input delivers them only for the first two cycles. Therefore, both are stored in the RAM to meet the encryption module’s requirements. In addition, while the encryption module is busy with the initialization stage during those ten cycles, the system continues to store the associated data and the plaintext message in the RAM.

The rationale behind the size of the RAM is that 16 entries should be enough for any message length. This is because the incoming data can start replacing the previous FIFO entries on every 17th cycle without causing any data conflicts. For any new incoming message, the encryption of the existing message has to be completed first. Therefore, the flow of the data is as follows:

i. First intake of the data block, encryption starts in parallel.

ii. Data is outputted in parallel as well.

iii. New data can only be taken in when the last data has finished processing and has

generated authentication tag.

It should also be noted that when no write is in progress, the FIFO RAM is clock gated, which saves a lot of dynamic power. The RAM can still be read while it is clock gated.
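The wrap-around behavior of a 16-entry FIFO can be sketched with a small behavioral model. This is a generic circular-buffer illustration under stated assumptions, not the RAM model from Appendix A: the class name, the entry counter and the method names are hypothetical, and the 257-bit entry width is not modeled.

```python
class Fifo:
    """Behavioral model of a fixed-depth FIFO with wrap-around pointers."""

    def __init__(self, depth: int = 16):
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0      # next entry to write
        self.rd_ptr = 0      # next entry to read
        self.count = 0       # number of valid entries

    @property
    def full(self) -> bool:
        return self.count == self.depth

    @property
    def empty(self) -> bool:
        return self.count == 0

    def write(self, word) -> bool:
        """Store a word; returns False (and stores nothing) when full."""
        if self.full:
            return False
        self.mem[self.wr_ptr] = word
        self.wr_ptr = (self.wr_ptr + 1) % self.depth   # wrap after last entry
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None when empty."""
        if self.empty:
            return None
        word = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return word
```

In this model, a 17th write lands in entry 0 again, and it succeeds only once the earlier content of that entry has been read, which mirrors the reuse of FIFO entries described above.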

4.4 FIFO Controller Module:

This module is the main controller unit for the project. Conceptually, it has four main parts: the power-gating control unit, the clock-gating control unit, the FIFO control unit and the encryption control unit. The power-gating control unit is responsible for turning the power supply of the encryption block on or off. The clock-gating control unit generates two clock-gating enable signals, which are used by the FIFO and encryption blocks. The FIFO control unit controls the flow of the FIFO RAM. It generates signals such as write enable, read enable, read pointer, write pointer, full and empty, which support the FIFO RAM operation. The encryption control unit is responsible for managing the four stages of the encryption block. It provides four control signals to the encryption block that identify the mode in which the encryption block should run. Figure 15 shows the block-level diagram for this module.

There is a finite state machine (FSM) in this module, which was deliberately coded in the Gray code style. This style saves dynamic power because fewer state bits toggle on each state change. Figure 16 shows the FSM and its related code.
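The benefit of Gray coding can be illustrated by counting how many state bits toggle between consecutive states. The sketch below uses the standard binary-reflected Gray code; it is not the state assignment of the FSM in Figure 16, and the function names are chosen here for illustration.

```python
def to_gray(n: int) -> int:
    """Convert a binary state index to its binary-reflected Gray code."""
    return n ^ (n >> 1)

def bit_flips(a: int, b: int) -> int:
    """Number of bit positions in which two state encodings differ."""
    return bin(a ^ b).count("1")

# Compare toggling for a walk through eight consecutive states.
binary_flips = sum(bit_flips(n, n + 1) for n in range(7))
gray_flips = sum(bit_flips(to_gray(n), to_gray(n + 1)) for n in range(7))
print(binary_flips, gray_flips)
```

With Gray coding every transition between consecutive states toggles exactly one flip-flop, whereas plain binary counting can toggle several at once (for example three bits when going from state 3 to state 4).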

Figure 15 - FIFO controller

Figure 16 - Mealy Finite State Machine for FIFO controller

CHAPTER 5: AEGIS128L VERIFICATION

5.1 Overview:

SystemVerilog has many features that make verification phase of the project more efficient.

These features have been utilized by methodologies such as Verification Methodology Manual

(VMM), Open Verification Methodology (OVM) and Universal Verification Methodology

(UVM). These methodologies can handle the task of verification for huge designs very efficiently

and offer reusability scope.

For this project, we have used a layered test bench for the functional verification. VMM

follows the layered test bench architecture to take full advantage of the automation [12]. The

constructs used for the project include classes, functional coverage, thread and inter-process

communication.

We also used regular test benches to exercise different scenarios and generate the Switching Activity Interchange Format (SAIF) files. These files were used to obtain accurate dynamic power estimates. The process of generating a SAIF file is discussed in detail in Section 5.4.2.

5.2 AEGIS128L Verification Framework:

AEGIS128L verification framework is based on a layered test bench model. This style of

modeling was derived from [13]. Figure 17 summarizes this framework. The thick arrows are used

as the mailbox representation in the figure.

Figure 17 – AEGIS128L verification framework, based on [13]

5.2.1 Connection with DUT:

The layered test bench was connected to the DUT by a top-level module. This top-level module used SystemVerilog’s interface construct for the connections. A clocking block was also used to avoid timing issues by providing synchronization between the DUT and the test bench [13]. The code for the interface can be found in Appendix A.

Figure 18 – Connection of test-bench with DUT, based on [13]

5.2.2 Inter-process Communication:

SystemVerilog has new constructs such as fork...join_any and fork...join_none that can trigger parallel processes. This feature was used in the environment class of the layered test bench to run methods from different classes in parallel. We utilized the ‘Mailbox’ feature of SystemVerilog as well. A mailbox is like a FIFO that can source and sink data, and it helps in passing information between two threads [13]. We also used dynamic arrays for the communication between the agent and the scoreboard. The size of a dynamic array is variable, so its use gave us the flexibility to transfer data of varying length easily.

5.2.3 Program Block:

SystemVerilog introduces the program block to hold the test bench and to reduce race conditions between the design under test (DUT) and the test bench. This project used a program block that included all the classes and the functional coverage, and it helped provide a race-free model for testing. Details of each class used in the program block are included in Section 5.3.

5.2.4 Validation:

For the validation, two main tasks were completed. The first was to generate the expected results, and the second was to match the DUT outputs with the expected results at the right time. The expected results can be generated either by reading an external file using system tasks or by producing them in the scoreboard class. The former is done by using a programming language such as C++ or Perl and dumping the expected results into a file. The advantage of that approach is that a pure software model is available to compare against the hardware design. On the other hand, it requires the extra steps of exporting the test vectors to an external file that the C++/Perl program can read and then reading the expected results back in, which is inefficient.

For this project, we generated the expected results in the scoreboard class and passed them

to the checker class. The checker class compared the actual results with the expected results and

flagged possible failures in the design.
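The comparison step performed by the checker can be sketched as follows. This is a hedged Python illustration of the idea only; the function name `check_results` and the tuple layout of a failure record are assumptions, not the project's SystemVerilog checker class.

```python
def check_results(expected, actual):
    """Return a list of (index, expected, actual) tuples, one per mismatch.

    A length mismatch is reported first as ("length", len(expected), len(actual)).
    An empty list means the actual results match the expected results.
    """
    failures = []
    if len(expected) != len(actual):
        failures.append(("length", len(expected), len(actual)))
    # Compare entry by entry over the common prefix of both sequences.
    for i, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            failures.append((i, exp, act))
    return failures
```

A call such as `check_results(scoreboard_values, monitor_values)` (both names hypothetical) returns an empty list when the design output agrees with the expected results.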

5.2.5 Coverage:

Coverage is the extent to which something is tested. In the context of ASIC verification, coverage can be of different types, such as code coverage and functional coverage. In this project, we measured functional coverage using SystemVerilog constructs. Table 7 lists the constructs used with their descriptions.

One purpose of coverage is to gauge how well the design is tested. Another is power measurement: the DUT should be exercised for long enough, with different scenarios, that the captured switching activity is representative overall. This helps in obtaining an accurate Front-end power estimate.

Table 7 – Functional Coverage Commands, based on [16]

| Command | Description |
|---|---|
| covergroup | User-defined construct that holds coverpoints |
| coverpoint | Used to represent a variable from the DUT |
| bins | Associates a variable name and a count with a set of values or a sequence of value transitions |
| option | Built-in feature that helps in defining the weight of a covergroup |
| sample() | Built-in method that helps in calculating coverage on the fly |
| $get_coverage | Built-in system function that returns the total coverage percentage achieved |

Figure 19 shows the functional coverage results. By running twenty messages of 65536 bytes each, we achieved a functional coverage of 100%. To view this result, we ran urg -dir simv.vdb and then firefox urgReport/grp0.html after the compile and simv commands.
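The idea behind bins and the coverage percentage can be shown with a small Python sketch. It is only an illustration of how "bins hit / total bins" is computed; the bin names and ranges in `LENGTH_BINS` are hypothetical and are not the coverpoints used in the project.

```python
def functional_coverage(samples, bins):
    """Return the percentage of bins hit by at least one sample.

    bins maps a bin name to an inclusive (low, high) value range.
    """
    hits = {name: 0 for name in bins}
    for value in samples:
        for name, (low, high) in bins.items():
            if low <= value <= high:
                hits[name] += 1
    covered = sum(1 for count in hits.values() if count > 0)
    return 100.0 * covered / len(bins)

# Hypothetical bins over the message length in bytes.
LENGTH_BINS = {
    "short": (0, 255),
    "medium": (256, 4095),
    "long": (4096, 65536),
}
```

With these bins, sampling only short and medium messages leaves one bin empty, so the coverage stays below 100% until a long message is also applied.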


Figure 19 – Functional Coverage results for AEGIS128L

5.3 Layered Testbench:

Figure 17 shows how the verification infrastructure is organized. It uses a separate class for each section of the framework. The following subsections describe these classes and their functionality.

5.3.1 Class Transactor:

This class is not shown in Figure 17, but it was used to carry data from one class to another through mailboxes. It holds the random and regular variables that can be manipulated to hold the test vectors.

5.3.2 Class Generator:

As the name suggests, this class generated the test vectors using the transactor class. It defined constraints, and each random variable of the transactor object was assigned appropriate constraints. Choosing these constraints carefully let us reach high coverage in a short amount of time.
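Constrained-random generation of this kind can be sketched in Python. This is an illustrative assumption, not the project's generator class: the field names, the 128-bit key and IV widths, the 50% bias and the boundary lengths around a 32-byte block are all hypothetical choices made for the example.

```python
import random

BLOCK_BYTES = 32  # assumed block size: one 256-bit message block

def generate_vector(rng, max_len=64):
    """Return one hypothetical test vector that satisfies simple constraints.

    Half of the time the message length is drawn from boundary values around
    the block size, which tends to hit corner cases sooner than uniform choice.
    """
    boundary = [n for n in (0, 1, BLOCK_BYTES - 1, BLOCK_BYTES,
                            BLOCK_BYTES + 1, max_len) if n <= max_len]
    if rng.random() < 0.5:
        msg_len = rng.choice(boundary)
    else:
        msg_len = rng.randint(0, max_len)   # constraint: 0 <= length <= max_len
    return {
        "key": rng.getrandbits(128),        # assumed 128-bit key
        "iv": rng.getrandbits(128),         # assumed 128-bit IV
        "message": bytes(rng.getrandbits(8) for _ in range(msg_len)),
    }
```

Seeding the generator, for example `generate_vector(random.Random(7))`, makes a failing test vector reproducible.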


5.3.3 Class Agent:

The main goal of this class was to take data from the generator object and push it into the dynamic array so that the scoreboard could read it. It also passed the data down to the driver class through a mailbox. Because this dynamic array held all the input test vectors, we also used it to measure the functional coverage.

5.3.4 Class Driver:

The driver class extracted the data from the agent-to-driver mailbox and drove the DUT input ports. If the driver drives a synchronous signal at the active edge of the clock, the value propagates immediately to the design [13]. If the test-bench drives the output just after the active edge, the value is not seen in the design until the next active edge of the clock [13].

5.3.5 Class Scoreboard:

In this class, the expected results were computed from the test vectors delivered through the dynamic array. The scoreboard can also read the expected results from an external file.

5.3.6 Class Monitor:

The monitor class was used to sample the outputs of the DUT and transfer them to the checker class through a mailbox.


5.3.7 Class Checker:

The checker received the expected results from the scoreboard and the actual results from the DUT. It then compared the two and flagged any errors.

5.4 Regular Test Bench for SAIF:

To generate the SAIF files, we also used regular test-benches. This was because these simulations had to be run on the gate-level netlist, and the netlist loses some of its hierarchy because of boundary optimization during synthesis.

Different scenarios (sleep, normal and overdrive) were modeled so that Power Compiler could generate an accurate power report from the generated SAIF files.

5.4.1 Scenarios:

Table 8 shows the scenarios used and their descriptions. Note that at the SOC level the whole encryption block can be powered off, saving a lot of leakage power, but that is outside the scope of this project.

5.4.2 SAIF File Generation:

There are two methods to generate a SAIF file. The first is to convert a VCD file to a SAIF file using the vcd2saif command. To generate the dump, we place the system task $vcdpluson in an initial block and compile with the "-PP" option. VCS then generates a VPD-format file, which we use as the input to generate the SAIF file as follows:

vcd2saif -input vcdpluson.vpd -o power.saif

Table 8 – Modes of operation

| Mode | Description |
|---|---|
| Sleep | No message is being encrypted, but the SOC has kept the power to the system ON. |
| Normal | A message is being encrypted at a scaled-down frequency. |
| Overdrive | A message is being encrypted at the maximum possible frequency. |

The second method is to generate the SAIF file directly. In this case we do not need to generate a VCD file, so the $vcdpluson command can be commented out. Table 9 shows the commands used in the test-bench to generate the SAIF file. Note that these commands must be used in the order shown.

The time unit must be the same in the test-bench and the SAIF file. Since we used 10^-9 s (1.0e-9) as the time unit in the SAIF file generation command shown in Table 9, the test-bench must match this time unit. Otherwise, the dynamic power estimate will not be accurate. This was achieved with the following directive:

`timescale 1ns/1ns


Table 9 – SAIF generation commands [17]

| Command | Description |
|---|---|
| $set_gate_level_monitoring("ON"); | This command turns ON the registering of all internal nets for simulation. |
| $set_toggle_region("test.top_rtl"); | It specifies the toggle region. We have used the instantiation of the DUT in the test-bench. |
| $toggle_start; | This command instructs the simulator to start capturing toggle activity. |
| $toggle_stop; | This command instructs the simulator to stop capturing toggle activity. |
| $toggle_report("power.saif", 1.0e-9, "test.dut"); | This command dumps the switching activity of nets and ports into a file with a user-given name. Specifying the timescale/time unit is important, and it should match the test-bench. The third field describes the hierarchy for the switching activity. The SAIF file should be annotated to the netlist at the same hierarchy in the power estimation step. |

5.4.3 Gate Level Simulation:

Gate-level simulation is needed to confirm that the synthesized netlist matches the RTL and the overall intent of the design. The gate-level netlist obtained from the synthesis process is simulated and the results are compared with the RTL simulation. Gate-level simulation can also be performed to obtain accurate switching activity for power estimation. We ran gate-level simulations mainly for the second reason.


The synthesis tool optimizes the design and applies boundary optimization, which sometimes causes the netlist to lose its RTL hierarchy. In that case, if we apply a SAIF file generated from RTL simulation to the netlist, many signals cannot be annotated and the power estimate will not be accurate. If we instead run a gate-level simulation to generate the SAIF file, the annotation problem is solved.

For RTL simulation, we usually use the following command. However, this command is not sufficient for gate-level simulation, because the netlist contains many gates that are not present in the RTL; they are taken from the library file instead.

vcs -sverilog "testbench.sv"

Fortunately, VCS provides a switch for the vcs command that loads the library file in .v format. This makes gate-level simulation possible, and the command becomes:

vcs -sverilog "testbench.sv" -v lib.v

However, there is one more challenge: in addition to the functional information, the gates in the library file carry a lot of timing information. We can either model the test-bench to accommodate these delays or remove the timing specification from the library file. We opted for the second method and used a Perl script to clear all the timing information. The Perl script is included in Appendix C. Table 10 shows the result of the script for one of the gates. The code shown in the table was taken from the Synopsys 90nm digital standard cell library.


Table 10 – Removing timing information from gate-level cell, code taken from [14]

Pre script

`celldefine

`suppress_faults

`enable_portfaults

`ifdef functional

`timescale 1ns / 1ns

`delay_mode_distributed

`delay_mode_unit

`else

`timescale 1ps / 1ps

`delay_mode_path

`endif

module AND2X1_HVT (IN1,IN2,Q);

output Q;

input IN1,IN2;

and #1 (Q,IN2,IN1);

`ifdef functional

`else

specify

specparam

in1_lh_q_lh=52,in1_hl_q_hl=50,in2_lh_q_lh=59,in2_hl_q_hl=56;

( IN1 +=> Q) = (in1_lh_q_lh,in1_hl_q_hl);

( IN2 +=> Q) = (in2_lh_q_lh,in2_hl_q_hl);

endspecify

`endif

endmodule

`endcelldefine

`disable_portfaults

Post script

module AND2X1_HVT (IN1,IN2,Q);

output Q;

input IN1,IN2;

and (Q,IN2,IN1);

endmodule
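The transformation shown in Table 10 can be approximated with a few regular expressions. The following Python sketch is not the Perl script from Appendix C; it is an assumed, simplified equivalent that drops specify blocks, every compiler-directive line and numeric primitive delays. The names `strip_timing` and `SAMPLE_CELL` are hypothetical.

```python
import re

def strip_timing(cell_source):
    """Return the cell text with timing-related constructs removed."""
    # Drop specify ... endspecify blocks (specparams and path delays).
    text = re.sub(r"\bspecify\b.*?\bendspecify\b", "", cell_source, flags=re.DOTALL)
    # Drop every compiler-directive line (lines that start with a backtick).
    # Note: this is crude and would also drop `include or `define lines.
    text = re.sub(r"^\s*`.*$", "", text, flags=re.MULTILINE)
    # Drop numeric primitive delays such as "#1" in "and #1 (Q,IN2,IN1);".
    text = re.sub(r"#\s*\d+(\.\d+)?\s*", "", text)
    # Remove the blank lines left behind.
    return "\n".join(line.rstrip() for line in text.splitlines() if line.strip())

SAMPLE_CELL = """`celldefine
`ifdef functional
`timescale 1ns / 1ns
`else
`timescale 1ps / 1ps
`endif
module AND2X1_HVT (IN1,IN2,Q);
output Q;
input IN1,IN2;
and #1 (Q,IN2,IN1);
`ifdef functional
`else
specify
specparam
in1_lh_q_lh=52,in1_hl_q_hl=50,in2_lh_q_lh=59,in2_hl_q_hl=56;
( IN1 +=> Q) = (in1_lh_q_lh,in1_hl_q_hl);
( IN2 +=> Q) = (in2_lh_q_lh,in2_hl_q_hl);
endspecify
`endif
endmodule
`endcelldefine
"""
```

Calling `strip_timing(SAMPLE_CELL)` leaves only the module header, the port declarations, the delay-free `and` primitive and `endmodule`, which matches the "Post script" half of Table 10.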


5.5 Multi-Voltage Aware Simulation:

VCS simulates RTL or a netlist under the assumption that the supply voltage is always on. To capture the effect of power gating in the SAIF file, the simulations must be run with the MV Sim version of VCS. This version is multi-voltage aware: it can turn power domains off or on as directed by the power-gating enable signals and thus provides accurate switching activity. The flow for VCS with MV Sim is called the VCS Native Low Power (VCS NLP) flow. Figure 20 shows the flow diagram. Unfortunately, we did not have access to this version of VCS and so were unable to test the power intent implementation. For the same reason, the power consumption estimates for power gating were obtained through manual adjustments, as the SAIF file did not include the power-gating effect.

Figure 20 – VCS NLP Flow [15]


CHAPTER 6: AEGIS128L SYNTHESIS

In the synthesis process, the RTL code is translated and optimized into a gate-level netlist using technology library cells. The optimized netlist is the product that the Front-end delivers to the Back-end in the ASIC design flow. Synopsys Design Compiler is the synthesis tool used for this project. Figure 21 shows the inputs it needs to generate the gate-level netlist for the power-aware design.

Figure 21 – Synthesis Process, based on [18]


6.1 Synthesis Script:

The synthesis script lists the commands to be executed in sequence. First, we read the RTL files and then defined constraints such as input and output delays. Next, we provided the synthesis tool with the libraries to be used. Clock-gating-aware libraries were used for clock gating, and likewise multi-threshold libraries for the multi-threshold voltage cells. For clock gating, we used the set_clock_gating_style command, which defines the type of clock gating to be used, the minimum register width, the maximum fan-out and so on. To be power efficient, the minimum number of flip-flops in a register that qualifies for clock gating was set to three. The fan-out of each clock-gating enable signal can be set to infinity in the Front-end to obtain lower power and area; we chose a fan-out of 64 to get a realistic power estimate. In addition, we used the latch-based clock-gating style because it helps reduce glitches, and a glitch-free design saves dynamic power [6]. Finally, we loaded the UPF script and started the compile process.

The compile command is older than the compile_ultra command; we used the latter. Unlike compile, compile_ultra has boundary optimization ON by default [18]. In addition, it has a -gate_clock switch to implement clock gating. If a UPF script has been read, the power intent is automatically incorporated in the design.

6.2 UPF Script:

In this script, we defined the power intent of the system. First, all the power domains were created and the supply ports were connected via supply nets. Then we created power switches controlled by power-gate enable signals. The commands that populate isolation and retention cells usually come next, but we did not use them. Retention cells were not required because we did not need to retain any state when the system was powered off. We also left out isolation cells to save area and power. It is safe to do so because the system was designed so that multiplexers follow the power-gated domain, and the timing always ensured that when the power domain was turned off, the multiplexer selected its other input and not the output of the powered-off logic. Both the synthesis and UPF scripts can be found in Appendix C.

6.3 Checks:

Two commands can be used to check the design: check_design and check_mv_design. The check_design command checks the design itself and reports any potential issues. The check_mv_design command checks the multi-voltage power intent and reports any issues in it.

6.4 Clock Rate:

Our design can run with a 20 ns clock period (50 MHz). The SOC can drive the encryption block at a lower frequency to save power, or in overdrive mode at the maximum frequency for performance. Ideally, this design can deliver up to 1.6 GB/s, since one 256-bit (32-byte) message block is processed per clock cycle.

Ideal clock cycles per byte (cpb) at the maximum data rate = 1 / ((data rate) x (clock period)) = 1 / (1.6 GB/s x 20 ns) = 1/32 = 0.03125 cpb

Clock cycles per byte (cpb) = 0.036 cpb for a 4096-byte message

From the cpb comparison for a 4096-byte message, we conclude that our implementation is more than 15 times faster than AEGIS128L on an Intel Sandy Bridge Core i5 processor [4].
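The throughput and ideal cpb figures can be checked with a few lines of Python. This is a worked-arithmetic sketch only; it takes the 256-bit block per clock cycle quoted in the conclusion and the 20 ns clock period as inputs, and the function names are hypothetical.

```python
BLOCK_BITS = 256         # one message block processed per clock cycle
CLOCK_PERIOD_S = 20e-9   # 20 ns clock period (50 MHz)

def throughput_bytes_per_s(block_bits, clock_period_s):
    """Peak data rate when one block is consumed every clock cycle."""
    return (block_bits / 8) / clock_period_s

def ideal_cpb(block_bits):
    """Clock cycles per byte when one block is consumed every clock cycle."""
    return 1.0 / (block_bits / 8)

def efficiency(ideal, achieved):
    """Fraction of the ideal rate reached by an achieved cpb figure."""
    return ideal / achieved
```

Under these inputs the peak rate is 32 bytes every 20 ns, i.e. 1.6 GB/s, and the reported 0.036 cpb corresponds to roughly 87% of the ideal 0.03125 cpb, the remainder being per-message overhead.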


CHAPTER 7: POWER ESTIMATION AND ANALYSIS

This chapter covers two main topics: the power measurement method used for the project and the analysis of the power results. It also presents the incremental improvement obtained by applying each power-saving method.

7.1 Power Estimation:

In this section, we discuss how Synopsys Power Compiler measures power. In addition,

SAIF file structuring is discussed.

7.1.1 Power Types:

Power consumption in digital circuits mainly falls under two categories: static and dynamic power. Dynamic power is further divided into internal (short-circuit) power and switching power. Each of these is described below.

Static power is the power dissipated by the gates when they are not switching, i.e. when they are inactive. It is mainly due to source-to-drain subthreshold leakage, which is caused by reduced threshold voltages that prevent the gate from completely turning off [17].

Internal power is any power dissipated within the boundary of a cell [17]. It is the power dissipated during switching because of the charging and discharging of the internal capacitances of the cell. It also includes the power dissipated during the momentary short circuit between the P and N transistors [17].


Switching power is due to the charging and discharging of the load capacitances of the cell [17]. Therefore, to save power, the system should be modeled in a way that minimizes the transitions from zero to one and vice versa.

7.1.2 Calculating Power:

In this section, we briefly explain how Power Compiler calculates the power of the circuit. Power Compiler uses an equation for each type of power and gives an estimate. Synopsys uses a Non-Linear Delay Model (NLDM) based model. The equations for each power type are given below.

For the leakage power, Power Compiler calculates the total leakage power of the design by summing the leakage power of each library cell used in the system [17]. This is summarized in the equation below:

$$P_{leakage} = \sum_{k \in cells} P_{cell\_leakage,k}$$

For Internal power, the short circuit time, voltage used by the cell, current used by the cell

and the frequency of transitions are required. Power compiler calculates the internal power using

following equation:

Pinternal = E_{output pin} x PathWeight x Toggle_rate(transitions per second) [17]


‘E’ represents the internal energy for the output pin of the cell as a function of input transitions, output load and voltage. “PathWeight” for the input pins depends on the input toggle rate, the transition times and the functionality of the cell. The toggle rates of the output and input pins are also required [17]. For switching power, the following equation is used:

$$P_{sw} = V_{dd}^{2} \sum_{i} \left( C_{load,i} \times TR_{i} \right)$$ [17]

where TR_i represents the toggle rate of net i in transitions per second, V_dd represents the supply voltage, and C_load,i is the total capacitive load of net i [17].

Therefore, for dynamic power, we add the internal and the switching power. One thing to

note is that the clock rate, toggle rate and probability of the logic one are the main factors that are

under our control and are big contributors to dynamic power.
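The leakage and switching equations above translate directly into code. The Python sketch below follows the equations as written in this section (some references include an additional factor of 1/2 in the switching term when the toggle rate counts both rising and falling transitions). The function names and all numeric values used with them are hypothetical.

```python
def leakage_power(cell_leakages_w):
    """Total leakage power: sum of the leakage power of every cell, in watts."""
    return sum(cell_leakages_w)

def switching_power(vdd, nets):
    """Switching power as written in the text: Vdd^2 * sum(C_load_i * TR_i).

    nets is an iterable of (c_load_farads, toggle_rate_per_second) pairs.
    """
    return vdd ** 2 * sum(c_load * tr for c_load, tr in nets)

def dynamic_power(internal_w, switching_w):
    """Dynamic power is the sum of internal and switching power."""
    return internal_w + switching_w
```

For example, two nets of 10 fF at 1e8 transitions per second and 20 fF at 5e7 transitions per second, with an assumed 1.2 V supply, give a switching power of about 2.88 uW; halving either the toggle rate or the load on a net reduces its contribution proportionally.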

7.1.3 Report_power:

report_power is the Design Compiler command used to calculate the power of the current design. If no switching activity is annotated, Power Compiler uses the following defaults for the primary inputs [17]:

Probability = 0.1 (10% chance of signal being in one state)

Toggle_rate = 0.1 * fclk (signal switches once every 10th clock cycle)


The defaults that Power Compiler uses cannot represent the true dynamic power, which is why annotating the actual switching activity is very important. The SAIF file gives the actual toggle rate and the probability of the logic-one state for each signal.

The report_power_calculation command can be applied to a specified net. It generates a report showing how the power is calculated for that net. The following command produces the power calculation report for the port ‘clk’.

report_power_calculation clk > report_power_calculation

7.1.4 Manual Setting:

We can also set the switching activity of the nets manually by using the set_switching_activity command. This, of course, can become a tedious, verbose and error-prone method. A global default toggle rate and static probability can also be set through the following two variables:

power_default_static_probability

power_default_toggle_rate

7.1.5 Using SAIF:

Figure 22 shows a snippet of the SAIF file. It shows a port signal and its statistics after gate-level simulation. Using this file, we can calculate the true probability and toggle rate as follows:

Probability of logic one = total time at one / (total time at one + total time at zero + total time at x)

Toggle rate of signal = total toggle count / total simulation time
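These two formulas can be applied to the per-signal counters of a SAIF entry, where T0, T1 and TX are the times spent at zero, one and unknown, and TC is the toggle count. The Python sketch below is illustrative: the function names are hypothetical, and the 1e-9 default time unit is an assumption that follows the 1 ns time unit used in the test-bench.

```python
def static_probability(t0, t1, tx):
    """Probability of logic one: T1 / (T1 + T0 + TX), all in SAIF time units."""
    return t1 / (t0 + t1 + tx)

def toggle_rate(tc, sim_time_units, time_unit_s=1e-9):
    """Toggle rate in transitions per second: TC / total simulation time."""
    return tc / (sim_time_units * time_unit_s)
```

A signal with T0 = 600, T1 = 300, TX = 100 and TC = 50 over 1000 time units of 1 ns has a static probability of 0.3 and a toggle rate of 5e7 transitions per second (these numbers are made up for the example and are not taken from Figure 22).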

Figure 22 – SAIF file snippet

7.2 Power Analysis:

For power analysis, three scenarios were considered: normal, sleep and overdrive. The normal scenario covers the continuous encryption of messages, one after the other. The sleep scenario covers the case where there is no message to be encrypted but the system is ready. Overdrive is the same as normal but uses the maximum clock rate. We do not have a mixed scenario because the VCS MV Sim tool was unavailable, as explained in Section 5.5.


For each step of the power improvement, we calculated power for the three scenarios

mentioned above. There were a total of four power improvement steps based on power aware RTL

code, the addition of clock gating, the addition of multi-threshold Vt cells, and the addition of

power gating. Table 11 shows the summary of power improvements. Detailed reports can be found

in Appendix D.

Table 11 – Power Improvements

| Scenario | Power Aware RTL | Clock Gating | Multi threshold Vt cells | Power gating |
|---|---|---|---|---|
| Normal (clock T = 32ns) | 12.98 mW | 12.64 mW | 12.18 mW | 12.16 mW |
| Sleep | 3.39 mW | 3.07 mW | 2.80 mW | 1.1 mW |
| Overdrive (clock T = 20ns) | 18.72 mW | 18.34 mW | 17.78 mW | 17.75 mW |

Dynamic power decreases as the clock rate is lowered. Therefore, the SOC has to be designed carefully in how it applies frequency scaling to the encryption block. In the Normal and Overdrive modes, dynamic power combines with static power to give the total power consumption. In the Sleep mode, only leakage power is accounted for.

Comparing the first and last columns of Table 11, 68% of the power is saved in the sleep mode. In the active modes, roughly 5% to 6% is saved: 820 uW in the normal mode and 970 uW in the overdrive mode. AEGIS128L was designed to have a low execution time, so that it completes the encryption relatively quickly and enters the sleep mode. In the sleep mode, the SOC can also turn off the whole power supply. Either way, the power saving is significant.
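The savings quoted above follow from the first and last columns of Table 11. The short Python sketch below reproduces that arithmetic; the dictionary and function names are hypothetical, and the values are copied from the table.

```python
# (power-aware RTL only, all methods including power gating), in mW
TABLE_11_MW = {
    "normal": (12.98, 12.16),
    "sleep": (3.39, 1.1),
    "overdrive": (18.72, 17.75),
}

def saving(before_mw, after_mw):
    """Return (absolute saving in mW, saving as a percentage of before_mw)."""
    saved = before_mw - after_mw
    return saved, 100.0 * saved / before_mw

def all_savings(table):
    """Apply saving() to every scenario in the table."""
    return {name: saving(before, after) for name, (before, after) in table.items()}
```

Running `all_savings(TABLE_11_MW)` gives about 0.82 mW (6.3%) for the normal mode, 2.29 mW (67.6%) for the sleep mode and 0.97 mW (5.2%) for the overdrive mode.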

Mobile devices operate on batteries. Battery life can be summarized by the following equation:

Battery life = battery capacity / power consumption

From this, we can estimate the impact of AEGIS128L on a mobile phone battery with a simple analysis. Let us assume a typical battery with a capacity of 1500 mAh and a voltage rating of 3.7 V. Capacity, power and time are related by:

Amp-hours = (Watts / Volts) x Hours

For 20 hours of battery life, the average power consumption is:

Power (Watts) = (Amp-hours x Volts) / Time (hours)

Power = (1500 mAh x 3.7 V) / 20 h

Power = 277.5 mW

The AEGIS128L implementation used about 12 mW in the active mode, which is 4.3% of this power budget. In the sleep mode (1.1 mW), it uses 0.4%. Reference [19] shows how a smartphone battery is typically used; this depends heavily on which application is being run. In addition, units other than the SOC consume battery power as well.
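The battery estimate can be reproduced with the following Python sketch. It only restates the arithmetic of this section; the function names are hypothetical, and the 1500 mAh, 3.7 V and 20 hour figures are the assumptions made in the text.

```python
def power_budget_mw(capacity_mah, voltage_v, hours):
    """Average power (mW) that drains the given battery in the given time."""
    return capacity_mah * voltage_v / hours

def share_percent(block_mw, budget_mw):
    """Share of the average power budget used by one block, in percent."""
    return 100.0 * block_mw / budget_mw

BUDGET_MW = power_budget_mw(1500, 3.7, 20)   # battery assumed in the text
ACTIVE_SHARE = share_percent(12.0, BUDGET_MW)  # about 12 mW in the active mode
SLEEP_SHARE = share_percent(1.1, BUDGET_MW)    # 1.1 mW in the sleep mode
```

The budget evaluates to 277.5 mW, so the active-mode share is about 4.3% and the sleep-mode share is about 0.4%.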


CHAPTER 8: CONCLUSION

In this project, a power-efficient model of the AEGIS128L encryption algorithm was developed in SystemVerilog HDL. Validation was completed in SystemVerilog using the VCS simulator, whereas synthesis was done with the Synopsys Design Compiler tool using a 90nm technology library. For power estimation, the Power Compiler tool was used. Synopsys Inc. provided all the EDA tools used in this work.

The proposed low power encryption solution was accomplished by using a parallel

architecture and an algorithm that supports it. The designed encryption accelerator can be used

along with the SOC to speed up the encryption process and consume low power.

The model was thoroughly validated in SystemVerilog using a program-based layered test bench. Functional coverage was used to assess the validation. Test vectors were generated to check the design thoroughly, and the expected results were created by the scoreboard section of the verification environment.

The power-aware gate-level netlist can run at a frequency of 50 MHz. Since one 256-bit message block can be processed per clock cycle, this gives a maximum data rate of 1.6 GB/s. We expect the design to run faster with the latest technology libraries.

Cycles per byte (cpb) is a clock rate independent performance metric. From 4096-byte data

message’s cpb comparison, we conclude that our implementation is more than 15 times faster than

the AEGIS128L on Intel Sandy Bridge Core i5 processor [4].


The power-aware implementation saved approximately 7% of the dynamic power when the device was in full operation and 68% in the idle mode, at the cost of a 2% increase in area. A more significant power saving (37.5%) was obtained when the device ran in the normal mode 50% of the time. The power-aware methods included power-aware RTL coding, clock gating, power gating, the use of multi-threshold Vt cells and frequency scaling.

For future work, multi-VDD power domains can be created, which will save dynamic power as well. A better and faster technology library can be used to see its effect on power and speed. The clock rate also has scope for improvement: by adding one or two pipeline stages in the timing-critical datapath, we can speed up the encryption process. In addition, protection against side-channel attacks can be incorporated in the hardware implementation.


APPENDIX A: AEGIS128L HARDWARE MODEL SOURCE FILES

pckg.sv

`ifndef DEFS_DONE

`define DEFS_DONE

package pckg;

typedef struct packed { logic [127:0] bus0;

logic [127:0] bus1;

logic [127:0] bus2;

logic [127:0] bus3;

logic [127:0] bus4;

logic [127:0] bus5;

logic [127:0] bus6;

logic [127:0] bus7;

}bus1024;

typedef struct packed { logic [127:0] MS128; //most significant 128 bits

logic [127:0] LS128; //least significant 128 bits

}store256;

typedef struct packed { logic [15:0] word0;

logic [15:0] word1;

logic [15:0] word2;

logic [15:0] word3;

logic [15:0] word4;

logic [15:0] word5;

logic [15:0] word6;

logic [15:0] word7;

}bus128;

//following functions taken from [11]

function automatic [7:0] xtime (input [7:0] b);

return {b[6:0],1'b0}^(8'h1b&{8{b[7]}});

endfunction

function automatic [31:0] mix_col (input [7:0] s0,s1,s2,s3);

mix_col={xtime(s0)^xtime(s1)^s1^s2^s3,s0^xtime(s1)^xtime(s2)^s2^s3,

s0^s1^xtime(s2)^xtime(s3)^s3,xtime(s0)^s0^s1^s2^xtime(s3)};

endfunction

function automatic [7:0] sbox(input [7:0] a);

case (a)

8'h00: return 8'h63;


8'h01: return 8'h7c;

8'h02: return 8'h77;

8'h03: return 8'h7b;

8'h04: return 8'hf2;

8'h05: return 8'h6b;

8'h06: return 8'h6f;

8'h07: return 8'hc5;

8'h08: return 8'h30;

8'h09: return 8'h01;

8'h0a: return 8'h67;

8'h0b: return 8'h2b;

8'h0c: return 8'hfe;

8'h0d: return 8'hd7;

8'h0e: return 8'hab;

8'h0f: return 8'h76;

8'h10: return 8'hca;

8'h11: return 8'h82;

8'h12: return 8'hc9;

8'h13: return 8'h7d;

8'h14: return 8'hfa;

8'h15: return 8'h59;

8'h16: return 8'h47;

8'h17: return 8'hf0;

8'h18: return 8'had;

8'h19: return 8'hd4;

8'h1a: return 8'ha2;

8'h1b: return 8'haf;

8'h1c: return 8'h9c;

8'h1d: return 8'ha4;

8'h1e: return 8'h72;

8'h1f: return 8'hc0;

8'h20: return 8'hb7;

8'h21: return 8'hfd;

8'h22: return 8'h93;

8'h23: return 8'h26;

8'h24: return 8'h36;

8'h25: return 8'h3f;

8'h26: return 8'hf7;

8'h27: return 8'hcc;

8'h28: return 8'h34;

8'h29: return 8'ha5;

8'h2a: return 8'he5;

8'h2b: return 8'hf1;

8'h2c: return 8'h71;

8'h2d: return 8'hd8;


8'h2e: return 8'h31;

8'h2f: return 8'h15;

8'h30: return 8'h04;

8'h31: return 8'hc7;

8'h32: return 8'h23;

8'h33: return 8'hc3;

8'h34: return 8'h18;

8'h35: return 8'h96;

8'h36: return 8'h05;

8'h37: return 8'h9a;

8'h38: return 8'h07;

8'h39: return 8'h12;

8'h3a: return 8'h80;

8'h3b: return 8'he2;

8'h3c: return 8'heb;

8'h3d: return 8'h27;

8'h3e: return 8'hb2;

8'h3f: return 8'h75;

8'h40: return 8'h09;

8'h41: return 8'h83;

8'h42: return 8'h2c;

8'h43: return 8'h1a;

8'h44: return 8'h1b;

8'h45: return 8'h6e;

8'h46: return 8'h5a;

8'h47: return 8'ha0;

8'h48: return 8'h52;

8'h49: return 8'h3b;

8'h4a: return 8'hd6;

8'h4b: return 8'hb3;

8'h4c: return 8'h29;

8'h4d: return 8'he3;

8'h4e: return 8'h2f;

8'h4f: return 8'h84;

8'h50: return 8'h53;

8'h51: return 8'hd1;

8'h52: return 8'h00;

8'h53: return 8'hed;

8'h54: return 8'h20;

8'h55: return 8'hfc;

8'h56: return 8'hb1;

8'h57: return 8'h5b;

8'h58: return 8'h6a;

8'h59: return 8'hcb;

8'h5a: return 8'hbe;


8'h5b: return 8'h39;

8'h5c: return 8'h4a;

8'h5d: return 8'h4c;

8'h5e: return 8'h58;

8'h5f: return 8'hcf;

8'h60: return 8'hd0;

8'h61: return 8'hef;

8'h62: return 8'haa;

8'h63: return 8'hfb;

8'h64: return 8'h43;

8'h65: return 8'h4d;

8'h66: return 8'h33;

8'h67: return 8'h85;

8'h68: return 8'h45;

8'h69: return 8'hf9;

8'h6a: return 8'h02;

8'h6b: return 8'h7f;

8'h6c: return 8'h50;

8'h6d: return 8'h3c;

8'h6e: return 8'h9f;

8'h6f: return 8'ha8;

8'h70: return 8'h51;

8'h71: return 8'ha3;

8'h72: return 8'h40;

8'h73: return 8'h8f;

8'h74: return 8'h92;

8'h75: return 8'h9d;

8'h76: return 8'h38;

8'h77: return 8'hf5;

8'h78: return 8'hbc;

8'h79: return 8'hb6;

8'h7a: return 8'hda;

8'h7b: return 8'h21;

8'h7c: return 8'h10;

8'h7d: return 8'hff;

8'h7e: return 8'hf3;

8'h7f: return 8'hd2;

8'h80: return 8'hcd;

8'h81: return 8'h0c;

8'h82: return 8'h13;

8'h83: return 8'hec;

8'h84: return 8'h5f;

8'h85: return 8'h97;

8'h86: return 8'h44;

8'h87: return 8'h17;


8'h88: return 8'hc4;

8'h89: return 8'ha7;

8'h8a: return 8'h7e;

8'h8b: return 8'h3d;

8'h8c: return 8'h64;

8'h8d: return 8'h5d;

8'h8e: return 8'h19;

8'h8f: return 8'h73;

8'h90: return 8'h60;

8'h91: return 8'h81;

8'h92: return 8'h4f;

8'h93: return 8'hdc;

8'h94: return 8'h22;

8'h95: return 8'h2a;

8'h96: return 8'h90;

8'h97: return 8'h88;

8'h98: return 8'h46;

8'h99: return 8'hee;

8'h9a: return 8'hb8;

8'h9b: return 8'h14;

8'h9c: return 8'hde;

8'h9d: return 8'h5e;

8'h9e: return 8'h0b;

8'h9f: return 8'hdb;

8'ha0: return 8'he0;

8'ha1: return 8'h32;

8'ha2: return 8'h3a;

8'ha3: return 8'h0a;

8'ha4: return 8'h49;

8'ha5: return 8'h06;

8'ha6: return 8'h24;

8'ha7: return 8'h5c;

8'ha8: return 8'hc2;

8'ha9: return 8'hd3;

8'haa: return 8'hac;

8'hab: return 8'h62;

8'hac: return 8'h91;

8'had: return 8'h95;

8'hae: return 8'he4;

8'haf: return 8'h79;

8'hb0: return 8'he7;

8'hb1: return 8'hc8;

8'hb2: return 8'h37;

8'hb3: return 8'h6d;

8'hb4: return 8'h8d;


8'hb5: return 8'hd5;

8'hb6: return 8'h4e;

8'hb7: return 8'ha9;

8'hb8: return 8'h6c;

8'hb9: return 8'h56;

8'hba: return 8'hf4;

8'hbb: return 8'hea;

8'hbc: return 8'h65;

8'hbd: return 8'h7a;

8'hbe: return 8'hae;

8'hbf: return 8'h08;

8'hc0: return 8'hba;

8'hc1: return 8'h78;

8'hc2: return 8'h25;

8'hc3: return 8'h2e;

8'hc4: return 8'h1c;

8'hc5: return 8'ha6;

8'hc6: return 8'hb4;

8'hc7: return 8'hc6;

8'hc8: return 8'he8;

8'hc9: return 8'hdd;

8'hca: return 8'h74;

8'hcb: return 8'h1f;

8'hcc: return 8'h4b;

8'hcd: return 8'hbd;

8'hce: return 8'h8b;

8'hcf: return 8'h8a;

8'hd0: return 8'h70;

8'hd1: return 8'h3e;

8'hd2: return 8'hb5;

8'hd3: return 8'h66;

8'hd4: return 8'h48;

8'hd5: return 8'h03;

8'hd6: return 8'hf6;

8'hd7: return 8'h0e;

8'hd8: return 8'h61;

8'hd9: return 8'h35;

8'hda: return 8'h57;

8'hdb: return 8'hb9;

8'hdc: return 8'h86;

8'hdd: return 8'hc1;

8'hde: return 8'h1d;

8'hdf: return 8'h9e;

8'he0: return 8'he1;

8'he1: return 8'hf8;

8'he2: return 8'h98;

8'he3: return 8'h11;


8'he4: return 8'h69;

8'he5: return 8'hd9;

8'he6: return 8'h8e;

8'he7: return 8'h94;

8'he8: return 8'h9b;

8'he9: return 8'h1e;

8'hea: return 8'h87;

8'heb: return 8'he9;

8'hec: return 8'hce;

8'hed: return 8'h55;

8'hee: return 8'h28;

8'hef: return 8'hdf;

8'hf0: return 8'h8c;

8'hf1: return 8'ha1;

8'hf2: return 8'h89;

8'hf3: return 8'h0d;

8'hf4: return 8'hbf;

8'hf5: return 8'he6;

8'hf6: return 8'h42;

8'hf7: return 8'h68;

8'hf8: return 8'h41;

8'hf9: return 8'h99;

8'hfa: return 8'h2d;

8'hfb: return 8'h0f;

8'hfc: return 8'hb0;

8'hfd: return 8'h54;

8'hfe: return 8'hbb;

8'hff: return 8'h16;

endcase

endfunction

endpackage

import pckg::*;

`endif

Port.sv

`include "pckg.sv"

//top_rtl(rst,clk,start,data_in,c1,c2,tagout,done)

interface port(input bit clk,rst);

bus128 c2,c1;

bus256 data_in;

logic start,tagout,done; //ready

clocking ck@(posedge clk);

input c2,c1,tagout,done;

output data_in,start;

endclocking

modport top_rtl(input data_in,rst,start,clk,output c2,c1,tagout,done);

modport layrd_test(clocking ck);

endinterface

Top_rtl.sv

`include "fifo.sv"

`include "fifo_cntlr.sv"

`include "encryption.sv"

module top_rtl(rst,clk,start,data_in,c1,c2,tagout,done,ready);

input rst,clk;

input start;

input [256:0] data_in;

wire [256:0] data_out;

wire EMPTY,FULL;

wire [3:0] rd_ptr,wr_ptr;

output [127:0] c1,c2;

output tagout,done,ready;

wire FINon;

wire poff_rtl,poff_init;

wire start_in,ADon,MSGon,FINon_in,rd_en,wr_en,done_clk_gate;

//assign start = data_in[257] ;

assign FINon=data_out[256];

fifo z1(rst,clk,FINon_in,wr_ptr,rd_ptr,wr_en,rd_en,data_in,data_out,done_clk_gate);

fifo_cntlr z2(rst,clk,start,FINon,EMPTY,FULL,rd_en,wr_en,wr_ptr,rd_ptr,done_clk_gate,start_in,ADon,MSGon,FINon_in,ready,poff_rtl);

encryption z3(data_out[255:128],data_out[127:0],rst,clk,start_in,ADon,MSGon,FINon_in,poff_rtl,c2,c1,tagout,done,poff_init);

endmodule

fifo.sv

module fifo(rst,clk,FINon_in,wr_ptr,rd_ptr,wr_en,rd_en,data_in,data_out,done_clk_gate);

input rst,clk,wr_en,rd_en,FINon_in;

input [3:0] wr_ptr,rd_ptr;

input [256:0] data_in; //define as structure

output [256:0] data_out; //define as structure

input done_clk_gate;

logic [256:0] data_out;

logic [256:0] ram[0:15];

always_ff@(posedge clk or negedge rst)

begin

if(!rst)

begin

ram <= '{default:257'd0};

end

else if(done_clk_gate)

begin

if(wr_en)

begin

ram[wr_ptr] <= data_in;

end

end

end

always_comb

begin

if(rd_en) //when empty,rd_en=0

begin

data_out = ram[rd_ptr]; //might add isolation cells if power gated

end

else

begin

data_out = 257'd0; //might add isolation cells if power gated

end

end

endmodule

fifo_cntlr.sv

//`include "fifo.sv"

module fifo_cntlr(rst,clk,start,FINon,EMPTY,FULL,rd_en,wr_en,wr_ptr,rd_ptr,done_clk_gate,start_in,ADon,MSGon,FINon_in,ready,poff_rtl);

input rst,clk,start,FINon;

output EMPTY,FULL,rd_en,wr_en,done_clk_gate;

output logic start_in,ADon,MSGon,FINon_in,ready,poff_rtl;

output [3:0] wr_ptr,rd_ptr;

logic [3:0] wr_ptr,rd_ptr;

logic signed [4:0]diff;

logic clear;

//roll_over define

logic roll_over,n_roll_over;

assign diff = wr_ptr - rd_ptr ;

assign done_clk_gate = start;

assign poff_rtl = ready & (!start);

always_ff@(posedge clk or negedge rst)

begin

if(!rst)

roll_over <= 0;

else

roll_over <= n_roll_over;

end

always_comb

begin

//if((wr_ptr - rd_ptr < 0) && (roll_over == 0))

if((diff < 0) && (roll_over == 0))

begin

n_roll_over = 1;

end

//else if((wr_ptr - rd_ptr > 0) && (roll_over == 1))

else if((diff >= 0) && (roll_over == 1))

begin

n_roll_over = 0;

end

else

begin

n_roll_over = roll_over;

end

end

//EMPTY and FULL flags (combinational)

logic EMPTY,FULL;

always_comb

begin

if(roll_over == 0)

begin

if(wr_ptr == rd_ptr)

EMPTY = 1;

else

EMPTY = 0;

if((wr_ptr - rd_ptr) == 15)

FULL = 1;

else

FULL = 0;

end

//roll_over==1

else

begin

EMPTY = 0;

if(wr_ptr == rd_ptr)

FULL = 1;

else

FULL = 0;

end

end

//write and read enable defines

logic wr_en,n_wr_en,rd_en,n_rd_en;

always_ff@(posedge clk or negedge rst)

begin

if(!rst)

begin

wr_en <= 0;

rd_en <= 0;

end

else

begin

wr_en <= n_wr_en;

rd_en <= n_rd_en;

end

end

always_comb

if(start == 0)

begin

n_wr_en = 0;

if(FULL == 1)

n_rd_en = 1;

else if(EMPTY == 1)

n_rd_en = 0;

else

n_rd_en = 1;

end

else

begin

if(FULL == 1)

begin

n_rd_en = 1;

n_wr_en = 0;

end

else if (EMPTY == 1)

begin

n_rd_en = 0;

n_wr_en = 1;

end

else

begin

n_rd_en = 1;

n_wr_en = 1;

end

end

//write pointer

always_ff@(posedge clk or negedge rst)

begin

if(!rst)

begin

wr_ptr <= 0;

end

else if(wr_en)

begin

wr_ptr <= wr_ptr + 1;

end

else if(clear == 1)

wr_ptr <= 0;

else

wr_ptr <= wr_ptr;

end

//read pointer

logic [1:0] state,n_state;

logic [3:0] cnt,n_cnt;

logic [3:0] n_rd_ptr;

logic flag_st,n_flag_st;

always_ff@(posedge clk,negedge rst)

begin

if(!rst)

begin

cnt <= 0;

state <= 0;

rd_ptr <= 0;

flag_st <= 0;

end

else

begin

cnt <= n_cnt;

state <= n_state;

rd_ptr <= n_rd_ptr;

flag_st <= n_flag_st;

end

end

always_comb

begin

n_flag_st = 0;

clear = 0;

ready = 0;

case(state)

2'b00:

begin

if(flag_st == 1)

begin

{start_in,ADon,MSGon,FINon_in} = 4'b0000;

n_cnt = cnt;

n_state = state;

n_rd_ptr = rd_ptr;

//done_clk_gate = 0;

end

else if(cnt <= 8 && rd_en == 1)

begin

{start_in,ADon,MSGon,FINon_in} = 4'b1000;

n_cnt = cnt + 1;

n_state = state;

n_rd_ptr = rd_ptr;

//if(cnt <= 3)

// done_clk_gate = 0;

//else

// done_clk_gate = 1;

end

else if(rd_en == 1)

begin

{start_in,ADon,MSGon,FINon_in} = 4'b1000;

n_cnt = 0;

n_state = state + 1;

n_rd_ptr = rd_ptr + 1;

//done_clk_gate = 1;

end

else

begin

{start_in,ADon,MSGon,FINon_in} = 4'b0000;

clear = 1;

n_cnt = 0;

n_state = state;

//n_rd_ptr = rd_ptr;

n_rd_ptr = 0;

//done_clk_gate = 0;

ready = 1;

end

end

2'b01:

begin

{start_in,ADon,MSGon,FINon_in} = 4'b1100;

//done_clk_gate = 1;

if(cnt <= 0 && rd_en == 1)

begin

n_cnt = cnt + 1;

n_state = state;

n_rd_ptr = rd_ptr + 1;

end

else if(rd_en == 1)

begin

n_cnt = 0;

n_state = state + 2;

n_rd_ptr = rd_ptr + 1;

end

else

begin

n_cnt = cnt;

n_state = state;

n_rd_ptr = rd_ptr;

end

end

2'b11:

begin

//done_clk_gate = 1;

n_cnt = 0;

if((FINon==0) && (rd_en == 1))

begin

{start_in,ADon,MSGon,FINon_in} = 4'b1110;

n_state = state;

n_rd_ptr = rd_ptr + 1;

end

else if(rd_en == 1)

begin

{start_in,ADon,MSGon,FINon_in} = 4'b1111;

n_state = state - 1;

n_rd_ptr = rd_ptr + 1;

end

else

begin

{start_in,ADon,MSGon,FINon_in} = 4'b1110;

n_state = state;

n_rd_ptr = rd_ptr;

end

end

2'b10:

begin

//done_clk_gate = 1;

{start_in,ADon,MSGon,FINon_in} = 4'b1111;

if(cnt <= 5 && rd_en == 1)

begin

n_cnt = cnt + 1;

n_state = state;

n_rd_ptr = rd_ptr;

end

else if(rd_en == 1)

begin

n_cnt = 0;

n_state = state + 2;

// if(EMPTY == 1)

n_rd_ptr = rd_ptr + 1;

// else

// n_rd_ptr = rd_ptr;

end

else

begin

n_cnt = cnt;

n_state = state;

n_rd_ptr = rd_ptr;

end

if(cnt)

n_flag_st = 1;

else

n_flag_st = 0;

end

endcase

end

endmodule
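The EMPTY/FULL decision in fifo_cntlr.sv depends on the two pointers and the roll_over bit. The following Python sketch restates that decision table for quick inspection. It is illustrative only and not part of the project code: the function name `fifo_flags` is hypothetical, and the subtraction is modeled without 4-bit wrap-around, which is an assumption about how the comparison with 15 is evaluated.

```python
# Illustrative model of the EMPTY/FULL logic in fifo_cntlr.sv.
# fifo_flags is a hypothetical helper, not part of the RTL.

def fifo_flags(wr_ptr, rd_ptr, roll_over):
    """Return (EMPTY, FULL) for 4-bit pointers and the roll_over bit."""
    if not roll_over:
        # Write pointer has not wrapped past the read pointer.
        empty = wr_ptr == rd_ptr
        # Assumption: the difference is compared without 4-bit wrap-around.
        full = (wr_ptr - rd_ptr) == 15
    else:
        # Write pointer has wrapped: equal pointers now mean full.
        empty = False
        full = wr_ptr == rd_ptr
    return empty, full
```

Under this model, equal pointers mean an empty FIFO before roll-over and a full FIFO after it, which is the distinction the roll_over register exists to make.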

encryption.sv

`include "pckg.sv"

//`include "port.sv"

//`include "top_rtl.sv"

//`include "layrd_test.sv"

`include "controller.sv"

`include "initialization.sv"

`include "aes128.sv"

`include "stateupdate.sv"

`include "datapath.sv"

module encryption(

input [127:0] inbus1, inbus2,

input rst,clk,start,ADon,MSGon,FINon,poff_rtl,

output [127:0] c2,c1, //c2: ciphertext (0 during tag output); c1: ciphertext (tag during tag output)

output tagout,done,poff_init

);

//module top_rtl(port prt);

wire [1023:0] state_in;

wire FINon_tmp;

wire [127:0] ADMSGlen_tmp;

wire start_tmp,start_tmp2;

controller y0( .rst(rst), .clk(clk), .start(start), .ADon(ADon), .MSGon(MSGon), .FINon(FINon),

.start2(start_tmp), .start3(start_tmp2),.tagout(tagout), .done(done), .FINon2(FINon_tmp),

.ADMSG_len(ADMSGlen_tmp) );

initialization y1(.rst(rst), .clk(clk), .Key(inbus1), .Nonce(inbus2), .state_init(state_in) );

datapath y2( .rst(rst), .clk(clk), .FINon(FINon), .FINon2(FINon_tmp), .start2(start_tmp),

.tagout(tagout), .Msg1(inbus1), .Msg2(inbus2), .ADMSG_len(ADMSGlen_tmp),

.state_init(state_in), .cg_start(start), .enc_msg1(c1), .enc_msg2(c2) );

assign poff_init = poff_rtl | start_tmp2 ;

//assign FINon2 = FINon_tmp;

//assign prt.tagout = tagout_tmp;

//assign prt.done = done_tmp;

endmodule

initialization.sv

module initialization(

input rst,clk,

input [127:0] Key, Nonce,//bus128_in1 for top module

output bus1024 state_init

);

//store256 cnst;

parameter cnst_MS128 =

128'h00_01_01_02_03_05_08_0d_15_22_37_59_90_e9_79_62;

parameter cnst_LS128 = 128'hdb_3d_18_55_6d_c2_2f_f1_20_11_31_42_73_b5_28_dd;

always_comb

begin

state_init.bus0 = Key^Nonce;

state_init.bus1 = cnst_LS128;

state_init.bus2 = cnst_MS128;

state_init.bus3 = cnst_LS128;

state_init.bus4 = Key^Nonce;

state_init.bus5 = Key^cnst_MS128;

state_init.bus6 = Key^cnst_LS128;

state_init.bus7 = Key^cnst_MS128;

end

endmodule
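For cross-checking the module above in software, the initial state it loads can be mirrored in a few lines of Python. This is a minimal sketch and not part of the project code: the function name `initial_state` and the use of Python integers for the 128-bit buses are assumptions made for illustration. The two constants are copied from the parameters in initialization.sv.

```python
# Software mirror of initialization.sv (illustrative; names are hypothetical).
CNST_MS128 = 0x000101020305080d1522375990e97962  # cnst_MS128 in the RTL
CNST_LS128 = 0xdb3d18556dc22ff12011314273b528dd  # cnst_LS128 in the RTL

def initial_state(key, nonce):
    """Return [bus0, ..., bus7] in the same order initialization.sv assigns them."""
    kn = key ^ nonce
    return [
        kn,                # bus0 = Key ^ Nonce
        CNST_LS128,        # bus1
        CNST_MS128,        # bus2
        CNST_LS128,        # bus3
        kn,                # bus4 = Key ^ Nonce
        key ^ CNST_MS128,  # bus5
        key ^ CNST_LS128,  # bus6
        key ^ CNST_MS128,  # bus7
    ]
```

A call such as `initial_state(key, nonce)` returns eight integers that can be compared against `state_init` dumped from simulation.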

controller.sv

module controller(

input rst,clk,start,ADon,MSGon,FINon,

output logic start2,start3,tagout,done,FINon2,

output logic [127:0] ADMSG_len

);

logic n_start2,n_start3,n_FINon2;

logic [63:0] adlen,n_adlen,msglen,n_msglen;

//reusing unused bits of adlen to avoid extra flops, msglen is also 64 bits

logic [2:0] counter,n_counter; // counts 7 cycles after FINon is received before 'done' is set

assign ADMSG_len = {adlen,msglen}; //concatenating adlen and msglen to give 128 bits

assign tagout = (counter == 5 | counter == 6 ); //for output mux select

assign done = (counter == 6); //for completion signal

assign adlen[63:2] = 0;

assign adlen[0] = 0;

assign msglen[63:11] = 0;

//sequential block

always_ff@(posedge clk,negedge rst)

begin

if(!rst)

begin

start2 <= 0;

start3 <= 0;

FINon2 <= 0;

adlen[1] <= 0;

msglen[10:0] <= 0;

counter <= 0;

end

else

begin

start2 <= n_start2;

start3 <= n_start3;

//start2 <= start;

//start3 <= start2;

//FINon2 <= FINon;

FINon2 <= n_FINon2;

adlen[1] <= n_adlen[1];

msglen[10:0] <= n_msglen[10:0];

counter <= n_counter;

end

end

//combinatorial block

always_comb

begin

if(!start)

begin

n_start2 = 0;

n_start3 = 0;

n_FINon2 = 0;

n_adlen[1] = 0;

n_msglen[10:0] = 0;

n_counter = 0;

end

else

begin

n_start2 = start;

n_start3 = start2;

n_FINon2 = FINon;

n_adlen[1] = (adlen[1] | ADon) && start;

if(MSGon & !FINon)

n_msglen[10:0] = msglen[10:0] + 1;

else

n_msglen[10:0] = msglen[10:0];

if(FINon)

n_counter = counter+1;

else

n_counter = counter;

end

end

endmodule

datapath.sv

module datapath(

input rst,clk,FINon,FINon2,start2,tagout,

input [127:0] Msg1,Msg2,ADMSG_len,

input bus1024 state_init,

input cg_start,

output logic [127:0] enc_msg1,enc_msg2

);

logic [127:0] tmp,n_tmp,A,B;

bus1024 state,n_state,state_tmp;

//assign tmp = state.bus2 ^ ADMSG_len;

//comb logic/xor

stateupdate ins(state_tmp,A,B,n_state);

always_ff@(posedge clk, negedge rst)

begin

if(!rst)

state <= 0;

else //if(cg_start)

state <= n_state;

end

always_ff@(posedge clk, negedge rst)

begin

if(!rst)

tmp <= 0;

else

tmp <= n_tmp;

end

always_comb

begin

if(!FINon2)

n_tmp = state.bus2 ^ ADMSG_len;

else

n_tmp = tmp;

//mux1

if(FINon2)

begin

A = tmp;

B = tmp;

end

else

begin

A = Msg1;

B = Msg2;

end

//mux2

if(start2 & cg_start)

begin

state_tmp = state;

end

else

begin

state_tmp = state_init;

end

//output muxes

if(tagout)

begin

enc_msg2 = 0;

enc_msg1 = state.bus0 ^ state.bus1 ^ state.bus2 ^ state.bus3 ^ state.bus4 ^

state.bus5 ^ state.bus6;

end

else

begin

enc_msg2 = Msg2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);

enc_msg1 = Msg1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);

end

end

endmodule
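The output multiplexer at the end of datapath.sv reduces to two small expressions, which can be restated in Python for checking simulation dumps by hand. This is a sketch under stated assumptions: `encrypt_outputs` and `tag_output` are hypothetical names, and the state is represented as a list of eight integers standing in for bus0 through bus7.

```python
# Illustrative restatement of the datapath.sv output muxes (hypothetical names).
MASK128 = (1 << 128) - 1

def encrypt_outputs(state, msg1, msg2):
    """Return (enc_msg1, enc_msg2) for the tagout == 0 branch."""
    s = state
    enc_msg2 = msg2 ^ s[1] ^ s[6] ^ (s[2] & s[3])
    enc_msg1 = msg1 ^ s[2] ^ s[5] ^ (s[6] & s[7])
    return enc_msg1 & MASK128, enc_msg2 & MASK128

def tag_output(state):
    """Return enc_msg1 for the tagout == 1 branch: XOR of bus0..bus6."""
    tag = 0
    for word in state[:7]:
        tag ^= word
    return tag & MASK128
```

Because the keystream is XORed with the message, applying `encrypt_outputs` to the returned ciphertext with the same state recovers the original message words.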

stateupdate.sv

module stateupdate

(

input bus1024 S,

input [127:0] A,B,

output bus1024 S_updted

);

logic [127:0] temp1,temp2;

assign temp1 = S.bus0 ^ A;

assign temp2 = S.bus4 ^ B;

aes128 x0(S.bus7,temp1,S_updted.bus0);

aes128 x1(S.bus0,S.bus1,S_updted.bus1);

aes128 x2(S.bus1,S.bus2,S_updted.bus2);

aes128 x3(S.bus2,S.bus3,S_updted.bus3);

aes128 x4(S.bus3,temp2,S_updted.bus4);

aes128 x5(S.bus4,S.bus5,S_updted.bus5);

aes128 x6(S.bus5,S.bus6,S_updted.bus6);

aes128 x7(S.bus6,S.bus7,S_updted.bus7);

endmodule
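The wiring of the eight aes128 instances above can be written compactly in software. The Python sketch below mirrors that wiring only; it is not part of the project code. The name `state_update` and the `rnd` callback are hypothetical, with `rnd(x, y)` assumed to compute one AES round of `x` XORed with `y`, as the aes128 module does.

```python
# Illustrative mirror of the stateupdate.sv wiring (hypothetical names).

def state_update(s, a, b, rnd):
    """s: list of eight 128-bit ints (bus0..bus7); a, b: 128-bit inputs.

    rnd(x, y) is assumed to return AESRound(x) ^ y, matching aes128.sv.
    """
    return [
        rnd(s[7], s[0] ^ a),  # x0: bus7 with temp1 = bus0 ^ A
        rnd(s[0], s[1]),      # x1
        rnd(s[1], s[2]),      # x2
        rnd(s[2], s[3]),      # x3
        rnd(s[3], s[4] ^ b),  # x4: bus3 with temp2 = bus4 ^ B
        rnd(s[4], s[5]),      # x5
        rnd(s[5], s[6]),      # x6
        rnd(s[6], s[7]),      # x7
    ]
```

Passing the round function as an argument keeps the wiring separate from the AES arithmetic, so the structure can be checked with a trivial stand-in before a real round function is plugged in.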

aes128.sv //[11]

module aes128(

input [127:0] A,B,

output [127:0] C

);

wire [7:0] A_matrix[0:3][0:3];

wire [7:0] A_sub[0:3][0:3];

wire [7:0] A_srow[0:3][0:3];

wire [7:0] A_mcol[0:3][0:3];

wire [127:0] A_out;

assign A_matrix[0][0] = A[127:120];

assign A_matrix[1][0] = A[119:112];

assign A_matrix[2][0] = A[111:104];

assign A_matrix[3][0] = A[103:96];

assign A_matrix[0][1] = A[95:88];

assign A_matrix[1][1] = A[87:80];

assign A_matrix[2][1] = A[79:72];

assign A_matrix[3][1] = A[71:64];

assign A_matrix[0][2] = A[63:56];

assign A_matrix[1][2] = A[55:48];

assign A_matrix[2][2] = A[47:40];

assign A_matrix[3][2] = A[39:32];

assign A_matrix[0][3] = A[31:24];

assign A_matrix[1][3] = A[23:16];

assign A_matrix[2][3] = A[15:8];

assign A_matrix[3][3] = A[7:0];

assign A_sub[0][0] = sbox(A_matrix[0][0]);

assign A_sub[0][1] = sbox(A_matrix[0][1]);

assign A_sub[0][2] = sbox(A_matrix[0][2]);

assign A_sub[0][3] = sbox(A_matrix[0][3]);

assign A_sub[1][0] = sbox(A_matrix[1][0]);

assign A_sub[1][1] = sbox(A_matrix[1][1]);

assign A_sub[1][2] = sbox(A_matrix[1][2]);

assign A_sub[1][3] = sbox(A_matrix[1][3]);

assign A_sub[2][0] = sbox(A_matrix[2][0]);

assign A_sub[2][1] = sbox(A_matrix[2][1]);

assign A_sub[2][2] = sbox(A_matrix[2][2]);

assign A_sub[2][3] = sbox(A_matrix[2][3]);

assign A_sub[3][0] = sbox(A_matrix[3][0]);

assign A_sub[3][1] = sbox(A_matrix[3][1]);

assign A_sub[3][2] = sbox(A_matrix[3][2]);

assign A_sub[3][3] = sbox(A_matrix[3][3]);

assign A_srow[0][0] = A_sub[0][0];

assign A_srow[0][1] = A_sub[0][1];

assign A_srow[0][2] = A_sub[0][2];

assign A_srow[0][3] = A_sub[0][3];

assign A_srow[1][0] = A_sub[1][1];

assign A_srow[1][1] = A_sub[1][2];

assign A_srow[1][2] = A_sub[1][3];

assign A_srow[1][3] = A_sub[1][0];

assign A_srow[2][0] = A_sub[2][2];

assign A_srow[2][1] = A_sub[2][3];

assign A_srow[2][2] = A_sub[2][0];

assign A_srow[2][3] = A_sub[2][1];

assign A_srow[3][0] = A_sub[3][3];

assign A_srow[3][1] = A_sub[3][0];

assign A_srow[3][2] = A_sub[3][1];

assign A_srow[3][3] = A_sub[3][2];

assign {A_mcol[0][0],A_mcol[1][0],A_mcol[2][0],A_mcol[3][0]} =

mix_col(A_srow[0][0],A_srow[1][0],A_srow[2][0],A_srow[3][0]);

assign {A_mcol[0][1],A_mcol[1][1],A_mcol[2][1],A_mcol[3][1]} =

mix_col(A_srow[0][1],A_srow[1][1],A_srow[2][1],A_srow[3][1]);

assign {A_mcol[0][2],A_mcol[1][2],A_mcol[2][2],A_mcol[3][2]} =

mix_col(A_srow[0][2],A_srow[1][2],A_srow[2][2],A_srow[3][2]);

assign {A_mcol[0][3],A_mcol[1][3],A_mcol[2][3],A_mcol[3][3]} =

mix_col(A_srow[0][3],A_srow[1][3],A_srow[2][3],A_srow[3][3]);

assign A_out = {A_mcol[0][0],A_mcol[1][0],A_mcol[2][0],A_mcol[3][0],

A_mcol[0][1],A_mcol[1][1],A_mcol[2][1],A_mcol[3][1],

A_mcol[0][2],A_mcol[1][2],A_mcol[2][2],A_mcol[3][2],

A_mcol[0][3],A_mcol[1][3],A_mcol[2][3],A_mcol[3][3]} ;

assign C = A_out ^ B;

endmodule
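As a software cross-check for the module above, a single AES round followed by an XOR can be sketched in Python. This is an illustrative reference and not part of the project code. It assumes that the package functions `sbox` and `mix_col` implement the standard AES SubBytes and MixColumns steps; the names `gf_mul`, `mix_column` and `aes_round` are hypothetical. The S-box is computed from its definition instead of being copied from the table, so the two can be compared entry by entry.

```python
# Illustrative reference for aes128.sv: C = MixColumns(ShiftRows(SubBytes(A))) ^ B.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def _rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def _sbox_entry(x):
    # Multiplicative inverse by exhaustive search (0 maps to 0).
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    # Affine transformation of the AES S-box.
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63

SBOX = [_sbox_entry(x) for x in range(256)]

def mix_column(col):
    """Standard AES MixColumns on one column of four bytes."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]

def aes_round(a, b):
    """a, b: 128-bit ints. Bytes are taken MSB first and column-major, as in A_matrix."""
    s = [SBOX[x] for x in a.to_bytes(16, "big")]  # s[4*col + row]
    out = []
    for c in range(4):
        # ShiftRows: row r of the new column c comes from column (c + r) mod 4.
        shifted = [s[4 * ((c + r) % 4) + r] for r in range(4)]
        out.extend(mix_column(shifted))
    return int.from_bytes(bytes(out), "big") ^ b
```

Entries of `SBOX` can be compared against the case table in pckg.sv, for example `SBOX[0xff]` against the `8'hff` entry, and `aes_round` can be exercised with round 1 of the worked cipher example in FIPS-197 Appendix B.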

APPENDIX B: AEGIS128L VERIFICATION ENVIRONMENT

Layered Testbench:

program automatic layrd_test(port prt);

//initial $monitor("state=%h\n",rtl_inst.y2.state);

class environment;

class transaction;

logic tagout,done,start;

bus128 c1,c2;

rand logic[255:0] gen_data_in;

logic FINon;

constraint rndm

{

gen_data_in[255:128] > 0;

gen_data_in[127:0] > 0;

}

constraint assoc_data

{

gen_data_in[255:128] > 0;

gen_data_in[127:0] inside {0};

}

constraint ending

{

gen_data_in[255:128] inside {0};

gen_data_in[127:0] inside {0};

}

endclass:transaction

class generator;

transaction t;

mailbox #(transaction) gen2agt;

//static bus128 nonce=0;

bus128 nonce=1;

function new(input mailbox #(transaction) gen2agt);

this.gen2agt = gen2agt;

//nonce = nonce+1;

endfunction:new

task run(input int count,msgs);

//static bus128 nonce;

//nonce=0;

bus128 x,y;

repeat(msgs)

begin

t=new();

t.constraint_mode(0);

t.rndm.constraint_mode(1);

assert(t.randomize());

t.start = 1;

t.FINon=0;

t.gen_data_in[127:0]=nonce;

nonce=nonce+1;

repeat(2)

//repeat(8)

begin

gen2agt.put(t);

end

t=new();

t.constraint_mode(0);

t.rndm.constraint_mode(1);

assert(t.randomize());

t.start = 1;

t.FINon=0;

gen2agt.put(t);

t=new();

t.constraint_mode(0);

t.assoc_data.constraint_mode(1);

assert(t.randomize());

t.start = 1;

t.FINon=0;

gen2agt.put(t);

repeat(count)

begin

t=new();

t.constraint_mode(0);

t.rndm.constraint_mode(1);

assert(t.randomize());

t.start = 1;

t.FINon=0;

gen2agt.put(t);

end

t=new();

t.constraint_mode(0);

t.ending.constraint_mode(1);

assert(t.randomize());

t.start = 1;

t.FINon=1;

gen2agt.put(t);

repeat(21)

begin

t=new();

t.constraint_mode(0);

t.rndm.constraint_mode(1);

assert(t.randomize());

t.start = 0;

t.FINon=0;

t.gen_data_in=0;

gen2agt.put(t);

end

end

endtask:run

endclass:generator

class agent;

bus128 inbus1_scr[*],inbus2_scr[*];

logic start_scr[*],FINon_scr[*];

mailbox #(transaction) gen2agt,agt2drv;

transaction t;

function new(input mailbox # (transaction) gen2agt,agt2drv);

// function new(input mailbox # (transaction) gen2agt);

this.gen2agt = gen2agt;

this.agt2drv = agt2drv;

endfunction:new

task run(input int count,msgs,output bus128 inbus1_scr[*],inbus2_scr[*],output

logic start_scr[*],FINon_scr[*]);

int i,fg=0;

i = (count + 26)*msgs;

for (int k=0; k < i; k=k+1)

begin

gen2agt.get(t);

//$display("inbus1=%d,,inbus2=%d,,start=%d,,FINon=%d",t.gen_data_in[255:128],t.gen_data_in[127:0],t.start,t.FINon);

{FINon_scr[k],inbus1_scr[k],inbus2_scr[k]} =

{t.FINon,t.gen_data_in};

start_scr[k] = t.start;

//FINon[k] = t.FINon;

agt2drv.put(t);

end

endtask:run

endclass:agent

class driver;

mailbox #(transaction) agt2drv;

transaction t;

function new(input mailbox # (transaction) agt2drv);

this.agt2drv = agt2drv;

endfunction:new

task run (input int count,msgs);

int i;

i = (count + 26)*msgs;

//agt2drv.get(t);

repeat(i)

begin

agt2drv.get(t);

//$display("inbus1=%d,,inbus2=%d\n",t.gen_inbus1,t.gen_inbus2);

prt.ck.data_in <= {t.FINon,t.gen_data_in};

prt.ck.start <= t.start;

@prt.ck;

//$display($time,"state_init=%h\nstate=%h\nin1=%d\nin2=%d",rtl_inst.z3.y2.state_init,rtl_inst.z3.y2.n_state,rtl_inst.z3.y2.Msg1,rtl_inst.z3.y2.Msg2);

end

endtask:run

endclass:driver

class monitor;

transaction t;

mailbox #(transaction) mon2chk;

function new(ref mailbox #(transaction) mon2chk);

this.mon2chk = mon2chk;

endfunction:new

task run (input int count,msgs);

int i;

i = (count + 26)*msgs; //can be 20 or 21

t=new();

//$display("state=%h\nin1=%d\nin2=%d",rtl_inst.y2.state,rtl_inst.y2.Msg1,rtl_inst.y2.Msg2);

//@prt.ck i removed

repeat(i)

begin

//$display("state=%h\nin1=%d\nin2=%d",rtl_inst.y2.state,rtl_inst.y2.Msg1,rtl_inst.y2.Msg2);

@prt.ck

t.c2 <= prt.ck.c2;

t.c1 <= prt.ck.c1;

t.tagout <= prt.ck.tagout;

t.done <= prt.ck.done;

mon2chk.put(t);

//$display($time," start2=%d Msg1=%h\nMsg2=%h\nst_n=%h\nst_init=%h\ncg_start=%d\nstate=%h\nstate_tmp=%h\nenc_m1=%h\nenc_m2=%h\n\n", rtl_inst.z3.y2.start2,rtl_inst.z3.y2.Msg1,rtl_inst.z3.y2.Msg2, rtl_inst.z3.y2.n_state,rtl_inst.z3.y2.state_init, rtl_inst.z3.y2.cg_start,rtl_inst.z3.y2.state,rtl_inst.z3.y2.state_tmp,rtl_inst.z3.y2.enc_msg1, rtl_inst.z3.y2.enc_msg2);

//$display($time," start=%d,ADon=%d,MSGon=%d,FINon=%d,tagout=%d,done=%d\ninbus1=%h\ninbus2=%h\nc1=%h\nc2=%h\n\n",rtl_inst.z3.start,rtl_inst.z3.ADon,rtl_inst.z3.MSGon,rtl_inst.z3.FINon,rtl_inst.z3.tagout,rtl_inst.z3.done,rtl_inst.z3.inbus1,rtl_inst.z3.inbus2,rtl_inst.z3.c1,rtl_inst.z3.c2);

//$display($time," start=%d,FINon=%d,tagout=%d,done=%d\ninbus1=%h\ninbus2=%h\nc1=%h\nc2=%h\n\nn_state=%h\n\n",rtl_inst.z3.start,rtl_inst.z3.FINon,rtl_inst.z3.tagout,rtl_inst.z3.done,rtl_inst.z3.inbus1,rtl_inst.z3.inbus2,rtl_inst.z3.c1,rtl_inst.z3.c2,rtl_inst.z3.y2.n_state);

//$display($time," start=%d,FINon=%d,tagout=%d,done=%d\nn_state=%h\n\nstate=%h\ninbus1=%h\ninbus2=%h\nc1=%h\nc2=%h\n\n",rtl_inst.z2.start,rtl_inst.z1.infin,rtl_inst.z3.tagout,rtl_inst.z3.done,rtl_inst.z3.y2.n_state,rtl_inst.z3.y2.state,rtl_inst.z3.inbus1,rtl_inst.z3.inbus2,rtl_inst.z3.c1,rtl_inst.z3.c2);

end

endtask:run

endclass:monitor

class scoreboard;

transaction t;

task run(input int count,msgs,input bus128 inbus1_scr[*],inbus2_scr[*],input

logic start[*],FINon_scr[*],output bus128 c1_scr[*],c2_scr[*],output logic

tagout_scr[*],done_scr[*],output bus1024 state1[*]);

int i,k,m=0,n,p,l,o,q,t,u,num,FINon2;

logic Fin2=0;

for(int d=0;d<=msgs-1;d++)

begin

// p=i/msgs;

for(k=1; k<=count+25; k++)

begin

if(k==1) num = (msgs*(count+26)) +26+1;

l = (k==2);

m = ((k>=3) && (k<=11));

n = ((k>=12) && (k<=13));

o = ((k>=14) && (k<14+count));

q = ((k>=14+count) && (k<14+count+7));

FINon2 = ((k>14+count) && (k<14+count+7));

t = ((k>=14+count+5) && (k<=14+count+6));

u = ((k == 14+count+6));

if(m) num=1+(count+26)*d;

else if(n|o) num=(k+(count+26)*d-10);

else if(q) num=4+count+(count+26)*d;

if(l) num=1+(count+26)*d;

i = k + (count + 26)*d;

// $display("i=%d count=%d msgs=%d k=%d m=%d n=%d",i,count,msgs,k,m,n);

expected(FINon2,l,p,d,k,m,n,o,q,t,u,inbus1_scr[num],inbus2_scr[num],start_scr[i],FINon_scr[i],Fin2,state1[i],c1_scr[i],c2_scr[i],tagout_scr[i],done_scr[i]);

Fin2 = FINon_scr[k];

//$display("c1=%h\nc2=%h\nin1=%h\nin2=%h\nstart=%d done=%d d=%d k=%d\nstate=%h\n-----\n",c1_scr[i],c2_scr[i],inbus1_scr[num],inbus2_scr[num],start_scr[i],done_scr[i],d,k,state1[i]);

end

end

endtask

task static expected(input int FINon2,l,p,d,k,m,n,o,q,t,u,input bus128

inbus1,inbus2,input logic start,FINon,Fin2,output bus1024 state, output bus128 c1,c2,output logic

tagout,done);

bus1024 state1;

logic [10:0] msglenn = 0;

bus128 tmp;

static logic cnt;

logic FINon2;

store256 cnst=256'h00_01_01_02_03_05_08_0d_15_22_37_59_90_e9_79_62_db_3d_18_55_6d_c2_2f_f1_20_11_31_42_73_b5_28_dd;

//$display("e_state=%h",state);

if(l==1)

begin

done=0;

state.bus0 = inbus1^inbus2;

state.bus1 = cnst.LS128;

state.bus2 = cnst.MS128;

state.bus3 = cnst.LS128;

state.bus4 = inbus1^inbus2;

state.bus5 = inbus1^cnst.MS128;

state.bus6 = inbus1^cnst.LS128;

state.bus7 = inbus1^cnst.MS128;

c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);

c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);

stateupdate(state,inbus1,inbus2,state1);

state = state1;

end

else if(m==1 | n==1)

begin

c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);

c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);

stateupdate(state,inbus1,inbus2,state1);

state = state1;

done=0;

end

else if(o==1)

begin

c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);

c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);

stateupdate(state,inbus1,inbus2,state1);

state = state1;

tagout = 1'b0;

done = 1'b0;

end

else if(q==1)

begin

msglenn = count;

tmp = state.bus2 ^ {62'd0,1'b1,1'b0,53'd0,msglenn};

c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);

c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);

if(u==1)

begin

done = 1'b1;

c2 = 0;

c1 = state.bus0 ^ state.bus1 ^ state.bus2 ^ state.bus3 ^ state.bus4 ^ state.bus5 ^ state.bus6;

end

else

done = 1'b0;

if(FINon2)

begin

stateupdate(state,tmp,tmp,state1);

state = state1;

end

else

begin

stateupdate(state,inbus1,inbus2,state1);

state = state1;

end

if(t==1) begin

tagout = 1'b1;

end

end

else

begin

c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);

c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);

end

//$display("l=%d m=%d n=%d o=%d q=%d t=%d u=%d done=%d\n\n",l,m,n,o,q,t,u,done);

//$display("c1=%h\nc2=%h\nin1=%h\nin2=%h\nstart=%d done=%d d=%d k=%d\nstate=%h\n-----\n",c1,c2,inbus1,inbus2,start,done,d,k,state);

//$display("e_state=%h",state);

endtask:expected

endclass:scoreboard

class checker;

int counter,error;

transaction t;

mailbox # (transaction) mon2chk;

function new (input mailbox #(transaction) mon2chk);

this.mon2chk = mon2chk;

endfunction

task run(input int count,msgs,input bus128 inbus1_scr[*],inbus2_scr[*],input

logic start_scr[*],FINon_scr[*],input bus128 c1_scr[*],c2_scr[*],input logic

tagout_scr[*],done_scr[*],input bus1024 state1[*]);

int i,k,p;

i = (count + 26)*msgs;

p=i/msgs;

for(k=0;k<i;k++)

begin

@prt.ck; //???ok

mon2chk.get(t);

if(k%p==2)

$display("key=%h nonce=%h\n",rtl_inst.z3.inbus1,rtl_inst.z3.inbus2);

if(k%p == (14+count+6))

begin

if(rtl_inst.z3.c1 === c1_scr[k] && rtl_inst.z3.c2 === c2_scr[k])

begin

counter = counter+1;

$display("authentication_tag=%h\ne_authentication_tag=%h\n exp_done=%d done=%d \n",rtl_inst.z3.c1,c1_scr[k],done_scr[k],rtl_inst.z3.done);

end

else

begin

counter = counter+1;

error = error+1;

$display("authentication_tag=%h\ne_authentication_tag=%h\n exp_done=%d done=%d error \n",rtl_inst.z3.c1,c1_scr[k],done_scr[k],rtl_inst.z3.done);

end

end

if((k%p > 13 && k%p < 14+count)) // | k%p == 22)

begin

if(rtl_inst.z3.c1 === c1_scr[k] && rtl_inst.z3.c2 === c2_scr[k])

begin

counter = counter+1;

$display("in2=%h in1=%h\nc2=%h c1=%h\ne_c2=%h e_c1=%h\n exp_done=%d done=%d \n",rtl_inst.z3.inbus2,rtl_inst.z3.inbus1,rtl_inst.z3.c2,rtl_inst.z3.c1,c2_scr[k],c1_scr[k],done_scr[k],rtl_inst.z3.done);

end

else

begin

counter = counter+1;

error = error+1;

$display("in2=%h in1=%h\nc2=%h c1=%h\ne_c2=%h e_c1=%h\n exp_done=%d done=%d error \n",rtl_inst.z3.inbus2,rtl_inst.z3.inbus1,rtl_inst.z3.c2,rtl_inst.z3.c1,c2_scr[k],c1_scr[k],done_scr[k],rtl_inst.z3.done);

end

end

if(k%(count+26) == 0) $display("----------\nStarting next msg to encrypt\n");

end

$display("Total Errors = %5d \n",error);

//@prt.ck; ///?????

endtask:run

endclass:checker

generator gen;

agent agt;

driver drv;

monitor mon;

checker chk;

scoreboard scb;

mailbox #(transaction) gen2agt,agt2drv,mon2chk;

// mailbox #(transaction) gen2agt,agt2drv;

function void build();

gen2agt = new;

agt2drv = new;

mon2chk = new;

gen = new(gen2agt);

agt = new(gen2agt,agt2drv);

//agt = new(gen2agt);

drv = new(agt2drv);

//mon = new(); //mhp

mon = new(mon2chk);

chk = new(mon2chk);

scb = new();

endfunction:build

task run();

fork

gen.run(count,msgs);

agt.run(count,msgs,inbus1_scr,inbus2_scr,start_scr,FINon_scr);

drv.run(count,msgs);

mon.run(count,msgs);

scb.run(count,msgs,inbus1_scr,inbus2_scr,start_scr,FINon_scr,c1_scr,c2_scr,tagout_scr,done_scr,state1);

chk.run(count,msgs,inbus1_scr,inbus2_scr,start_scr,FINon_scr,c1_scr,c2_scr,tagout_scr,

done_scr,state1);

join

endtask:run

endclass:environment

//coverage

covergroup Cov;

coverpoint prt.data_in[255:128] {

bins lowest = {[0:2**32]};

bins medlow = {[2**32:2**64]};

bins medhigh = {[2**64:2**96]};

bins highest = {[2**96:2**128]};

option.at_least = 2000; }

coverpoint prt.data_in[127:0] {

bins lowest = {[0:2**32]};

bins medlow = {[2**32:2**64]};

bins medhigh = {[2**64:2**96]};

bins highest = {[2**96:2**128]};

option.at_least = 2000; }

endgroup

//

environment env;

real cov_num;

Cov cv;

int count;

int msgs;

bus128 inbus1_scr[*],inbus2_scr[*];

bus128 inbus1_temp,inbus2_temp;

bus128 c1_scr[*],c2_scr[*];

bus1024 state1[*];

logic start_scr[*],FINon_scr[*],tagout_scr[*],done_scr[*];

initial

begin

env = new();

//count=1;

//count= 1 + $unsigned($random)%15;

count= 2048;

//msgs = $unsigned($random)%55;

msgs = 20;

$display("each message has total =%d 256 bits , total msgs = %d",count,msgs);

fork

begin

env.build();

env.run();

end

join_any

cv=new();

for(int iter=0;iter<(msgs*(count+26)); iter++)

begin

inbus1_temp = inbus1_scr[iter];

inbus2_temp = inbus2_scr[iter];

cv.sample();

end

repeat(msgs*(count+26)+12) @prt.ck;

cov_num = $get_coverage;

$display("\nfunctional coverage = %f\n",cov_num);

end

endprogram

//endmodule

Top.sv

`include "pckg.sv"

`include "port.sv"

`include "top_rtl.sv"

`include "layrd_test.sv"

`include "fifo.sv"

`include "fifo_cntlr.sv"

`include "controller.sv"

`include "initialization.sv"

`include "aes128.sv"

`include "stateupdate.sv"

`include "datapath.sv"

`include "encryption.sv"

//`include "controller.sv"

//`include "initialization.sv"

//`include "aes128.sv"

//`include "stateupdate.sv"

//`include "datapath.sv"

module top;

bit clk=0,rst=1;

initial

begin

#1 rst = 0;

#1 rst = 1;

end

always

#5 clk = ~clk;

port prt(clk,rst);

top_rtl rtl_inst(prt.top_rtl1);

layrd_test tb_inst(prt.layrd_test1);

endmodule

test.sv //regular testbench used to generate the SAIF files

//`include "top_rtl.sv"

`include "top_rtl_synthesized.v"

//`include "saed90final.v"

module test;

logic clk,rst,start;

logic [256:0] data_in;

wire [127:0] c1,c2;

wire tagout,done;

logic [127:0] inb1,inb2;

top_rtl w1(rst,clk,start,data_in,c1,c2,tagout,done);

//initial

//$vcdpluson;

always

#16 clk = ~clk;

initial

begin

//$set_gate_level_monitoring("ON");

$set_toggle_region(test.w1);

$toggle_start();

clk=0;

//$monitor($time,"st=%d AD=%d MSG=%d FI=%d roll=%d start=%d data_out=%d EMPTY=%d FULL=%d wr_ptr=%d rd_ptr=%d rd_en=%d wr_en=%d cnt=%d state=%d fin=%d done_clk_gate=%d rdy=%d",start_in,ADon,MSGon,FINon_in,x2.roll_over,start,data_out[9:0],EMPTY,FULL,x2.wr_ptr,x2.rd_ptr,x2.rd_en,x2.wr_en,x2.cnt,x2.state,x2.FINon,x2.done_clk_gate,ready);

//$monitor($time,"st=%d AD=%d MSG=%d FI=%d start=%d data_out=%d EMPTY=%d FULL=%d state=%d cnt=%d done_clk_gate=%d rdy=%d",start_in,ADon,MSGon,FINon_in,start,data_out[9:0],EMPTY,FULL,x2.state,x2.cnt,x2.done_clk_gate,ready);

$monitor($time,"in1=%d in2=%d\nst=%d c1=%d c2=%d\n tg=%d done=%d in_in1=%d in_in2=%d\nin_st=%d in_ad=%d in_msg=%d in_fin=%d\n",data_in[255:128],data_in[127:0],start,c1,c2,tagout,done,w1.z1.data_out[255:128],w1.z1.data_out[127:0],w1.z2.start_in,w1.z2.ADon,w1.z2.MSGon,w1.z2.FINon_in);

rst=0;

#1 rst=0;

#1 rst=1;// data_in={256'd3333,1'b1,1'b0,1'b0,1'b0};

#1 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};

#32 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};

#32 {start,data_in}={1'b1,1'b0,128'd130482792299249650780958843169121197881,128'd151199630763829394216724218405483397577};

#32 {start,data_in}={1'b1,1'b0,128'd172715115316271502967306500513779031842,128'd0};

#32 {start,data_in}={1'b1,1'b0,128'd23305380004405776739468075089964426475,128'd82674347397392953226878434787120253260};

#32 {start,data_in}={1'b1,1'b1,128'd0,128'd0};

#32 {start,data_in}={1'b0,1'b0,128'd0,128'd0};

#640

for (int i=1; i<256; i++)

begin

#32 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};

#32 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};

#32 {start,data_in}={1'b1,1'b0,128'd130482792299249650780958843169121197881,128'd151199630763829394216724218405483397577};

#32 {start,data_in}={1'b1,1'b0,128'd172715115316271502967306500513779031842,128'd0};

for(int j=1;j<=i;j++)

begin

inb1 = 1*j;

inb2 = 2*j;

#32 {start,data_in}={1'b1,1'b0,inb1,inb2};

end

#32 {start,data_in}={1'b1,1'b1,128'd0,128'd0};

#32 {start,data_in}={1'b0,1'b0,128'd0,128'd0};

#640;

end

for (int i=1; i<525; i++)

begin

#32 {start,data_in}={1'b1,1'b0,inb1,{i,i,i,i}};

#32 {start,data_in}={1'b1,1'b0,inb1,{i,i,i,i}};

#32 {start,data_in}={1'b1,1'b0,128'd130482792299249650780958843169121197881,128'd151199630763829394216724218405483397577};

#32 {start,data_in}={1'b1,1'b0,128'd172715115316271502967306500513779031842,128'd0};

for(int j=1;j<=i;j++)

begin

inb1 = 3*j;

inb2 = 4*j;

#32 {start,data_in}={1'b1,1'b0,inb1,inb2};

end

#32 {start,data_in}={1'b1,1'b1,128'd0,128'd0};

#32 {start,data_in}={1'b0,1'b0,128'd0,128'd0};

repeat(i)

#64;

end

$toggle_stop();

$toggle_report("p7.saif",1.0e-9,"test.w1");

#1760; //$vcdplusoff;

#1 $finish;

end

endmodule
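
The stimulus in test.sv drives `{start,data_in}` as a single 258-bit concatenation: one start bit, one flag bit that is set only on the final word of each message, and two 128-bit blocks. As a rough illustration of that layout (not part of the project's testbench), the following Python sketch packs and unpacks such a word. The names `pack_word`, `unpack_word`, and `last` are hypothetical, and reading bit 256 of `data_in` as an end-of-message flag is an assumption drawn from the stimulus pattern.

```python
MASK128 = (1 << 128) - 1  # all-ones mask for one 128-bit block


def pack_word(start, last, in1, in2):
    """Mirror {start, last, in1, in2}: 1 + 1 + 128 + 128 = 258 bits."""
    if not (0 <= in1 <= MASK128 and 0 <= in2 <= MASK128):
        raise ValueError("blocks must fit in 128 bits")
    return ((start & 1) << 257) | ((last & 1) << 256) | (in1 << 128) | in2


def unpack_word(word):
    """Split a packed word back into its four fields."""
    return {
        "start": (word >> 257) & 1,
        "last": (word >> 256) & 1,          # assumed end-of-message flag, data_in[256]
        "in1": (word >> 128) & MASK128,     # data_in[255:128]
        "in2": word & MASK128,              # data_in[127:0]
    }
```

With this layout, the closing assignment `{1'b1,1'b1,128'd0,128'd0}` corresponds to `pack_word(1, 1, 0, 0)`.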

APPENDIX C: SYNTHESIS, UPF AND PERL SCRIPTS

Synthesis script:

set hdlin_auto_save_templates "true"

set hdlin_sv_packages "enable"

set hdlin_infer_function_local_latches "true"

#Read the design in

read_file -format sverilog {"top_rtl.sv"}

#set the current design

current_design top_rtl

#Link the design

link

#Create clock and constrain the design

create_clock "clk" -period 20 -name "clk"

set_input_delay 0.5 -clock clk [all_inputs]

set_output_delay 0.2 -clock clk [all_outputs]

set_dont_touch_network "clk"

set_max_area 0

#Set operating conditions

set_operating_conditions -library "saed90nm_typ" "TYPICAL";

set_operating_conditions -library "saed90nm_typ_hvt" "TYPICAL";

set_operating_conditions -library "saed90nm_typ_cg_hvt" "TYPICAL";

set_operating_conditions -library "saed90nm_typ" "TYPICAL"

set_operating_conditions -library "saed90nm_typ_cg" "TYPICAL"

uniquify

set_clock_gating_style -sequential_cell latch \

-positive_edge_logic integrated:CGLPPRX2 \

-negative_edge_logic integrated:CGLNPRX2 \

-control_point before \

-minimum_bitwidth 3 \

-max_fanout 64

#insert_clock_gating

#powergating setup

set upf_create_implicit_supply_sets false

load_upf /gaia/class/student/pervaizm/Project/4_rtl_clk_hvt_pg/top.upf

set_voltage 1.20 -obj {VDD_POF1_VIRTUAL VDD_POF2_VIRTUAL VDD}

set_voltage 0.00 -obj {VSS}

#Synthesize and generate report

#compile_ultra

#compile_ultra -gate_clock -no_boundary_optimization

compile_ultra -gate_clock

report_clock_gating > report0

report_attribute > report1

report_area > report2

report_constraints -all_violators > report3

report_timing -path full -delay max -max_paths 1 -nworst 1 > report4

report_power > report5

report_power -hier > report6

write_file -format verilog -hierarchy -output top_rtl_synthesized.v

write_file -format ddc -hierarchy -output top_rtl_synthesized.ddc

Perl Script-1:

#!/usr/bin/perl

$path1 = "/gaia/class/student/pervaizm/design1_final/untitledfolder/saed90nm_hvt.v";

open(READ,$path1) or die "Cannot open $path1: $!";

@lines = <READ>;

open(WRITE,">saed90nm_hvt_clean.v") or die "Cannot open saed90nm_hvt_clean.v: $!";

foreach my $lne(@lines)

{

if($lne =~ /^\s*`/)

{

}

else

{

print WRITE $lne;

}

}

close(READ);

close(WRITE);

Perl Script-2:

#!/usr/bin/perl

$path1 = "saed90nm_hvt_clean.v";

open(READ,$path1) or die "Cannot open $path1: $!";

@lines = <READ>;

open(WRITE,">saed90final.v") or die "Cannot open saed90final.v: $!";

$k=0;

foreach my $lne(@lines)

{print "k=$k\n\n";

#if($lne =~ /^\s*`/)

if($lne =~ /^\s*specify/)

{

$k=1; print"1\n";

}

elsif($lne =~ /^\s*endspecify/)

{

$k=0; print"2\n";

}

elsif($k==1)

{

print"3\n";

}

elsif($k==0)

{

print WRITE $lne; print"4\n";

}

}

close(READ);

close(WRITE);
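
The two Perl scripts clean the library's Verilog model in two passes: the first drops every line that begins with a backtick (compiler directives), and the second drops every `specify ... endspecify` block, including its delimiters. As an illustrative alternative (not the scripts used in the project), the same filtering can be sketched as one pass in Python. The function name `strip_directives_and_specify` is hypothetical, and the debug prints from Perl Script-2 are omitted.

```python
import re


def strip_directives_and_specify(lines):
    """Drop backtick-directive lines and specify...endspecify blocks."""
    kept = []
    in_specify = False
    for line in lines:
        if re.match(r"\s*`", line):
            continue                      # Perl Script-1: skip compiler directives
        if re.match(r"\s*specify", line):
            in_specify = True             # Perl Script-2: block starts, drop the line
            continue
        if re.match(r"\s*endspecify", line):
            in_specify = False            # block ends, drop the line
            continue
        if not in_specify:
            kept.append(line)             # ordinary line outside any specify block
    return kept
```

A caller would read the library file into a list of lines, pass it through this function, and write the result to the output file.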

UPF Script:

create_power_domain PTOP

create_power_domain PAon -elements {z1 z2}

create_power_domain POF1 -elements {z3/y0 z3/y2}

create_power_domain POF2 -elements {z3/y1}

create_supply_port VDD

create_supply_net VDD -domain PTOP

create_supply_net VDD -domain PAon -reuse

create_supply_net VDD -domain POF1 -reuse

create_supply_net VDD -domain POF2 -reuse

connect_supply_net VDD -ports VDD

create_supply_port VSS

create_supply_net VSS -domain PTOP

create_supply_net VSS -domain PAon -reuse

create_supply_net VSS -domain POF1 -reuse

create_supply_net VSS -domain POF2 -reuse

connect_supply_net VSS -ports VSS

create_supply_net VDD_POF1_VIRTUAL -domain POF1

create_supply_net VDD_POF2_VIRTUAL -domain POF2

set_domain_supply_net PTOP -primary_power_net VDD -primary_ground_net VSS

set_domain_supply_net PAon -primary_power_net VDD -primary_ground_net VSS

set_domain_supply_net POF1 -primary_power_net VDD_POF1_VIRTUAL -primary_ground_net VSS

set_domain_supply_net POF2 -primary_power_net VDD_POF2_VIRTUAL -primary_ground_net VSS

create_power_switch POF1_sw -domain POF1 \
    -input_supply_port {in VDD} \
    -output_supply_port {out1 VDD_POF1_VIRTUAL} \
    -control_port {POF1_sd z2/poff_rtl} \
    -on_state {stateon1 in {!POF1_sd}}

create_power_switch POF2_sw -domain POF2 \
    -input_supply_port {in VDD} \
    -output_supply_port {out2 VDD_POF2_VIRTUAL} \
    -control_port {POF2_sd z3/poff_init} \
    -on_state {stateon2 in {!POF2_sd}}

#set_isolation POF_iso_out -domain POF -isolation_power_net VDD -isolation_ground_net VSS -clamp_value 0 -applies_to outputs //no need as output already muxed

#set_isolation_control POF_iso_out -domain POF -isolation_signal y0/start3 -isolation_sense low -location parent

add_port_state VDD -state {voltage 1.20 }

add_port_state POF1_sw/out1 -state {voltage 1.20} -state {POF_OFF1 off}

add_port_state POF2_sw/out2 -state {voltage 1.20} -state {POF_OFF2 off}

create_pst top_pst -supplies {VDD VDD_POF1_VIRTUAL VDD_POF2_VIRTUAL}

#create_pst top_pst1 -supplies {VDD VDD_POF1_VIRTUAL}

#create_pst top_pst2 -supplies {VDD VDD_POF2_VIRTUAL}

add_pst_state ALL_ON -pst top_pst -state {voltage voltage voltage}

add_pst_state RTL_OFF -pst top_pst -state {voltage POF_OFF1 POF_OFF2}

add_pst_state INIT_OFF -pst top_pst -state {voltage voltage POF_OFF2}

APPENDIX D: POWER AND SIMULATION RESULTS

Report power for normal mode of third stepping:

Information: Updating design information... (UID-85)

Warning: Design 'top_rtl' contains 1 high-fanout nets. A fanout number of 1000 will be used for delay calculations involving these nets. (TIM-134)

Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)

Warning: The derived toggle rate value (0.100000) for the clock net 'clk' conflicts with the annotated value (0.062500). Using the annotated value. (PWR-12)

****************************************

Report : power

-analysis_effort low

Design : top_rtl

Version: I-2013.12-SP5-4

Date : Sat Apr 16 13:23:57 2016

****************************************

Library(s) Used:

saed90nm_typ (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ.db)

saed90nm_typ_hvt (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ_hvt.db)

saed90nm_typ_cg (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/clock_gating/saed90nm_typ_cg.db)

Operating Conditions: TYPICAL Library: saed90nm_typ_cg

Wire Load Model Mode: enclosed

Global Operating Voltage = 1.2

Power-specific unit information :

Voltage Units = 1V

Capacitance Units = 1.000000ff

Time Units = 1ns

Dynamic Power Units = 1uW (derived from V,C,T units)

Leakage Power Units = 1pW

Cell Internal Power = 6.0516 mW (65%)

Net Switching Power = 3.2597 mW (35%)

---------

Total Dynamic Power = 9.3113 mW (100%)

Cell Leakage Power = 2.8695 mW

                  Internal        Switching       Leakage         Total
Power Group       Power           Power           Power           Power       (   %   )  Attrs
--------------------------------------------------------------------------------------------------
io_pad            0.0000          0.0000          0.0000          0.0000      (  0.00%)
memory            0.0000          0.0000          0.0000          0.0000      (  0.00%)
black_box         0.0000          0.0000          0.0000          0.0000      (  0.00%)
clock_network     72.5890         74.9767         1.4733e+07      162.2987    (  1.33%)
register          54.9246         87.1252         7.1198e+08      854.0305    (  7.01%)
sequential        0.0000          0.0000          0.0000          0.0000      (  0.00%)
combinational     5.9241e+03      3.0976e+03      2.1428e+09      1.1165e+04  ( 91.66%)
--------------------------------------------------------------------------------------------------
Total             6.0516e+03 uW   3.2597e+03 uW   2.8695e+09 pW   1.2181e+04 uW

1
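
The PWR-12 warning in the report above compares two clock toggle rates, given in toggles per time unit (1 ns here). A free-running clock toggles twice per period, so the 20 ns period set by `create_clock` yields the derived value 0.1, while the annotated value 0.0625 corresponds to a 32 ns period, which matches the `#16 clk = ~clk` half period in test.sv. The Python sketch below only reproduces that arithmetic; the function name `clock_toggle_rate` is hypothetical.

```python
def clock_toggle_rate(period_ns):
    """Toggles per nanosecond for a free-running clock (two edges per period)."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 2.0 / period_ns


derived = clock_toggle_rate(20.0)    # from 'create_clock "clk" -period 20'
annotated = clock_toggle_rate(32.0)  # from '#16 clk = ~clk' (32 ns period)
```

Because the tool uses the annotated value, the reported clock power reflects the simulated 32 ns clock rather than the 20 ns synthesis constraint.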

Report power for sleep mode of third stepping:

Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)

Warning: The derived toggle rate value (0.100000) for the clock net 'clk' conflicts with the annotated value (0.062500). Using the annotated value. (PWR-12)

****************************************

Report : power

-analysis_effort low

Design : top_rtl

Version: I-2013.12-SP5-4

Date : Sat Apr 16 13:25:06 2016

****************************************

Library(s) Used:

saed90nm_typ (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ.db)

saed90nm_typ_hvt (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ_hvt.db)

saed90nm_typ_cg (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/clock_gating/saed90nm_typ_cg.db)

Operating Conditions: TYPICAL Library: saed90nm_typ_cg

Wire Load Model Mode: enclosed

Global Operating Voltage = 1.2

Power-specific unit information :

Voltage Units = 1V

Capacitance Units = 1.000000ff

Time Units = 1ns

Dynamic Power Units = 1uW (derived from V,C,T units)

Leakage Power Units = 1pW

Cell Internal Power = 52.8639 uW (97%)

Net Switching Power = 1.6058 uW (3%)

---------

Total Dynamic Power = 54.4697 uW (100%)

Cell Leakage Power = 2.7468 mW

                  Internal        Switching       Leakage         Total
Power Group       Power           Power           Power           Power       (   %   )  Attrs
--------------------------------------------------------------------------------------------------
io_pad            0.0000          0.0000          0.0000          0.0000      (  0.00%)
memory            0.0000          0.0000          0.0000          0.0000      (  0.00%)
black_box         0.0000          0.0000          0.0000          0.0000      (  0.00%)
clock_network     52.8635         1.6017          1.4788e+07      69.2534     (  2.47%)
register          0.0000          0.0000          6.1376e+08      613.7578    ( 21.91%)
sequential        0.0000          0.0000          0.0000          0.0000      (  0.00%)
combinational     3.9718e-04      4.1204e-03      2.1182e+09      2.1182e+03  ( 75.62%)
--------------------------------------------------------------------------------------------------
Total             52.8639 uW      1.6058 uW       2.7468e+09 pW   2.8012e+03 uW

1

Report power for overdrive mode of third stepping:

Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)

****************************************

Report : power

-analysis_effort low

Design : top_rtl

Version: I-2013.12-SP5-4

Date : Sat Apr 16 13:25:47 2016

****************************************

Library(s) Used:

saed90nm_typ (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ.db)

saed90nm_typ_hvt (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ_hvt.db)

saed90nm_typ_cg (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/clock_gating/saed90nm_typ_cg.db)

Operating Conditions: TYPICAL Library: saed90nm_typ_cg

Wire Load Model Mode: enclosed

Global Operating Voltage = 1.2

Power-specific unit information :

Voltage Units = 1V

Capacitance Units = 1.000000ff

Time Units = 1ns

Dynamic Power Units = 1uW (derived from V,C,T units)

Leakage Power Units = 1pW

Cell Internal Power = 9.6767 mW (65%)

Net Switching Power = 5.2122 mW (35%)

---------

Total Dynamic Power = 14.8889 mW (100%)

Cell Leakage Power = 2.8694 mW

                  Internal        Switching       Leakage         Total
Power Group       Power           Power           Power           Power       (   %   )  Attrs
--------------------------------------------------------------------------------------------------
io_pad            0.0000          0.0000          0.0000          0.0000      (  0.00%)
memory            0.0000          0.0000          0.0000          0.0000      (  0.00%)
black_box         0.0000          0.0000          0.0000          0.0000      (  0.00%)
clock_network     116.1220        119.8884        1.4733e+07      250.7435    (  1.41%)
register          87.8240         139.3119        7.1192e+08      939.0542    (  5.29%)
sequential        0.0000          0.0000          0.0000          0.0000      (  0.00%)
combinational     9.4728e+03      4.9530e+03      2.1428e+09      1.6568e+04  ( 93.30%)
--------------------------------------------------------------------------------------------------
Total             9.6767e+03 uW   5.2122e+03 uW   2.8694e+09 pW   1.7758e+04 uW

1
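
The three reports can be compared directly from their dynamic and leakage totals. The Python sketch below copies those totals (converted to uW) and computes the total power per mode, the reduction relative to normal mode, and the overdrive-to-normal dynamic power ratio. The names `REPORTS`, `total_uw`, and `reduction_percent` are hypothetical. With these figures, sleep mode comes out roughly 77% below normal mode and is almost entirely leakage, while overdrive dynamic power is about 1.6 times that of normal mode. That ratio is consistent with a 20 ns clock versus a 32 ns clock, although the overdrive clock period is an assumption here and is not stated in the report.

```python
# Dynamic and leakage totals in uW, copied from the three reports above.
REPORTS = {
    "normal":    {"dynamic_uw": 9311.3,  "leakage_uw": 2869.5},
    "sleep":     {"dynamic_uw": 54.4697, "leakage_uw": 2746.8},
    "overdrive": {"dynamic_uw": 14888.9, "leakage_uw": 2869.4},
}


def total_uw(mode):
    """Total power of a mode: dynamic plus leakage, in uW."""
    r = REPORTS[mode]
    return r["dynamic_uw"] + r["leakage_uw"]


def reduction_percent(baseline, mode):
    """Percentage by which `mode` is below `baseline` in total power."""
    return 100.0 * (1.0 - total_uw(mode) / total_uw(baseline))


sleep_saving = reduction_percent("normal", "sleep")
sleep_leakage_share = REPORTS["sleep"]["leakage_uw"] / total_uw("sleep")
overdrive_dynamic_ratio = (
    REPORTS["overdrive"]["dynamic_uw"] / REPORTS["normal"]["dynamic_uw"]
)
```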

Simulation Results:

The following is the simulation result for one 256-byte message.

Inputs: in1 and in2, each a 128-bit block of plaintext.

Outputs: c1 and c2, each a 128-bit block of ciphertext.

Expected outputs: e_c1 and e_c2, the expected values of the ciphertext blocks.

Starting next msg to encrypt

key=fedebc47eba8c34934f63213f55f665e nonce=00000000000000000000000000000001

in2=dccf5c98c942bc144ab23017e8223c14 in1=194cd297a7cb8eb11824da31d694436c

c2=e5fa5915753ce84b8274e274fb72e459 c1=d2cf53634c36956ae7409ca1acda2a7f

e_c2=e5fa5915753ce84b8274e274fb72e459 e_c1=d2cf53634c36956ae7409ca1acda2a7f

exp_done=0 done=0

in2=96c54d1ee424ff03da07cea515a30005 in1=8f31edb7fbc2dee5d1176ba288f87128

c2=49067378e52e15d1beb0b02d291f74c1 c1=e935f450d747f52883aab5533e369186

e_c2=49067378e52e15d1beb0b02d291f74c1 e_c1=e935f450d747f52883aab5533e369186

exp_done=0 done=0

in2=1fcd2a32bc0d2a571563864345c4a549 in1=acc73be8b800b15c86e9a5bfa1fa273a

c2=a779f396c180a4a58ae874a949fa9656 c1=a56bbad78815caae46a4185c79fa0edb

e_c2=a779f396c180a4a58ae874a949fa9656 e_c1=a56bbad78815caae46a4185c79fa0edb

exp_done=0 done=0

in2=15a1082de9e381630c7ca19d1a100b49 in1=642d3fd91131930f7b2a0a6ec2b59a4b

c2=3f2b9dcfde02d3489ad6b514539cc48f c1=4fd0d6627816acaaebdcc865b784de25

e_c2=3f2b9dcfde02d3489ad6b514539cc48f e_c1=4fd0d6627816acaaebdcc865b784de25

exp_done=0 done=0

in2=9023c810a7b2856575554dee1a6d25a1 in1=3e685ea4dda10ccba117810c488f50cc

c2=b10f23cc47ffd6e027c0eb606b489d3e c1=d7324924ae86a6bba89b95282fa70838

e_c2=b10f23cc47ffd6e027c0eb606b489d3e e_c1=d7324924ae86a6bba89b95282fa70838

exp_done=0 done=0

in2=98f6c785a02b9a3b7bbfd154e919fd05 in1=9836b6684ff79b7971c49fe375a8b4b5

c2=2435f6739e5389f5f19d56dd34f7ee0d c1=34cfec3d7b0ff90e3fbd9b6ff3ed968e

e_c2=2435f6739e5389f5f19d56dd34f7ee0d e_c1=34cfec3d7b0ff90e3fbd9b6ff3ed968e

exp_done=0 done=0

in2=ca9693db09bb7dcfc4ac7ddeda8431b1 in1=ab28acb8bd9231e806d78702c845d0e6

c2=67a3cea9eead1e5ffe3dea82da47cc83 c1=3c94d3e6ea375a5782d51a705216434d

e_c2=67a3cea9eead1e5ffe3dea82da47cc83 e_c1=3c94d3e6ea375a5782d51a705216434d

exp_done=0 done=0

in2=8817c0f98865c268ae563d56a5f3cd20 in1=13959954d4e074fd7b169f99484d2c8d

c2=ff6c16bd08405993963eb2a9f83cdebb c1=b781a9b7ba042f9c51d4964521d0a36c

e_c2=ff6c16bd08405993963eb2a9f83cdebb e_c1=b781a9b7ba042f9c51d4964521d0a36c

exp_done=0 done=0

authentication_tag=d478a12b39e91135cca71a6cd830a8a8

e_authentication_tag=d478a12b39e91135cca71a6cd830a8a8

exp_done=1 done=1

----------
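
In the log above, each actual value is followed by its expected counterpart under an `e_` or `exp_` prefix (`c1`/`e_c1`, `done`/`exp_done`, `authentication_tag`/`e_authentication_tag`). As an illustration of how such a log could be checked automatically (this checker is not part of the project), the Python sketch below pairs each expected field with the most recent actual value of the same name and reports any mismatch. The function name `check_log` is hypothetical, and the `name=hexvalue` line format is assumed from the listing.

```python
import re

FIELD = re.compile(r"(\w+)=([0-9a-fA-F]+)")
PREFIXES = ("exp_", "e_")  # prefixes that mark expected values in the log


def check_log(text):
    """Return (comparisons, mismatches) for expected/actual field pairs."""
    actual = {}
    comparisons = 0
    mismatches = []
    for line in text.splitlines():
        expected = []
        for name, value in FIELD.findall(line):
            prefix = next((p for p in PREFIXES if name.startswith(p)), None)
            if prefix:
                expected.append((name[len(prefix):], value))
            else:
                actual[name] = value  # remember the latest actual value
        # Compare after the whole line is read, so 'exp_done=0 done=0' works.
        for name, value in expected:
            comparisons += 1
            if actual.get(name) != value:
                mismatches.append((name, actual.get(name), value))
    return comparisons, mismatches
```

An empty mismatch list means every expected value in the log agreed with the value produced by the design.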
