CMR INSTITUTE OF TECHNOLOGY
MINI PROJECT REPORT ON
IMPLEMENTATION OF WDDL GATES FOR
SECURE IC APPLICATIONS
A Mini Project Report Submitted in partial fulfillment of the requirement for the award of degree of
BACHELOR OF TECHNOLOGY IN
ELECTRONICS AND COMMUNICATION ENGINEERING
By
R SATISH KUMAR 06R01A0440
M SHESHU BINDU 06R01A0441
M SRAVAN KUMAR 06R01A0444
Internal Guide: S KHAJA MOHIDDIN, M.Tech, CMRIT
HOD, ECE: Prof. K RAMANAIAH, CMRIT
Date __________
CERTIFICATE
This is to certify that the mini project entitled "IMPLEMENTATION OF WDDL GATES FOR SECURE IC APPLICATIONS" was successfully carried out by
R SATISH KUMAR 06R01A0440
M SHESHU BINDU 06R01A0441
M SRAVAN KUMAR 06R01A0444
in partial fulfillment of the requirement for the award of Bachelor of Technology in "Electronics and Communication Engineering"
from "Jawaharlal Nehru Technological University" during the academic year 2006 - 2010.
Internal Guide: S KHAJA MOHIDDIN, M.Tech
Head of the Department: Prof. K Ramanaiah
External Examiner
PRINCIPAL: Dr. M Janga Reddy
ACKNOWLEDGMENT
We express our sincere thanks to the management of VEDIC SCHOOL OF VLSI DESIGN for giving us this opportunity to work in their organization.
We express our immense gratitude to Mr. M R K Arjun, FPGA Design Engineer (Simpli5ng Semiconductor Pvt. Ltd.). His inspiring remarks, stimulating guidance, valuable suggestions and encouragement helped us greatly in the completion of our project "IMPLEMENTATION OF WDDL GATES FOR SECURE IC APPLICATIONS".
We wish to thank the internal guide of our project, Mr. S KHAJA MOHIDDIN, Department of Electronics, for his constant inspiration and advice throughout our project work.
We express our sincere gratitude to respected Mr. JANGA REDDY, Principal of CMRIT, and Mr. K RAMANAIAH, HOD of the ECE department, for their valuable guidance, encouragement and suggestions.
INDEX
ABSTRACT
CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 INTRODUCTION
1.2 OBJECTIVE
CHAPTER 2 REVIEW OF LITERATURE
2.1 INTRODUCTION TO DIGITAL DESIGN FLOW
2.2 SECURE DIGITAL DESIGN FLOW
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
CHAPTER 4 SMART CARD OVERVIEW
CHAPTER 5 SIDE CHANNEL ATTACKS
5.1 CLASSIFICATION OF SIDE CHANNEL ATTACKS
5.2 POWER ANALYSIS ATTACKS
5.2.1 SIMPLE POWER ANALYSIS (SPA)
5.2.2 DIFFERENTIAL POWER ANALYSIS (DPA)
CHAPTER 6 CONSTANT-POWER CONSUMING LOGIC STYLES
6.1 CURRENT MODE LOGIC
6.2 VOLTAGE MODE LOGIC (CMOS CIRCUIT STYLES)
6.3 DYNAMIC DIFFERENTIAL LOGIC
6.3.1 SENSE AMPLIFIER BASED LOGIC (SABL)
6.3.2 WAVE DYNAMIC DIFFERENTIAL LOGIC GATES (WDDL)
CHAPTER 7 DESIGN OF WDDL GATES
7.1 WDDL GATES
7.1.1 WDDL OR GATE
7.1.2 WDDL AND GATE
7.1.3 WDDL NAND GATE
7.1.4 WDDL NOR GATE
7.1.5 WDDL XOR GATE
CHAPTER 8 FRONT END RESULTS
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 SUMMARY
9.2 CONCLUSION
CHAPTER 10 REFERENCES
ABSTRACT
Every electronic device needs security, from the smallest RFID tags to the larger hand-held devices. Security is needed for financial, medical, consumer, automotive and other applications. Small embedded integrated circuits (ICs) such as smart cards are vulnerable to the so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that derive information from an integrated circuit while it is in operation. The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend on the values of the data and/or the key reveal what the circuit is doing. Simple timing or power attacks give visual information on the circuit. The root cause of this problem is that standard CMOS is power efficient: it consumes dynamic power only when nodes are switching, so the power drawn depends on the data being processed. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles whose switching behavior is independent of the data, or the sequence of data, that they are processing. A logic style called "Wave Dynamic Differential Logic (WDDL)" is used for the implementation of the basic logic gates that are used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a Side Channel Attack (SCA) resistant layout.
Figure WDDL Pre-charge wave generation
CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to the so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles whose switching behavior is independent of the data, or the sequence of data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used for the implementation of the basic logic gates that are used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a Side Channel Attack (SCA) resistant layout.
Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, namely Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract, from the power consumption, the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style for which a logic gate has constant power consumption at all times, independent of signal transitions, removes the foundation of DPA and is an effective means to halt DPA.
1.2 Objective of the Project
The main objectives of this project are:
Study of constant-power logic styles
Description of WDDL Gates
Implementation of WDDL Logic Gates
Verification of the functionality of WDDL Logic Gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding and RTL verification), synthesis and post-synthesis verification, back end (floor planning, place and route, layout), and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnect of the circuit blocks and the functionality. Various CAD tools are available to synthesize a circuit from the HDL description.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Fig. 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.
Figure 21 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This protects the ICs against power analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL
In 1987, standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware
Defines e.g. two numeric types, signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name direction and a type
An entity may have NO port declaration
Port directions
In: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. both a gate-level architecture and a behavioral architecture
Are always optional
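As a hedged illustration of how a configuration declaration binds one of several architectures to an entity, the sketch below reuses the Comparator entity and the Comparator1 architecture from the examples above. The second architecture (Comparator2) and the configuration name (Comparator_cfg) are hypothetical names introduced here only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow architecture, as in the example above.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Hypothetical second architecture: same function, written as a process.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- The configuration selects which architecture is bound to the entity.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end configuration Comparator_cfg;
```

Without such a configuration, the tool falls back on its default binding, which is the behavior the notes above describe as messy.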
Packages
Packages contain information common to many design units
1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2. Package body
- Is not always needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.
A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
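The split between package declaration and package body can be sketched as follows. This is a minimal example under stated assumptions: the package name wddl_pkg, the record type dual_rail, the constant PRECHARGE_LEVEL and the function is_precharged are all hypothetical and are not taken from the project's source code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package wddl_pkg is
  -- Type declaration: one logical signal carried on a true and a false rail.
  type dual_rail is record
    t : std_logic;
    f : std_logic;
  end record;

  -- Deferred constant: declared here, given its value in the package body.
  constant PRECHARGE_LEVEL : std_logic;

  -- Function declaration; the function body lives in the package body.
  function is_precharged (x : dual_rail) return boolean;
end package wddl_pkg;

package body wddl_pkg is
  constant PRECHARGE_LEVEL : std_logic := '0';

  function is_precharged (x : dual_rail) return boolean is
  begin
    return (x.t = PRECHARGE_LEVEL) and (x.f = PRECHARGE_LEVEL);
  end function is_precharged;
end package body wddl_pkg;
```

A design unit would make these names visible with `use work.wddl_pkg.all;`, assuming the package has been compiled into the default work library.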
Libraries
Collection of VHDL design units (database)
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware
It is possible to describe a circuit in a number of ways:
Structural -> Dataflow -> Behavioral (increasing level of abstraction)
Structural VHDL Description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the system
The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers => you must write the behavioral description of the registers
The behavioral description can be provided in the form of subprograms (functions or procedures)
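Since VHDL has no built-in register, a behavioral description such as the following is typically written by the designer. This is a minimal sketch of a rising-edge D flip-flop; the entity name dff and the port names clk, d and q are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (clk : in  std_logic;   -- clock
        d   : in  std_logic;   -- data input
        q   : out std_logic);  -- registered output
end entity dff;

architecture behavioral of dff is
begin
  -- q changes only on the rising edge of clk, so q holds its
  -- value between clock edges: this is what makes it a register.
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture behavioral;
```

Synthesis tools generally recognize this clocked-process pattern and map it to a flip-flop of the target library.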
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications
The actual timing results depend on the implementation technology and the efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals that wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
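The two legal forms of a process can be sketched side by side. The example below is a small self-checking simulation model, not synthesizable design code; the entity name process_demo, the signal names a, b and y, and the process labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity process_demo is
end entity process_demo;

architecture sim of process_demo is
  signal a, b : std_logic := '0';
  signal y    : std_logic;
begin
  -- Form 1: sensitivity list, no wait statement.
  -- The process wakes up whenever a or b changes.
  and_proc : process (a, b)
  begin
    y <= a and b;
  end process and_proc;

  -- Form 2: wait statements, no sensitivity list.
  -- The process suspends at each wait and resumes when it expires.
  stim_proc : process
  begin
    a <= '1';
    b <= '0';
    wait for 10 ns;
    b <= '1';
    wait for 10 ns;
    assert y = '1'
      report "y should be '1' when a and b are both '1'"
      severity error;
    wait;  -- suspend forever, ending the stimulus
  end process stim_proc;
end architecture sim;
```

Both processes run concurrently with each other, while the statements inside each one execute in sequence.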
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption quite easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.
Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional) such as running time or power consumption.
A newer distinction adds semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact with the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.
In contrast, passive attacks simply observe the card's behavior during its processing, without disturbing it.
Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.
The attacker can gain information through:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic analysis attacks
These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
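The measurement described above is an application of Ohm's law. The relation can be written out as follows; the symbols $R$, $V_R$, $I$, $V_{DD}$ and $P$, and the 100 mV reading in the worked line, are assumptions introduced for illustration and do not come from a measurement in this project.

```latex
% Assumed symbols: R = series resistance, V_R(t) = voltage across it,
% I(t) = supply current, V_{DD} = supply voltage, P(t) = instantaneous power.
\[
  I(t) = \frac{V_R(t)}{R},
  \qquad
  P(t) \approx V_{DD}\, I(t) \quad \text{for } V_R \ll V_{DD}
\]
% Hypothetical reading: R = 50 ohm and V_R = 100 mV.
\[
  I = \frac{0.1\,\mathrm{V}}{50\,\Omega} = 2\,\mathrm{mA}
\]
```

Because $R$ and $V_{DD}$ are fixed, the sampled voltage $V_R(t)$ is directly proportional to the instantaneous power, which is why a voltage trace can serve as a power trace.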
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher which can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure below, for example, shows an SPA trace from a smart card performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
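The exponentiator leak can be made concrete with a small simulation-only sketch of the left-to-right square-and-multiply method described above. The entity name sqm_demo and the values of base, modulus and exponent are hypothetical; the report statements stand in for the operation sequence that an attacker would read off a power trace.

```vhdl
entity sqm_demo is
end entity sqm_demo;

architecture sim of sqm_demo is
begin
  process
    -- Hypothetical small values, chosen only for illustration.
    constant base     : natural := 7;
    constant modulus  : natural := 33;
    constant exponent : bit_vector(3 downto 0) := "1011";  -- secret exponent
    variable acc      : natural := 1;
  begin
    -- Scan the exponent from the most significant bit downwards.
    for i in exponent'range loop
      acc := (acc * acc) mod modulus;     -- squaring in every iteration
      report "square";
      if exponent(i) = '1' then
        acc := (acc * base) mod modulus;  -- extra multiply only for a '1' bit
        report "multiply";
      end if;
    end loop;
    report "result = " & integer'image(acc);
    wait;  -- stop the process after one run
  end process;
end architecture sim;
```

For the exponent "1011" the reported sequence is square, multiply, square, square, multiply, square, multiply. A square followed by a multiply marks a "1" bit and a square alone marks a "0" bit, so the sequence of operations spells out the exponent.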
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. Let the selection function $D(C_i, b, K_s)$ denote the value of an intermediate target bit $b$ (the intermediate value V) computed from ciphertext $C_i$ under the guess $K_s$. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which $D$ is one and the average of the traces for which $D$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.
Thus, if $K_s$ is incorrect, $\Delta_D[j]$ tends to zero as $m$ grows, because trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. The contributions of other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
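The asymmetry described above can be summarized with the standard first-order model of a static CMOS gate. The symbols $C_L$ (load capacitance at the output) and $V_{DD}$ (supply voltage) are assumptions introduced here, and short-circuit and leakage currents are neglected.

```latex
% Requires amsmath (cases, \text).
% Energy drawn from the power supply per output event, first-order model.
\[
  E_{\text{supply}} \approx
  \begin{cases}
    C_L V_{DD}^{2} & \text{output makes a } 0 \to 1 \text{ transition} \\
    0              & \text{output makes a } 1 \to 0,\ 0 \to 0 \text{ or } 1 \to 1 \text{ transition}
  \end{cases}
\]
```

An observer of the supply current can therefore tell a 0 to 1 event apart from the other three cases, which is exactly the data dependence that a constant-power logic style has to remove.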
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and represents its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independent of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it still draws the current, which makes this logic style unacceptable for embedded battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and represents its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. It is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
A standard CMOS inverter satisfies neither condition: all four of its transitions can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge,
fulfills the first condition. A differential logic family uses both the true and the false
representation of the input and output signals, and a dynamic logic family alternates
pre-charge and evaluation phases. Since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, which yields a
differential output signal. The discharged output node is charged back to 1 in the following
pre-charge phase, so that both outputs are again at 1. In other words, every signal
transition, including the events in which the input signals remain constant, is represented
by an actual switching event in which the logic gate charges a capacitance. The logic
families that have been introduced to thwart differential power analysis (DPA) using dynamic
differential logic include:
1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style, however, requires full-custom characterization and layout. It also
suffers from a high clock load, which is common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to
form secure compound standard cells, which have a reduced power signature. WDDL has many
advantages. It can be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly from the vendor.
WDDL also results in a dynamic differential logic with only a small load capacitance on the
pre-charge control signal, and with the low power consumption and the high noise margins of
static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two
positive complementary gates, one calculating the true output using the true inputs, the
other the false output using the false inputs. A positive gate produces a zero output for an
all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary
gate, sometimes also referred to as a dual gate, expresses the false output of the original
logic gate using the false inputs of the original gate. The AND gate fed with true input
signals and the OR gate fed with false input signals are two dual gates. The figure shows the
WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is
differential and the WDDL gate calculates its differential output. In the pre-charge phase,
the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in
WDDL pre-charges without distributing the pre-charge signal to each individual gate. During
the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each
individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass
this 0 value to the next gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the
pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every
combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of
the registers. These operators produce an all-zero output in the pre-charge phase (clk signal
high), but let the differential signal through during the evaluation phase (clk signal low).
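The behavior described above can be sketched as a small bit-level model. This is an illustrative Python model rather than the VHDL of the project: each signal is assumed to be a `(true_rail, false_rail)` pair, and the names `wddl_and`, `wddl_or`, `precharge` and `rails_rising` are hypothetical.

```python
def wddl_and(a, b):
    """WDDL AND: AND on the true rails in parallel with OR on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails in parallel with AND on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


def precharge(bit, clk):
    """Pre-charge operator: all-zero pair while clk is high, differential pair otherwise."""
    if clk == 1:
        return (0, 0)
    return (bit, 1 - bit)


def rails_rising(gate, a_bit, b_bit):
    """Count output rails that rise between the pre-charge and evaluation phases."""
    pre = gate(precharge(a_bit, 1), precharge(b_bit, 1))
    ev = gate(precharge(a_bit, 0), precharge(b_bit, 0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)
```

Because both component gates are positive, all-zero inputs give an all-zero output without any pre-charge signal reaching the gate itself, and in every evaluation phase exactly one output rail rises, whatever the data. That single rising rail per cycle is what the 100% switching factor refers to.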
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is the bottom-up approach. Lower-level modules are
designed and later integrated to form larger modules, whose further integration leads to the
final top module. Since logic gates form the lowest-level modules, the logic gates required
for the design are first implemented in WDDL style. WDDL demands a parallel combination of
two positive complementary gates, one calculating the true value and the other the false
value. Logic gates such as OR, AND and XOR have been implemented. In addition, a Full Adder,
a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates act as the pre-charge operator, injecting the signal '0' into the
gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore,
the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It
can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number
of gates involved in the latter is greater than in the former. Therefore, the first method
of implementation is followed rather than the second.
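One possible rail-level reading of this equation is sketched below in Python. It is an assumption about how the AND/OR network maps onto the two rails, not the gate netlist of the project: the complemented inputs are taken from the false rails, the false output is produced by the dual network, and the names `wddl_xor`, `encode` and `wddl_xor_word` are hypothetical.

```python
def encode(bit):
    """Differential encoding of a single bit as (true_rail, false_rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR built only from AND and OR operations on the differential rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    out_t = (a_t & b_f) | (a_f & b_t)  # AB' + A'B, with B' and A' read from the false rails
    out_f = (a_f | b_t) & (a_t | b_f)  # dual network, yields the complement of out_t
    return (out_t, out_f)


def wddl_xor_word(xs, ys):
    """Word-wide XOR obtained by applying the single-bit gate to each bit position."""
    return [wddl_xor(x, y) for x, y in zip(xs, ys)]
```

Since only AND and OR operations are used, an all-zero input pair still produces an all-zero output pair, so the pre-charge wave passes through this gate as well.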
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been designed by
instantiating the WDDL gates designed above. During the implementation of the Blowfish
algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can be easily
implemented by instantiating the corresponding lower-level module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result

WDDL AND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result

WDDL NAND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result

WDDL XOR GATE
Simulation Result
Synthesis Result

WDDL XOR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to protect ICs against side-channel attacks, especially Differential Power Analysis
(DPA), it is necessary to implement the design in a logic style that renders constant power
dissipation irrespective of the input combination. WDDL has proved advantageous over the
other styles and is therefore of great significance. In this dissertation work, an
architecture for the Blowfish algorithm is designed and implemented in WDDL style. A
bottom-up approach is used in this implementation: the low-level entities are designed first
and are later combined to form the entire module. The key scheduling is online. The sub-keys
generated for a particular key can be used for the encryption of all the data to be
encrypted with that key. The sub-keys are supplied in reverse order for the decryption data
path. Initially, logic gates are implemented in WDDL, and then higher-level modules are
designed by instantiating the WDDL gates to form the entire module, thus resulting in
constant power dissipation irrespective of the input data combination. The entire design
works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge
phase all the signals of the design are zeroed, and during the Evaluation phase the
functionality of the design is achieved. This sort of design has been found simple and very
effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plaintext block of
64 bits. The code for the implementation has been written in VHDL. Functional verification
has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL
Compiler. The back end of the design is done using SOC Encounter. The desired functionality
has been achieved according to the specifications. During the Evaluation phase, the output
shows the same number of transitions, thus resulting in constant power dissipation. During
synthesis, it has been observed that a simple WDDL gate comprises many conventional gates.
Therefore, the area of the design has grown nearly three-fold compared to the design
implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the DPA scheme to find the secret key that is being
used in the crypto-processor. Thus, security against DPA is incorporated into the IC at the
hardware level by implementing the design in WDDL style, which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical Papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks," October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey," IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm," Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)."
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's."
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks."
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards."
Reference Books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices," Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis," Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
ABSTRACT
Every electronic device needs security, from the smallest RFID tags to larger hand-held
devices. Security is needed for financial, medical, consumer, automotive and other
applications. Small embedded integrated circuits (ICs) such as smart cards are vulnerable to
so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that
derive information from an integrated circuit while it is in operation. The attacker can
gain information by monitoring the power consumption, execution time, electromagnetic
radiation and other information leaked by the switching behavior of digital complementary
metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend on the
values of the data and/or the key reveal what the circuit is doing. Simple timing or power
attacks give visual information on the circuit. This project presents a digital very large
scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.
The root cause of this problem is that standard CMOS is power efficient: it only consumes
dynamic power when nodes are switching.
The idea is to create digital circuit styles that have a switching behavior independent of
the data, or the sequence of the data, that they are processing. A logic style called "Wave
Dynamic Differential Logic (WDDL)" is used for the implementation of the basic logic gates
which are used in cryptographic processors. The design flow starts from a normal design in a
hardware description language such as VHDL and ends at the Side Channel Attack (SCA)
resistant layout.
Figure: WDDL pre-charge wave generation
CHAPTER 1 INTRODUCTION
AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called
side-channel attacks (SCAs). The attacker can gain information by monitoring the power
consumption, execution time, electromagnetic radiation and other information leaked by the
switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This
project presents a digital very large scale integrated (VLSI) design flow to create secure,
power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles that have a switching behavior independent of
the data, or the sequence of the data, that they are processing. A logic style called Wave
Dynamic Differential Logic (WDDL) is used for the implementation of the basic logic gates
which are used in cryptographic processors. The design flow starts from a normal design in a
hardware description language such as VHDL and ends at the Side Channel Attack (SCA)
resistant layout.
Depending on the parameter considered, side-channel attacks are classified as probing
attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic
analysis attacks, etc. One side-channel attack in particular, namely Differential Power
Analysis (DPA), is of great concern. It is very effective in finding the secret key and can
be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic
operations have power characteristics that depend on the input data. It relies on
statistical analysis to extract from the power consumption the information that is
correlated to the secret key. As the variations actually originate at the logic level,
implementing the encryption and decryption modules in a logic style for which a logic gate
has constant power consumption at all times, independently of signal transitions, removes
the foundation of DPA and is an effective means to halt DPA.
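The statistical step can be illustrated with a toy difference-of-means calculation. Everything in this Python sketch is an assumption made for illustration: the 4-bit substitution table, the Hamming-weight power model and the noise-free "traces" are hypothetical and are not taken from the project or from Blowfish. A real attack would repeat the partitioning for every key guess over many noisy measurements.

```python
# Illustrative 4-bit substitution table (an assumption, not from the report).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value):
    """Number of bits set to 1 in value."""
    return bin(value).count("1")


def single_rail_power(value):
    """Data-dependent model: power grows with the number of 1 bits."""
    return hamming_weight(value)


def dual_rail_power(value, width=4):
    """Constant-power model: every bit switches either its true or its false rail."""
    complement = ~value & ((1 << width) - 1)
    return hamming_weight(value) + hamming_weight(complement)


def difference_of_means(power_model, key):
    """Split the traces on one predicted bit and compare the average power."""
    ones, zeros = [], []
    for plaintext in range(16):
        intermediate = SBOX[plaintext ^ key]
        (ones if intermediate & 1 else zeros).append(power_model(intermediate))
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)
```

With the data-dependent model the two groups differ in average power, which is the signal DPA exploits; with the dual-rail model both groups have the same average, so the partition carries no information.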
1.2 Objective of the Project
The main objectives of this dissertation are:
Study of constant-power logic styles
Description of WDDL gates
Implementation of WDDL logic gates
Verification of the functionality of WDDL logic gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW
OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: Design Entry (Specification,
Architecture, RTL Coding and RTL Verification), Synthesis and post-synthesis verification,
Backend (Floor Planning, Place and Route, Layout), and Tape Out to the foundry to get the
end product. All modern digital designs start with a designer writing a hardware description
of the IC in a Hardware Description Language (HDL) such as Verilog or VHDL. A Verilog or
VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.),
the interconnection of the circuit blocks, and the functionality. Various CAD tools are
available to synthesize a circuit based on the HDL.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps
in an IC design (logic design, logic synthesis, place & route, stream out and
verifications), one can recognize two additional steps, namely 1) "cell substitution" and
2) "interconnect decomposition". These operations have been inserted in the back end of the
flow and do not interfere with the creative part of a design, indicated by the "logic
design" task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells that are designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTION
LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved large
circuits and multiple teams of engineers.
Funded by the US Department of Defense
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE standard 1076-1993
Related Standards
IEEE 1076 does not support simulation conditions such as unknown and high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library.
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes an entity
E.g.
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple
assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments
are allowed.
Buffer: An out port with read capability. May have at most one assignment (not
recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together
The concept of default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. They consist of a declaration part and an optional body part.
A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
Libraries
A collection of VHDL design units (database)
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in order of
increasing level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL Description
The circuit is described in terms of its components.
It ranges from a low-level description (e.g. a transistor-level description) to a
high-level description (e.g. a block diagram).
For large circuits, low-level descriptions quickly become impractical.
Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in the system.
The combinational logic is described at a relatively high level, while the placement and
operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of subprograms (functions or
procedures).
Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and algorithmic
descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).
If no actual delays are used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications.
The actual timing results depend on the implementation technology and the efficiency of
the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs Sequential
Processes
The basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals that wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
CHAPTER 4 SMART
CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart card is
a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing
computations. The main goal of a smart card is to allow the execution of cryptographic
operations involving some secret parameter (the key) while not revealing this parameter to
the outside world. Conversely, the goal of the attacker is to recover this secret parameter.
The processor is embedded in a chip and connected to the outside world through eight wires,
the role, use and position of which are standardized. In addition to the input/output wires,
the parts we are most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption pretty
easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must
also be provided from the outside world. As a consequence, the attacker can measure the
card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly
sensors that react when the shield is removed by destroying all sensitive data and
preventing the card from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device and
that is neither the plaintext to be encrypted nor the cipher text resulting from the
encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input and
produces cipher text output, and vice versa. Attacks were therefore based on either knowing
the cipher text (such as cipher text-only attacks), or knowing both (such as known plaintext
attacks), or on the ability to define what plaintext is to be encrypted and then seeing the
results of the encryption (known as chosen plaintext attacks). Today, it is known that
encryption devices have additional outputs, and often additional inputs, which are neither
the plaintext nor the cipher text.
Encryption devices produce timing information (information about the time that operations
take) that is easily measurable, radiation of various sorts, power consumption statistics
(that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause
predictable outcomes. Side channel attacks make use of some or all of this information,
along with other (known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted quickly
and can sometimes be implemented using readily available hardware costing from only a few
hundred dollars to thousands of dollars.
51 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes
1 Invasive vs Non-invasive
21
Invasive attacks require de-packaging the chip to get direct access to its components
A typical example of this is the connection of a wire on a data bus to see the data transfers
A non-invasive attack only exploits externally available information (the emission of
which is however often unintentional) such as running time power consumption
A new distinction called semi-invasive attacks These attacks have the specificity that
they require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer ( they do not require electrical contact to the metal surface)
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information through:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks, and
5. Electromagnetic analysis attacks
These attacks exploit information leaked by the switching behavior of digital
complementary metal–oxide–semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like its clock signal, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$400.
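The measurement described above is a direct application of Ohm's law. The following is a minimal sketch in Python, assuming a hypothetical list of sampled voltage drops across the series resistor; the function names and sample values are illustrative only and are not taken from the project.

```python
def shunt_current(v_drop, r_shunt=50.0):
    """Current through the series resistor in amperes (Ohm's law: I = V / R)."""
    return v_drop / r_shunt


def voltages_to_current_trace(v_samples, r_shunt=50.0):
    """Convert sampled voltage drops (volts) into a current trace (amperes)."""
    return [shunt_current(v, r_shunt) for v in v_samples]


# Hypothetical oscilloscope samples of the voltage across a 50 ohm resistor.
samples = [0.00, 0.25, 0.50]
current_trace = voltages_to_current_trace(samples)
print(current_trace)
```

Each element of the resulting list is one point of a power trace in the sense used in the following sections, up to the constant supply voltage factor.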
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack, and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is a technique that involves direct interpretation of power consumption
measurements collected during cryptographic operations, typically by inspecting a visual
representation of the power consumption of a unit while an encryption operation is being
performed. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
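The exponentiator case in the list above can be illustrated with a short sketch. It assumes, purely for illustration, that an attacker can tell squarings ("S") from multiplications ("M") in the trace; the operation log below stands in for that trace, and all names and numbers are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that logs every operation.

    The log stands in for what an SPA trace would reveal if squarings
    and multiplications had distinguishable power signatures.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # scan exponent bits, MSB first
        result = (result * result) % modulus
        ops.append("S")                    # one squaring per exponent bit
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                # extra multiplication only for 1 bits
    return result, ops


def recover_exponent(ops):
    """Rebuild the exponent from the observed S/M sequence."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")               # assume 0 until an M follows
        else:
            bits[-1] = "1"                 # an M marks the preceding bit as 1
    return int("".join(bits), 2)


secret_exponent = 0b101101                 # illustrative value only
value, observed = square_and_multiply(7, secret_exponent, 1009)
print("".join(observed), recover_exponent(observed))
```

The secret exponent is recovered from a single observation, which is why SPA on such an implementation needs no statistics.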
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases, it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces T1..m[1..k] containing k samples each. In addition, the attacker records the
ciphertexts C1..m. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess Ks is correct. The attacker
computes a k-sample differential trace ΔD[1..k] by finding the difference between the
average of the traces for which a certain intermediate value V, predicted for target bit b by a
selection function D(Ci, b, Ks), is one and the average of the
traces for which V is zero. Thus ΔD[j] is the average over C1..m of the effect due to the value
represented by the selection function D on the power consumption at point j. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If Ks is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts Ci. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus, for an incorrect Ks, the differential trace approaches zero, because trace components
uncorrelated to D diminish with 1/√m, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with Ks
incorrect may have a weak correlation to D with the correct Ks). If Ks is correct, however, the
computed value for D(Ci, b, Ks) will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of ΔD will be flat with spikes in regions
where D is correlated to the values being processed. The correct value of Ks can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight Ks yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
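The difference-of-means procedure described above can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions, not the DES attack itself: a 4-bit substitution table stands in for a DES S-box, the key guess is 4 bits instead of 6, the known data values are fed in directly rather than derived from ciphertexts, and the simulated traces are noise-free with a single leaking sample. All names are hypothetical.

```python
# Toy 4-bit substitution table, chosen only for illustration (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def selection_bit(data, key_guess):
    """Selection function D: most significant bit of the S-box output."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def simulate_trace(data, secret_key):
    """Three-sample toy trace; only sample 1 depends on the processed bit."""
    return [2.0, 2.0 + selection_bit(data, secret_key), 2.0]


def differential_trace(traces, inputs, key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample."""
    ones = [t for t, d in zip(traces, inputs) if selection_bit(d, key_guess) == 1]
    zeros = [t for t, d in zip(traces, inputs) if selection_bit(d, key_guess) == 0]
    return [sum(t[j] for t in ones) / len(ones)
            - sum(t[j] for t in zeros) / len(zeros)
            for j in range(len(traces[0]))]


def recover_key(traces, inputs):
    """Pick the guess whose differential trace shows the largest spike."""
    return max(range(16),
               key=lambda g: max(abs(x) for x in differential_trace(traces, inputs, g)))


SECRET_KEY = 0xA                      # illustrative secret, unknown to the attacker
inputs = list(range(16))              # known data values, one trace each
traces = [simulate_trace(d, SECRET_KEY) for d in inputs]

print(differential_trace(traces, inputs, SECRET_KEY))
print(hex(recover_key(traces, inputs)))
```

With the correct guess the differential trace is flat except for a spike at the leaking sample; wrong guesses give smaller spikes. Real traces are noisy, which is why many more traces have to be averaged in practice.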
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current flows from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short-circuit or leakage currents) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is therefore a logic style with constant power consumption.
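The data dependence described above can be made concrete with a small model. The sketch below assumes an idealized gate in which every 0 to 1 output transition draws C·Vdd² from the supply and every other transition draws nothing (short-circuit and leakage currents are ignored); the function name and the unit values are illustrative.

```python
def supply_energy(outputs, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply by a sequence of gate output values.

    Idealized model (assumption): only 0 -> 1 transitions charge the load
    capacitance, each drawing c_load * vdd**2 from the supply.
    """
    energy = 0.0
    for prev, cur in zip(outputs, outputs[1:]):
        if prev == 0 and cur == 1:
            energy += c_load * vdd ** 2
    return energy


# Two output sequences of the same length produced by different data.
print(supply_energy([0, 1, 0, 1, 0]))   # two charging events
print(supply_energy([0, 0, 0, 0, 0]))   # no charging events
```

Because the two sequences draw different amounts of energy, an observer of the supply current learns something about the data, which is exactly the leak a constant-power logic style is meant to close.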
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns the
current, which makes this logic style unacceptable for embedded
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the standard CMOS inverter shown above, all four output transitions can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge,
fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, producing a
differential output signal. The discharged output node is charged to 1 again in the following
pre-charge phase. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) using dynamic differential logic include the
following:
1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. A Sense Amplifier
Based Logic gate is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0.
A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
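The behavior described above can be checked with a small behavioral model. The sketch below is written in Python rather than the project's VHDL and makes several assumptions: each signal is a (true, false) pair of rails, the pre-charge operator is modeled by its function only (not by the gates used to build it), and all names are hypothetical.

```python
def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: clk high forces both rails to 0, clk low passes the signal."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """WDDL AND: AND on the true rails in parallel with OR on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails in parallel with AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def circuit(a, b, c, clk):
    """Two-level example (a AND b) OR c with pre-charge operators at the inputs only."""
    ins = [precharge(encode(x), clk) for x in (a, b, c)]
    return wddl_or(wddl_and(ins[0], ins[1]), ins[2])


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            evaluated = circuit(a, b, c, clk=0)    # evaluation phase
            cleared = circuit(a, b, c, clk=1)      # pre-charge phase
            print(a, b, c, evaluated, cleared)
```

In the evaluation phase exactly one rail of the output is 1 for every input combination, and in the pre-charge phase the all-zero inputs ripple through both gate levels without any gate receiving the clock, which is the 0-wave described in the text.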
Figure: WDDL pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND, and XOR have been
implemented, as well as a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting the
signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter approach is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
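One way to realize the equation above with positive gates only is sketched below as a behavioral Python model. The project's actual gate-level circuit is in a figure that is not reproduced here, so this construction is an assumption: the true rail computes AB' + A'B from the available rails, the false rail is its dual, and each signal is modeled as a hypothetical (true, false) pair.

```python
def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """WDDL-style XOR built from AND and OR gates only.

    true rail : (A AND B') OR (A' AND B), using the false rails as complements
    false rail: dual network, (A' OR B) AND (A OR B')
    """
    a_t, a_f = a
    b_t, b_f = b
    true_rail = (a_t & b_f) | (a_f & b_t)
    false_rail = (a_f | b_t) & (a_t | b_f)
    return (true_rail, false_rail)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, wddl_xor(encode(a), encode(b)))

# Pre-charge phase: all rails at 0 in, all rails at 0 out.
print(wddl_xor((0, 0), (0, 0)))
```

Because no inverter appears in either network, an all-zero input produces an all-zero output, so the pre-charge wave passes through this gate like through the AND and OR gates.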
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower-level module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
exhibits constant power dissipation irrespective of the input combination. WDDL has
proved advantageous compared with the other logic styles and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher-level modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase, all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SoC Encounter. The desired functionality has been
achieved according to the specifications. In the output during the evaluation phase, the
number of transitions is the same for every input, thus resulting in constant power dissipation. During
synthesis, it was observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the DPA scheme to find the
secret key that is being used in the crypto-processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
ABSTRACT
Every electronic device needs security, from the smallest RFID tags to larger
handheld devices. Security is needed for financial, medical, consumer, automotive,
and other applications. Small embedded integrated circuits (ICs) such as smart
cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a
class of attacks that derive information from an integrated circuit while it is in operation. The
attacker can gain information by monitoring the power consumption, execution time,
electromagnetic radiation, and other information leaked by the switching behavior of digital
complementary metal–oxide–semiconductor (CMOS) gates. For example, execution times that depend
on the values of the data and/or key reveal what the circuit is doing. Simple timing or power attacks give
visual information on the circuit. This project presents a digital very large scale integrated
(VLSI) design flow to create secure power-analysis-attack-resistant ICs. The root cause of
this problem is that standard CMOS is power efficient: it only consumes dynamic
power when nodes are switching.
The idea is to create digital circuit styles that have a switching behavior independent of
the data, or the sequence of the data, that they are processing. A logic style called "Wave Dynamic
Differential Logic (WDDL)" is used for the implementation of the basic logic gates that are
used in cryptographic processors. The design flow starts from a normal design in a
hardware description language such as VHDL and ends with a side-channel attack (SCA) resistant
layout.
Figure: WDDL pre-charge wave generation
CHAPTER 1 INTRODUCTION
AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to
so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power
consumption, execution time, electromagnetic radiation, and other information leaked by the
switching behavior of digital complementary metal–oxide–semiconductor (CMOS) gates. This
project presents a digital very large scale integrated (VLSI) design flow to create secure
power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles that have a switching behavior independent of
the data, or the sequence of the data, that they are processing. A logic style called Wave Dynamic
Differential Logic (WDDL) is used for the implementation of the basic logic gates that are
used in cryptographic processors. The design flow starts from a normal design in a
hardware description language such as VHDL and ends with a side-channel attack (SCA) resistant
layout.
Depending on the parameter considered, side-channel attacks are classified as
probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic
analysis attacks, etc. One side-channel attack in particular, namely Differential Power
Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be
mounted quickly with off-the-shelf devices. The attack is based on the fact that logic
operations have power characteristics that depend on the input data. It relies on statistical
analysis to extract from the power consumption the information that is correlated to the secret
key. As the variations actually originate at the logic level, implementing the encryption and
decryption modules in a logic style for which a logic gate has constant power
consumption at all times, independently of signal transitions, removes the foundation of DPA and is an
effective means to halt DPA.
1.2 Objective of the Project
The main objectives of this dissertation are:
Study of constant-power logic styles
Description of WDDL gates
Implementation of WDDL logic gates
Verification of the functionality of WDDL logic gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW
OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: design entry (specification,
architecture, RTL coding, and RTL verification); synthesis and post-synthesis
verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get
the end product. All modern digital designs start with a designer writing a hardware description
of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or
VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the
interconnection of the circuit blocks, and the functionality. Various CAD tools are available to
synthesize a circuit based on the HDL description.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Figure 2.1. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986, IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987, standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming nonstandard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types: signed and unsigned
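A minimal sketch of how the std_logic type of IEEE 1164 and the unsigned type of IEEE 1076.3 are typically used together is given below. The entity name std_types_demo and its ports are hypothetical and serve only as an illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- std_logic, std_logic_vector (IEEE 1164)
use ieee.numeric_std.all;     -- signed, unsigned (IEEE 1076.3)

-- Hypothetical example entity, not part of the project design
entity std_types_demo is
  port (
    a, b   : in  std_logic_vector(7 downto 0);
    enable : in  std_logic;
    sum    : out std_logic_vector(7 downto 0)
  );
end entity std_types_demo;

architecture rtl of std_types_demo is
begin
  -- Interpret both vectors as unsigned numbers and add them (modulo 256).
  -- When the output is disabled, drive 'Z' (high impedance), one of the
  -- nine std_logic values: 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-'.
  sum <= std_logic_vector(unsigned(a) + unsigned(b)) when enable = '1'
         else (others => 'Z');
end architecture rtl;
```

The explicit conversions make the numeric interpretation of the bit vector visible in the code, which is the purpose of the numeric types.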
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: A value of a port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed
Out: Assignments can be made to a port but data from a port cannot be read. Multiple
assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not
recommended)
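The port directions listed above can be illustrated with a small sketch. The entity port_modes_demo and all of its signal names are hypothetical; the internal signal q_int shows the usual way to work around the rule that an out port cannot be read.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity, not part of the project design
entity port_modes_demo is
  port (
    d_in  : in    std_logic;  -- may only be read inside the component
    drive : in    std_logic;  -- selects when the pad is driven
    q_out : out   std_logic;  -- may only be assigned, not read
    pad   : inout std_logic   -- may be both assigned and read
  );
end entity port_modes_demo;

architecture rtl of port_modes_demo is
  signal q_int : std_logic;   -- internal copy that can be read freely
begin
  q_int <= d_in and pad;      -- reading an inout port is allowed
  q_out <= q_int;             -- the out port is only written
  -- Drive the pad when requested, otherwise release it to high impedance
  pad   <= d_in when drive = '1' else 'Z';
end architecture rtl;
```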
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
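As a sketch of the points above, the Comparator entity can be given two architectures, with a configuration declaration selecting one of them. The architecture names dataflow and behavioral and the configuration name Comparator_cfg are assumptions made for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- First architecture: a single concurrent assignment
architecture dataflow of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end dataflow;

-- Second architecture: the same function written as a process
architecture behavioral of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end behavioral;

-- Configuration declaration: binds the entity to the behavioral
-- architecture instead of relying on the default binding
configuration Comparator_cfg of Comparator is
  for behavioral
  end for;
end Comparator_cfg;
```

Selecting Comparator_cfg as the top-level unit in a simulator would then use the behavioral architecture without changing the source of either architecture.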
Packages
Packages contain information common to many design units
1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2. Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. They consist of a declaration part and an optional body part.
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
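A short sketch of a package with a declaration part and a body is shown below. The package name demo_pkg and everything declared in it are hypothetical; the example includes a subtype, a deferred constant and a function so that both parts have something to hold.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: what other design units can see
package demo_pkg is
  constant DATA_WIDTH : natural := 8;
  subtype byte_t is std_logic_vector(DATA_WIDTH - 1 downto 0);
  constant ALL_ONES : byte_t;                     -- deferred constant
  function parity (v : byte_t) return std_logic;  -- body given below
end package demo_pkg;

-- Package body: values and subprogram bodies hidden from the users
package body demo_pkg is
  constant ALL_ONES : byte_t := (others => '1');

  -- XOR of all bits: '1' when the number of '1' bits is odd
  function parity (v : byte_t) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end function parity;
end package body demo_pkg;
```

A design unit would make these names visible with "use work.demo_pkg.all;", assuming the package is compiled into the work library.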
Libraries
Collection of VHDL design units (database)
1. Packages
– package declaration
– package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural -> Dataflow -> Behavioral (increasing level of abstraction)
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style, you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
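Since the language has no built-in register, a behavioral description such as the following sketch is commonly written. The entity name reg8, the 8-bit width and the synchronous active-high reset are assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register, described behaviorally
entity reg8 is
  port (
    clk : in  std_logic;
    rst : in  std_logic;                     -- synchronous, active high
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behavioral of reg8 is
begin
  -- The process resumes on every clk event and updates q only on the
  -- rising edge, which is what makes it a register
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= (others => '0');
      else
        q <= d;
      end if;
    end if;
  end process;
end architecture behavioral;
```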
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
Process statement is a concurrent statement
Statements inside process statements are sequential statements
Process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
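The two forms of the general format can be compared in a small sketch. The entity process_demo and its ports are hypothetical; both processes describe the same AND function, one with a sensitivity list and one with an explicit wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity, not part of the project design
entity process_demo is
  port (
    a, b   : in  std_logic;
    y_list : out std_logic;
    y_wait : out std_logic
  );
end entity process_demo;

architecture behavioral of process_demo is
begin
  -- Form 1: sensitivity list; the process resumes on any event on a or b
  with_list : process (a, b)
  begin
    y_list <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; the wait statement at the end suspends
  -- the process until a or b changes
  with_wait : process
  begin
    y_wait <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

The statements inside each process execute sequentially, while the two processes themselves run concurrently with each other.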
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. By contrast, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output wires,
the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the
attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today, it is known that
encryption devices have additional outputs and often additional inputs which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example of this is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A further category, semi-invasive attacks, has also been introduced. These attacks
require de-packaging of the chip to get access to the chip surface, but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
By contrast, passive attacks simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger threat to the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic analysis attacks
These attacks target the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is also provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
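The measurement described above amounts to Ohm's law followed by a multiplication with the supply voltage. In the worked example below, the 50 ohm resistor is the value mentioned in the text, while the 5 mV voltage drop and the 5 V supply are purely illustrative assumptions, and the small drop across the resistor is neglected in the power estimate.

```latex
\begin{align*}
I &= \frac{V_R}{R} = \frac{5\,\mathrm{mV}}{50\,\Omega} = 0.1\,\mathrm{mA}, \\
P &\approx V_{DD} \cdot I = 5\,\mathrm{V} \times 0.1\,\mathrm{mA} = 0.5\,\mathrm{mW}.
\end{align*}
```

Sampling $V_R$ repeatedly during an operation gives one such value per sample point, and the sequence of values forms the power trace used in the attacks below.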
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the
power consumption of a unit while an encryption operation is being performed. It
is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases, it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero. Here V is the target bit b predicted by a selection function
$D(C_i, b, K_s)$ from the ciphertext $C_i$ and the key guess $K_s$. Thus $\Delta_D[j]$ is the average
over $C_{1..m}$ of the effect due to the value
represented by the selection function D on the power consumption at point j.
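For reference, the difference of means that defines the differential trace can be written out as follows. This is a reconstruction in the standard form used for single-bit DPA rather than a formula quoted from this report; $T_i[j]$ denotes sample j of trace i, and $D(C_i, b, K_s) \in \{0, 1\}$ is the value of the selection function for ciphertext $C_i$.

```latex
\Delta_D[j] =
  \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}
       {\sum_{i=1}^{m} D(C_i, b, K_s)}
  -
  \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}
       {\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
```

The first term is the mean of the traces for which D is one, and the second term is the mean of the traces for which D is zero.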
If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus, trace components that are uncorrelated to D diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with $K_s$
incorrect may have a weak correlation to D with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of the target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions
where D is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
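The asymmetry between the four output transitions can be stated with the usual first-order CMOS model. In the sketch below, the symbols $C_L$ (output load capacitance) and $V_{DD}$ (supply voltage) are assumptions introduced for illustration, and short-circuit and leakage currents are neglected.

```latex
\begin{align*}
E_{0 \to 1} &= \int V_{DD}\, i_{DD}(t)\, dt = C_L V_{DD}^{2}, \\
E_{1 \to 0} &\approx E_{0 \to 0} \approx E_{1 \to 1} \approx 0.
\end{align*}
```

Only the 0 to 1 transition draws the charge $C_L V_{DD}$ from the supply, so an observer of the supply current can tell that transition apart from the other three.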
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus, static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
In the figure above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the
evaluation phase. The discharged output node is charged to 1 in the following pre-charge phase
to pre-charge both outputs to 1. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) by using dynamic differential logic include the
following techniques:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated below.
Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal, and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow
No special design rules are involved in the interconnection of WDDL gates
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an all-zero
output in the pre-charge phase (clk-signal high), but let the
differential signal through during the evaluation phase (clk-signal low).
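The structure described above can be sketched in VHDL as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the entity names (wddl_precharge, wddl_and, wddl_or) and the rail suffixes _t and _f are hypothetical, and the pre-charge operator is written as an AND with the inverted clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Pre-charge operator: turns a single-rail input into a differential
-- pair and forces both rails to '0' while clk is high (pre-charge phase)
entity wddl_precharge is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q_t : out std_logic;   -- true rail
    q_f : out std_logic    -- false rail
  );
end entity wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d       and (not clk);
  q_f <= (not d) and (not clk);
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND gate: an AND on the true rails in parallel with its dual,
-- an OR on the false rails; all-zero inputs give all-zero outputs
entity wddl_and is
  port (
    a_t, a_f : in  std_logic;
    b_t, b_f : in  std_logic;
    y_t, y_f : out std_logic
  );
end entity wddl_and;

architecture dataflow of wddl_and is
begin
  y_t <= a_t and b_t;
  y_f <= a_f or  b_f;
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL OR gate: an OR on the true rails in parallel with its dual,
-- an AND on the false rails
entity wddl_or is
  port (
    a_t, a_f : in  std_logic;
    b_t, b_f : in  std_logic;
    y_t, y_f : out std_logic
  );
end entity wddl_or;

architecture dataflow of wddl_or is
begin
  y_t <= a_t or  b_t;
  y_f <= a_f and b_f;
end architecture dataflow;
```

Because neither gate contains an inverter, an all-zero input during pre-charge propagates as an all-zero output, which is the 0-wave behavior described in the text.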
Figure: WDDL Pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a Full Adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting the
signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
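figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate; in
WDDL the inversion is normally obtained by interchanging the true and false outputs. The gate is
shown in the following figure.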
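Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate; in
WDDL the inversion is normally obtained by interchanging the true and false outputs. The gate is
shown in the following figure.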
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation $A \oplus B = AB' + A'B$. Therefore, the XOR gate is implemented
in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter approach is greater than in the former. Therefore, the first method of
implementation is followed rather than the second one.
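A possible dataflow description of the first method is sketched below. It is an illustration only: the entity name wddl_xor and the rail suffixes _t and _f are hypothetical, the pre-charge operator is assumed to sit in front of the inputs, and the false rail is written as the dual network of the true rail.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical WDDL XOR built only from AND and OR operations
entity wddl_xor is
  port (
    a_t, a_f : in  std_logic;   -- true and false rails of A
    b_t, b_f : in  std_logic;   -- true and false rails of B
    y_t, y_f : out std_logic
  );
end entity wddl_xor;

architecture dataflow of wddl_xor is
begin
  -- True rail: A xor B = A.B' + A'.B, where B' and A' are taken from
  -- the false rails instead of from inverters
  y_t <= (a_t and b_f) or (a_f and b_t);
  -- False rail: the dual network fed with the complementary rails; for
  -- valid differential inputs it equals not (A xor B)
  y_f <= (a_f or b_t) and (a_t or b_f);
end architecture dataflow;
```

With all four inputs at '0' during pre-charge, both outputs evaluate to '0', so the gate passes the 0-wave on like the AND and OR gates do.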
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32
times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result

WDDL NAND GATE
Synthesis Report
=====================================================================
                            Final Report
=====================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=====================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result

WDDL XOR GATE
Simulation Result
Synthesis Result

WDDL XOR GATE
Synthesis Report
=====================================================================
                            Final Report
=====================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=====================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially Differential Power
Analysis (DPA), the design must be implemented in a logic style whose power dissipation
is constant irrespective of the input combination. WDDL has proved advantageous over
other such logic styles and is therefore of great significance. In this dissertation work,
an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A
bottom-up approach is used: the low-level entities are designed first and are later
combined to form the entire module. The key scheduling is online. The sub-keys generated
for a particular key can be used to encrypt all the data to be encrypted with that key.
The sub-keys are supplied in reverse order for the decryption data path. Initially the
logic gates are implemented in WDDL, and the higher-level modules are then designed by
instantiating the WDDL gates, resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase
and the Evaluation phase. In the Precharge phase all the signals of the design are driven
to zero, and during the Evaluation phase the functionality of the design is achieved. This
sort of design has been found simple and very effective in thwarting the side-channel
attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plaintext block of
64 bits. The code for the implementation has been written in VHDL. Functional verification
has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL
Compiler. The back end of the design is done using SOC Encounter. The desired
functionality has been achieved according to the specifications. At the output, the same
number of transitions occurs in every Evaluation phase, resulting in constant power
dissipation. During synthesis it was observed that a simple WDDL gate comprises many
conventional gates. The area of the design has therefore grown nearly three-fold compared
to the same design implemented in conventional CMOS logic; this is the cost of the
security incorporated into the IC against Differential Power Analysis (DPA). Because the
power dissipation at the output is constant, an attacker cannot apply DPA to find the
secret key used in the crypto-processor. Thus security against DPA is incorporated into
the IC at the hardware level by implementing the design in WDDL style, which is quite
simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical Papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali and James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer and Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".
Reference Books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
ABSTRACT
Every electronic device needs security, from the smallest RFID tags to larger
hand-held devices. Security is needed for financial, medical, consumer, automotive
and other applications. Small embedded integrated circuits (ICs) such as smart
cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a
class of attacks that derive information from an integrated circuit while it is in operation. The
attacker can gain information by monitoring the power consumption, execution time,
electromagnetic radiation and other information leaked by the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend
on the values of the data and/or key reveal what the circuit is doing. Simple timing or power
attacks give visual information on the circuit. This project presents a digital very large scale
integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root
cause of this problem is that standard CMOS is power efficient and consumes dynamic
power only when nodes are switching.
The idea is to create digital circuit styles whose switching behavior is independent of
the data, or the sequence of data, that they are processing. A logic style called "Wave Dynamic
Differential Logic (WDDL)" is used for the implementation of the basic logic gates that are
used in cryptographic processors. The design flow starts from a normal design in a
hardware description language such as VHDL and ends with a Side Channel Attack (SCA) resistant
layout.
Figure WDDL Pre-charge wave generation
CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to
so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power
consumption, execution time, electromagnetic radiation and other information leaked by the
switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This
project presents a digital very large scale integrated (VLSI) design flow to create secure,
power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles whose switching behavior is independent of
the data, or the sequence of data, that they are processing. A logic style called Wave Dynamic
Differential Logic (WDDL) is used for the implementation of the basic logic gates that are
used in cryptographic processors. The design flow starts from a normal design in a
hardware description language such as VHDL and ends with a Side Channel Attack (SCA) resistant
layout.
Depending on the parameter considered, side-channel attacks are classified as
probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic
analysis attacks, etc. One side-channel attack in particular, Differential Power
Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be
mounted quickly with off-the-shelf devices. The attack is based on the fact that logic
operations have power characteristics that depend on the input data. It relies on statistical
analysis to extract from the power consumption the information that is correlated to the secret
key. As the variations actually originate at the logic level, implementing the encryption and
decryption modules in a logic style in which every logic gate has constant power
consumption at all times, independent of signal transitions, removes the foundation of DPA
and is an effective means to halt it.
1.2 Objective of the Project
The main objectives of this dissertation are:
Study of constant-power logic styles
Description of WDDL Gates
Implementation of WDDL Logic Gates
Verification of the functionality of WDDL Logic Gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: design entry (specification,
architecture, RTL coding and RTL verification); synthesis and post-synthesis
verification; back end (floor planning, place and route, layout); and tape-out to the foundry
to get the end product. All modern digital designs start with a designer writing a hardware
description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A
Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters,
etc.), the interconnection of the circuit blocks and the functionality. Various CAD tools are
available to synthesize a circuit based on the HDL.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Figure 2.1. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987, standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming nonstandard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types, signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputsoutputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
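E.g.
    library ieee;
    use ieee.std_logic_1164.all;  -- provides std_logic and std_logic_vector
    entity Comparator is
      port (A, B : in  std_logic_vector(7 downto 0);  -- two 8-bit operands
            EQ   : out std_logic);                    -- comparison result
    end Comparator;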
Ports
Provide channels of communication between the component and its environment
Each port must have a name direction and a type
An entity may have NO port declaration
Port directions
In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed
Out: Assignments can be made to the port, but data cannot be read from it. Multiple
assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (not
recommended)
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
    architecture Comparator1 of Comparator is
    begin
      EQ <= '1' when (A = B) else '0';  -- concurrent conditional signal assignment
    end Comparator1;
Configurations
Link an entity declaration and an architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. both a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1 Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2 Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data that can be shared globally among several design
units. They consist of a declaration part and an optional body part
Package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
Package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually directory in UNIX file system
Can be also any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways, listed here from lower to higher
level of abstraction
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) name the signals that wake the process up
General Format
    process [(sensitivity_list)]
      process_declarative_part
    begin
      process_statements
      [wait_statement]
    end process;
CHAPTER 4 SMART CARD OVERVIEW
This section briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) without revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output
wires, the parts we are most interested in are the following
1 Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure
2 Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the
attacker can measure the card's running time with very good precision
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes
1 Invasive vs non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional) such as running time and power consumption.
A newer distinction is that of semi-invasive attacks. These attacks have the specificity that
they require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2 Active vs passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are truly orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1 Probing attacks
2 Fault induction attacks
3 Timing attacks
4 Power analysis attacks and
5 Electromagnetic analysis attacks
These attacks exploit the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is also provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
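The measurement described above reduces to Ohm's law. As a minimal sketch in Python, the snippet below converts a voltage reading across the series resistor into supply current and device power. The function names are hypothetical, and the 5 V supply and 10 mV reading in the usage lines are assumptions chosen for illustration, not values from this report.

```python
def supply_current(v_shunt, r_shunt=50.0):
    """Current through the series resistor, I = V / R (amperes)."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_shunt / r_shunt


def device_power(v_supply, v_shunt, r_shunt=50.0):
    """Instantaneous power drawn by the device behind the resistor (watts)."""
    current = supply_current(v_shunt, r_shunt)
    # The device sees the supply voltage minus the drop across the resistor.
    return (v_supply - v_shunt) * current


# Hypothetical reading: 10 mV across the 50 ohm resistor on a 5 V supply.
current_a = supply_current(0.010)
power_w = device_power(5.0, 0.010)
```

Sampling this voltage repeatedly during a cryptographic operation gives the sequence of power values that the following sections call a trace.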
Power analysis attacks are of two types
1 Simple Power Analysis attack and
2 Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the
power consumption of a unit while an encryption operation is being performed. It is a
technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
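The exponentiator case can be made concrete. The Python sketch below is illustrative only: the function names are hypothetical, and the list of "S" and "M" labels stands in for what an attacker would read off a power trace if squarings and multiplications were distinguishable. It implements left-to-right square-and-multiply and shows that the operation sequence alone determines the exponent.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs each operation."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for a 1 bit
            ops.append("M")
    return result, ops


def recover_exponent(ops):
    """Rebuild the exponent from the observed sequence of S and M operations."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")   # every squaring starts a new exponent bit
        else:
            bits[-1] = "1"     # a multiply marks the current bit as 1
    return int("".join(bits), 2)


# Hypothetical secret exponent 45 (binary 101101).
result, ops = square_and_multiply(7, 45, 1009)
leaked = recover_exponent(ops)
```

Here `leaked` equals the secret exponent even though the attacker never saw any data value, only the order of operations.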
522Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence there are
effects correlated to data values being manipulated These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm
To implement the DPA attack an attacker first observes m encryption operations and captures
power traces T1 m [1 k] containing k samples each In addition the attacker records the
cipher text C1 m No knowledge of the plain text is required DPA analysis uses power
consumption measurements to determine whether a key block guess Ks is correct The attacker
computes a k-sample differential trace centD [1 k] by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero Thus cent D[j) is the average over C1m of the effect due to the value
represented by the selection function D on the power consumption at point j In particular25
If Ks is incorrect the bit computed using D will differ from the actual target bit for about half
of the ciphertext Ci The selection function is thus effectively uncorrelated to what was actually
computed by the target device If a random function is used to divide a set into two subsets the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity
Thus, if $K_s$ is incorrect, $\Delta_D[j]$ tends to zero, because trace components uncorrelated to D
diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not
be completely flat, as D with $K_s$ incorrect may have a weak correlation to D with the correct
$K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual
value of target bit b with probability 1. The selection function is thus correlated to the value
of the bit considered. Other data values, measurement errors, etc. that are not correlated to D
approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$
will be flat, with spikes in regions where D is correlated to the values being processed. The
correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four
values of b correspond to each S box, providing confirmation of key block guesses. Finding all
eight $K_s$ yields the entire 48-bit round sub key. The remaining 8 key bits can be found easily
using exhaustive search or by analyzing one additional round. Triple DES keys can be found by
analyzing an outer DES operation first, using the resulting key to decrypt the cipher text, and
attacking the next DES key. DPA can use known plaintext or known cipher text and can find
encryption or decryption keys.
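The difference-of-means procedure described above can be sketched on a toy target. The example is
a simulation under stated assumptions, not the DES attack itself: the 4-bit PRESENT S-box stands
in for a DES S box, the "device" leaks one predicted bit at a single sample on top of Gaussian
noise, and the names TARGET_BIT, LEAK_SAMPLE, capture_traces and recover_key are hypothetical.

```python
import random

# PRESENT cipher 4-bit S-box, used here only as a small nonlinear stand-in
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
TARGET_BIT = 1      # bit b predicted by the selection function
LEAK_SAMPLE = 12    # sample index at which the toy device leaks (assumption)


def selection(c, b, ks):
    """D(C_i, b, K_s): predicted value of bit b for cipher text c and guess ks."""
    return (SBOX[c ^ ks] >> b) & 1


def capture_traces(secret_key, m, k, rng):
    """Simulate m traces of k samples: Gaussian noise plus a data-dependent bump."""
    ciphertexts, traces = [], []
    for _ in range(m):
        c = rng.randrange(16)
        trace = [rng.gauss(0.0, 1.0) for _ in range(k)]
        trace[LEAK_SAMPLE] += selection(c, TARGET_BIT, secret_key)
        ciphertexts.append(c)
        traces.append(trace)
    return ciphertexts, traces


def differential_trace(ciphertexts, traces, ks):
    """Delta_D[j]: mean of traces with D = 1 minus mean of traces with D = 0."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for c, trace in zip(ciphertexts, traces):
        d = selection(c, TARGET_BIT, ks)
        counts[d] += 1
        for j, sample in enumerate(trace):
            sums[d][j] += sample
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(ciphertexts, traces):
    """Return the guess whose differential trace has the tallest spike."""
    peaks = {}
    for ks in range(16):
        delta = differential_trace(ciphertexts, traces, ks)
        peaks[ks] = max(abs(x) for x in delta)
    return max(peaks, key=peaks.get), peaks


rng = random.Random(2010)
cts, trs = capture_traces(secret_key=0xB, m=2000, k=20, rng=rng)
best_guess, peak_heights = recover_key(cts, trs)
```

With these parameters the differential trace of the correct guess shows a spike of roughly the
leak amplitude at LEAK_SAMPLE, while wrong guesses stay markedly lower; a real attack would need
measured traces and the full DES selection function.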
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
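The data dependence described above can be made concrete with a first-order model. The sketch
below charges C * Vdd^2 to the supply for a 0 to 1 output transition and nothing otherwise; the
load capacitance and supply voltage are placeholder values chosen for illustration, and
short-circuit and leakage currents are deliberately ignored.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, vdd=1.8):
    """Energy in joules drawn from the supply for one output transition.

    Only a 0 -> 1 transition charges the load from the supply (C * Vdd^2);
    short-circuit and leakage currents are ignored in this model.
    """
    if prev_out == 0 and next_out == 1:
        return c_load * vdd ** 2
    return 0.0


def energy_trace(outputs, **kwargs):
    """Per-cycle supply energy for a sequence of output values."""
    return [supply_energy(a, b, **kwargs) for a, b in zip(outputs, outputs[1:])]
```

For the output sequence 0, 1, 1, 0, 0, 1 only the first and last steps draw energy, so the
supply current reveals where the output rose.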
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data it burns
current, which makes this logic style unacceptable for embedded
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the CMOS inverter above, all four output transitions (0 to 0, 0 to 1, 1 to 0 and 1 to 1)
can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, producing a
differential output signal. The discharged output node is charged back to 1 in the following
pre-charge phase, so that both outputs are again at 1. In other words, every signal transition,
including the events in which the input signals remain constant, is represented by an actual
switching event in which the logic gate charges a capacitance. Two logic families that use
dynamic differential logic to thwart differential power analysis (DPA) are:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
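The claim that every cycle contains exactly one charging event can be checked with a small
behavioural model of a dual-rail output that is pre-charged to 1. This is an abstraction of the
principle only, not of any particular circuit; the function names are made up for this example.

```python
def ddl_cycle(value):
    """One pre-charge/evaluate cycle of a dual-rail, pre-charge-to-1 output.

    Returns the (true_rail, false_rail) states after pre-charge and after
    evaluation, plus the number of nodes recharged in the next pre-charge.
    """
    precharged = (1, 1)
    evaluated = (1, 0) if value else (0, 1)   # exactly one rail discharges
    recharged = sum(p - e for p, e in zip(precharged, evaluated))
    return precharged, evaluated, recharged


def charging_events(bits):
    """Number of nodes charged per cycle for a data sequence."""
    return [ddl_cycle(b)[2] for b in bits]


def static_charging_events(bits):
    """Charging events of a single-ended static output, for comparison."""
    return [int(a == 0 and b == 1) for a, b in zip(bits, bits[1:])]
```

For any input sequence the dual-rail model charges one node per cycle, whereas the single-ended
output charges only on 0 to 1 transitions. Constant capacitance, the second condition, is not
captured by this model.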
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
- A major advantage of the proposed logic style is that it can be incorporated in the common
Electronic Design Automation (EDA) tool flow.
- No special design rules are involved in the interconnection of WDDL gates.
- The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name Wave Dynamic Differential Logic.
There are several ways to launch the pre-charge wave. In the figure, a pre-charge
operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk-signal high) but let the
differential signal through during the evaluation phase (clk-signal low).
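The gate construction and the pre-charge operator described above can be modelled at the logic
level. The sketch represents each signal as a (true, false) pair and counts how many output rails
rise between pre-charge and evaluation; it is a behavioural model written for this explanation,
not the project's VHDL, and the function names are assumptions.

```python
def encode(bit):
    """Differential encoding of a single-ended bit as (true, false)."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    t, f = signal
    return (0, 0) if clk else (t, f)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, OR (the dual gate) on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, AND (the dual gate) on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at | bt, af & bf)


def rising_rails(gate, a_bit, b_bit):
    """Count output rails that rise when going from pre-charge to evaluation."""
    a, b = encode(a_bit), encode(b_bit)
    pre = gate(precharge(a, clk=1), precharge(b, clk=1))
    ev = gate(precharge(a, clk=0), precharge(b, clk=0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)
```

For all four input combinations exactly one output rail rises per evaluation phase, which is the
100% switching factor stated above. Because both gates are positive, the all-zero pre-charge
inputs propagate to an all-zero output without a separate pre-charge signal.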
Figure WDDL Pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is a bottom-up approach. Lower
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules, the logic
gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. The logic gates OR, AND and XOR have been
implemented, along with a Full Adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore
the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can
also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of
gates involved in the latter is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
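One way to read the XOR equation above in dual-rail form is sketched here. The rail-level
expressions are an assumption, since the exact netlist of the project's figure is not reproduced
in the text: the true rail computes AB' + A'B from the available rails, the false rail computes
AB + A'B', and both use only AND and OR so that an all-zero input still yields an all-zero output.

```python
def encode(bit):
    """Differential encoding of a single-ended bit as (true, false)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """Dual-rail XOR built from AND and OR gates only.

    The complement of a signal is taken from its false rail, so no
    inverter is needed.
    """
    (at, af), (bt, bf) = a, b
    true_rail = (at & bf) | (af & bt)     # A.B' + A'.B
    false_rail = (at & bt) | (af & bf)    # A.B + A'.B'
    return (true_rail, false_rail)


def wddl_xor_word(a_bits, b_bits):
    """Hypothetical helper: apply the gate bit by bit to two equal-length words."""
    return [wddl_xor(a, b) for a, b in zip(a_bits, b_bits)]
```

Applying the gate once per bit position gives a wider XOR, and feeding (0, 0) on every input
returns (0, 0), so the gate passes the pre-charge wave on.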
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been
designed by instantiating the WDDL gates designed above. The implementation of
the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit Adder circuit. They can
be easily implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
yields constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the alternatives and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plain text of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package and synthesis using RTL Compiler. The
backend of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. At the output, the same number of transitions
occurs in every Evaluation phase, resulting in constant power dissipation. During
synthesis it was observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated
into the IC against Differential Power Analysis (DPA). Owing to the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto-processor. Thus security against DPA is
incorporated into the IC at hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow
for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A
Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic
Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1,
June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length
Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist
Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with
Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson
Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
ABSTRACT
Every electronic device needs security, from the smallest RFID tags to larger
hand-held devices. Security is needed for financial, medical, consumer, automotive
and other applications. Small embedded integrated circuits (ICs) such as smart
cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a
class of attacks that derive information from an integrated circuit while it is in operation. The
attacker can gain information by monitoring the power consumption, execution time,
electromagnetic radiation and other information leaked by the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend
on the values of the data and/or the key reveal what the circuit is doing. Simple timing or power
attacks give visual information on the circuit. This project presents a digital very large scale
integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root
cause of this problem is that standard CMOS is power efficient and consumes dynamic
power only when nodes are switching.
The idea is to create digital circuit styles that have a switching behavior independent of
the data or sequence of the data that they are processing. A logic style called "Wave Dynamic
Differential Logic (WDDL)" is used for the implementation of the basic logic gates which are
used in cryptographic processors. The design flow goes from a normal design in a
hardware description language such as VHDL to the Side Channel Attack (SCA) resistant
layout.
Figure WDDL Pre-charge wave generation
CHAPTER 1 INTRODUCTION
AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to
so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power
consumption, execution time, electromagnetic radiation and other information leaked by the
switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This
project presents a digital very large scale integrated (VLSI) design flow to create secure,
power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles that have a switching behavior independent of
the data or sequence of the data that they are processing. A logic style called Wave Dynamic
Differential Logic (WDDL) is used for the implementation of the basic logic gates which are
used in cryptographic processors. The design flow goes from a normal design in a
hardware description language such as VHDL to the Side Channel Attack (SCA) resistant
layout.
Depending on the parameter considered, side-channel attacks are classified as
probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic
analysis attacks, etc. One side-channel attack in particular, namely Differential Power
Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be
mounted quickly with off-the-shelf devices. The attack is based on the fact that logic
operations have power characteristics that depend on the input data. It relies on statistical
analysis to extract, from the power consumption, the information that is correlated to the secret
key. As the variations actually originate at the logic level, implementing the encryption and
decryption modules in a logic style for which a logic gate has at all times constant power
consumption, independently of signal transitions, removes the foundation of DPA and is an
effective means to halt DPA.
1.2 Objective of the Project
The main objectives of this dissertation are:
- Study of constant-power logic styles
- Description of WDDL Gates
- Implementation of WDDL Logic Gates
- Verification of the functionality of WDDL Logic Gates
- Synthesis of the design
- Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW
OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: Design Entry (Specification,
Architecture, RTL Coding and RTL Verification); Synthesis and post-synthesis
verification; Backend (Floor Planning, Place and Route, Layout); and Tape Out to the foundry to
get the end product. All modern digital designs start with a designer writing a hardware
description of the IC in a Hardware Description Language (HDL) such as Verilog or VHDL. A Verilog
or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the
interconnect of the circuit blocks and the functionality. Various CAD tools are available to
synthesize a circuit based on the HDL.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Figure 2.1. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
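The cell substitution step can be pictured on a toy netlist. The sketch below is only an
illustration of the idea: the netlist format, the cell names in WDDL_CELLS and the _t/_f suffixes
for the two rails are all hypothetical, and a real flow would operate on the synthesized netlist
and the vendor library files.

```python
# Hypothetical cell names; a real flow would take these from the library files.
WDDL_CELLS = {"AND2": "WDDL_AND2", "OR2": "WDDL_OR2"}


def rails(net):
    """Differential pair (true rail, false rail) for a single-ended net name."""
    return (net + "_t", net + "_f")


def substitute_cells(netlist):
    """Replace every single-ended cell by its constant-power compound cell.

    netlist is a list of (cell, output_net, input_nets) tuples.
    """
    secure = []
    for cell, out, ins in netlist:
        if cell not in WDDL_CELLS:
            raise ValueError("no WDDL cell registered for " + cell)
        secure.append((WDDL_CELLS[cell], rails(out), [rails(n) for n in ins]))
    return secure
```

Each net becomes a pair of nets, which is why the step is followed by interconnect decomposition
in the flow.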
CHAPTER 3 HARDWARE DESCRIPTIVE
LANGUAGE (VHDL)
Why (V) HDL
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types, signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputsoutputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read. Multiple
assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (buffer ports are
not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
- Behavior
- Structure
- Dataflow
Architectures can describe design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
- Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1 Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2 Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. These consist of a declaration part and an optional body part.
Package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
Package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually directory in UNIX file system
Can be also any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural
Dataflow
Behavioral
(listed in order of increasing level of abstraction)
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower level description
- or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on implementation technology and efficiency of
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up to interconnected processes
Quite similar to UNIX process
Process keyword in VHDL
Process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
Process [(sensitivity list)]
process_declarative_part
begin
process_statements
[wait_statement]
End process
CHAPTER 4 SMART
CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output
wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption fairly
easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces cipher text output, and vice versa. Attacks were therefore based on either
knowing the cipher text (such as cipher-text-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the cipher text.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable radiation of various sorts power consumption
statistics (that can be easily measured as well) and more Often the encryption device also has
additional ldquounintentionalrdquo inputs such as voltage that can be modified to cause predictable
outcomes Side channel attacks make use of some or all of this information along with other
(known) cryptanalytic techniques to recover the key the device is using
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes.
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A more recent distinction adds semi-invasive attacks. These attacks
require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
In contrast, passive attacks simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic analysis attacks
These attacks exploit the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g., 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
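The measurement described above reduces to Ohm's law. The following Python sketch is illustrative only: the function names, the 5 V supply voltage, and the sample values are assumptions made for the example, not figures from an actual measurement setup.

```python
def shunt_current(v_drop, r_shunt=50.0):
    """Current through the series resistor (amperes), by Ohm's law."""
    return v_drop / r_shunt


def power_trace(v_drops, v_supply=5.0, r_shunt=50.0):
    """Power drawn by the card for each sampled voltage drop (watts).

    The card sees the supply voltage minus the drop across the resistor.
    The 5 V supply is an assumed value.
    """
    return [(v_supply - v) * shunt_current(v, r_shunt) for v in v_drops]


# Hypothetical samples of the voltage across a 50 ohm resistor, in volts.
samples = [0.50, 0.75, 0.25]
currents = [shunt_current(v) for v in samples]
powers = power_trace(samples)
```

Each list element corresponds to one sampling instant, so a digitizer capturing the voltage drop directly yields a current (and hence power) trace.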
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed. It
is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure below, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
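The exponentiator case can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the string of "S" and "M" symbols stands in for what an attacker would read off an SPA trace if squarings and multiplications were distinguishable, and the function names are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs its operations."""
    result = 1
    ops = []  # "S" = squaring, "M" = multiplication
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":
            # The extra multiplication happens only for a 1 bit.
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops):
    """Rebuild the exponent from the operation sequence alone."""
    bits = []
    i = 0
    while i < len(ops):
        # Every iteration starts with "S"; a following "M" marks a 1 bit.
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, observed_ops = square_and_multiply(7, 0b1011, 1000003)
leaked_exponent = recover_exponent(observed_ops)
```

The attacker never sees the exponent itself, only the order of operations, yet that order determines every exponent bit.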
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces T_1..m[1..k] containing k samples each. In addition, the attacker records the
ciphertexts C_1..m. No knowledge of the plaintext is required. DPA uses power
consumption measurements to determine whether a key block guess K_s is correct. A selection
function D(C_i, b, K_s) computes, from ciphertext C_i and guess K_s, the predicted value V of a
target intermediate bit b. The attacker
computes a k-sample differential trace Δ_D[1..k] by finding the difference between the
average of the traces for which V is one and the average of the
traces for which V is zero. Thus Δ_D[j] is the average over C_1..m of the effect due to the value
represented by the selection function D on the power consumption at point j. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus trace components uncorrelated to D diminish with 1/√m, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with K_s
incorrect may have a weak correlation to D with the correct K_s). If K_s is correct, however, the
computed value for D(C_i, b, K_s) will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of Δ_D will be flat with spikes in regions
where D is correlated to the values being processed. The correct value of K_s can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight K_s yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
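The difference-of-means procedure can be tried on simulated data. The Python sketch below is a toy model, not an attack on DES: the 4-bit key block, the selection function (a simple nonlinear bit standing in for an S-box output bit), the noise level, and the single leaking sample are all assumptions chosen to keep the example short.

```python
import random

SECRET_KEY = 0b1011   # hypothetical 4-bit key block hidden in the simulated device
TRACE_LEN = 20        # k samples per trace
NUM_TRACES = 1600     # m observed operations
LEAK_SAMPLE = 7       # assumed sample index at which the target bit leaks


def selection(c, key_guess):
    """Toy selection function D: one nonlinear bit of (c XOR key_guess)."""
    x = c ^ key_guess
    return ((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))


def capture_traces(rng):
    """Simulate m noisy traces; only LEAK_SAMPLE depends on the secret bit."""
    ciphertexts, traces = [], []
    for i in range(NUM_TRACES):
        c = i % 16  # cycle through all 4-bit values
        trace = [rng.gauss(0.0, 1.0) for _ in range(TRACE_LEN)]
        trace[LEAK_SAMPLE] += selection(c, SECRET_KEY)
        ciphertexts.append(c)
        traces.append(trace)
    return ciphertexts, traces


def mean_trace(group):
    """Sample-wise average of a list of traces."""
    return [sum(t[j] for t in group) / len(group) for j in range(TRACE_LEN)]


def differential_trace(ciphertexts, traces, key_guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    ones = [t for c, t in zip(ciphertexts, traces) if selection(c, key_guess) == 1]
    zeros = [t for c, t in zip(ciphertexts, traces) if selection(c, key_guess) == 0]
    return [a - b for a, b in zip(mean_trace(ones), mean_trace(zeros))]


def recover_key(ciphertexts, traces):
    """Return the key guess whose differential trace has the largest spike."""
    peak = {}
    for guess in range(16):
        diff = differential_trace(ciphertexts, traces, guess)
        peak[guess] = max(abs(v) for v in diff)
    return max(peak, key=peak.get)


ciphertexts, traces = capture_traces(random.Random(0))
recovered = recover_key(ciphertexts, traces)
```

For the correct guess the differential trace shows a spike at the leaking sample, while wrong guesses give a nearly flat trace, which is how `recover_key` singles out the key block.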
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0-to-1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1-to-0, a
0-to-0, or a 1-to-1 transition, no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
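The data dependence described above can be made concrete with a first-order model. This Python sketch assumes an ideal gate that draws C*Vdd^2 from the supply on a 0-to-1 output transition and nothing otherwise (short-circuit and leakage currents are ignored); the default capacitance and supply voltage are placeholder values.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, vdd=1.8):
    """Energy drawn from the supply for one output transition (joules)."""
    if prev_out == 0 and next_out == 1:
        return c_load * vdd ** 2  # charging the output capacitance
    return 0.0                    # 1-to-0, 0-to-0, 1-to-1: nothing drawn in this model


def energy_profile(outputs, c_load=10e-15, vdd=1.8):
    """Supply energy per clock cycle for a sequence of gate output values."""
    return [supply_energy(a, b, c_load, vdd) for a, b in zip(outputs, outputs[1:])]


# Normalized units (C = 1, Vdd = 1) to make the pattern easy to read.
profile = energy_profile([0, 1, 1, 0, 0, 1], c_load=1.0, vdd=1.0)
```

The profile is nonzero only in cycles where the output rises, so the supply current reveals when, and therefore on which data, the gate switched.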
6.1 Current Mode Logic
Current mode logic (CML), e.g., current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e., when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the inverter shown above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. Since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a
differential output signal. The discharged output node is charged to 1 in the following pre-charge phase
so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) by using dynamic differential logic include the
following techniques:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
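The first condition can be checked with a simple event count. The Python sketch below is a behavioral model under assumptions: it counts only charging events on the output nodes, treats all node capacitances as equal, and uses hypothetical function names.

```python
def static_cmos_charging_events(outputs):
    """Charging events per cycle for a single-rail static CMOS output."""
    return [int(a == 0 and b == 1) for a, b in zip(outputs, outputs[1:])]


def dual_rail_charging_events(outputs):
    """Charging events per cycle for a dual-rail pre-charged output.

    Both rails are pre-charged to 1. Evaluation discharges the rail that
    must read 0, and the next pre-charge phase recharges that rail.
    """
    events = []
    for bit in outputs:
        true_rail, false_rail = bit, 1 - bit   # rail values after evaluation
        discharged = (true_rail == 0) + (false_rail == 0)
        events.append(discharged)              # one recharge per discharged rail
    return events


data = [0, 1, 1, 0, 0, 1]
single_rail = static_cmos_charging_events(data)
dual_rail = dual_rail_charging_events(data)
```

The single-rail count follows the data, whereas the dual-rail count is one per cycle whatever the data, which is the property the first condition asks for.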
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require a full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the pre-charge wave generation figure, a pre-charge
operator is inserted at the start of every combinatorial logic tree, i.e., at the inputs of the
encryption module and the outputs of the registers. These operators produce an all-zero
output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
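The construction described above can be modelled behaviorally. In this Python sketch a differential signal is a pair (true rail, false rail); the function names and the small example circuit (x AND y) OR z are assumptions for illustration and are not taken from the project's VHDL sources.

```python
def encode(bit):
    """Differential encoding of a logic value: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge(clk, signal):
    """Pre-charge operator: all-zero output while clk is high, pass-through while low."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails, its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails, its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def evaluate(clk, x, y, z):
    """Two-level tree (x AND y) OR z fed through pre-charge operators."""
    px, py, pz = (precharge(clk, encode(v)) for v in (x, y, z))
    return wddl_or(wddl_and(px, py), pz)


# Collect the output for every input combination in both clock phases.
precharge_outputs = {evaluate(1, x, y, z)
                     for x in (0, 1) for y in (0, 1) for z in (0, 1)}
evaluation_outputs = {(x, y, z): evaluate(0, x, y, z)
                      for x in (0, 1) for y in (0, 1) for z in (0, 1)}
```

With clk high, the zeros injected at the inputs propagate through both gates without any gate receiving the clock itself. With clk low, exactly one rail of the output rises, whatever the data.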
Figure: WDDL Pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND, and XOR have been
implemented, as well as a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e., during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
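The Boolean equation above maps onto differential signals as follows. This Python sketch is one possible behavioral reading, assuming that a complemented input is obtained by taking the opposite rail and that the false output is the dual expression; it is not claimed to match the gate-level netlist of the figure, and the function names are hypothetical.

```python
def encode(bit):
    """Differential encoding of a logic value: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR on differential signals built from AND and OR only.

    True rail:  A.B' + A'.B, where B' is simply the false rail of B.
    False rail: the dual expression (A' + B).(A + B').
    """
    a_t, a_f = a
    b_t, b_f = b
    true_rail = (a_t & b_f) | (a_f & b_t)
    false_rail = (a_f | b_t) & (a_t | b_f)
    return (true_rail, false_rail)


xor_table = {(x, y): wddl_xor(encode(x), encode(y))
             for x in (0, 1) for y in (0, 1)}
precharged = wddl_xor((0, 0), (0, 0))
```

Because only AND and OR are used on the rails, all-zero inputs in the pre-charge phase still give an all-zero output, so the gate passes the 0-wave like the other WDDL gates.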
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32
times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Figure: Simulation Result
Figure: Synthesis Result
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to provide security to ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
provides constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the other styles and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. In this implementation a bottom-up approach is used. The low-level entities
are designed first, and later they are all combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of the entire data to be encrypted with that key. The sub-keys are given in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. Functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter.
According to the specifications, the
desired functionality has been achieved. In the output during the Evaluation phase there
is always the same number of transitions, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the DPA scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits",
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow, and Sergei Skorobogatov, "Crypto processors - A Survey",
IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic
Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1,
June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length
Key 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist
Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal, and Ingrid Verbauwhede, "A Dynamic and Differential Logic with
Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson
Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to
so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power
consumption, execution time, electromagnetic radiation, and other information leaked by the
switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This
project presents a digital very large scale integrated (VLSI) design flow to create secure
power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles that have a switching behavior independent of
the data, or sequence of data, that they are processing. A logic style called Wave Dynamic
Differential Logic (WDDL) is used for the implementation of the basic logic gates that are
used in cryptographic processors. The design flow goes from a normal design in a
hardware description language such as VHDL to a Side Channel Attack (SCA) resistant
layout.
Depending on the parameter considered, side-channel attacks are classified as
probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic
analysis attacks, etc. One side-channel attack in particular, namely Differential Power
Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be
mounted quickly with off-the-shelf devices. The attack is based on the fact that logic
operations have power characteristics that depend on the input data. It relies on statistical
analysis to extract from the power consumption the information that is correlated to the secret
key. As the variations actually originate at the logic level, implementing the encryption and
decryption modules in a logic style for which a logic gate has at all times constant power
consumption, independently of signal transitions, removes the foundation of DPA and is an
effective means to halt DPA.
1.2 Objective of the Project
The main objectives of this dissertation are:
Study of constant-power logic styles
Description of WDDL gates
Implementation of WDDL logic gates
Verification of the functionality of WDDL logic gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: Design Entry (Specification,
Architecture, RTL Coding, and RTL Verification), Synthesis and post-synthesis
verification, Backend (Floor Planning, Place and Route, Layout), and Tape Out to the foundry to get
the end product. All modern digital designs start with a designer writing a hardware description
of the IC in a Hardware Description Language (HDL) such as Verilog or VHDL. A Verilog or
VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the
interconnect of the circuit blocks, and the functionality. Various CAD tools are available to
synthesize a circuit based on the HDL.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Fig. 2.1. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out, and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells that are designed in any constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming nonstandard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines its pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
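E.g.
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);  -- the two bytes to compare
        EQ   : out std_logic);                    -- '1' when A equals B
end Comparator;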
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have no port declaration
Port directions
In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple
assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (buffer ports
are not recommended)
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  -- Concurrent conditional assignment: EQ is '1' only when both inputs match
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
The concept of default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
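As a minimal sketch of how a configuration declaration selects between several architectures of one entity, the following reuses the Comparator example from this chapter. The second architecture `Comparator2` and the configuration name `Comparator_cfg` are hypothetical names introduced only for illustration, and the syntax assumes VHDL-93 or later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow architecture, as in the example earlier in this chapter
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Hypothetical second architecture: the same function written as a process
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- Configuration declaration: binds architecture Comparator2 to entity Comparator
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end configuration Comparator_cfg;
```

Simulating `Comparator_cfg` instead of the bare entity would use `Comparator2`; changing the `for` clause to `Comparator1` switches the implementation without touching either architecture.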
Packages
Packages contain information common to many design units
1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2. Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. A package consists of a declaration part and an optional body part
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
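The split between package declaration and package body can be sketched as follows. The package name `wddl_pkg`, the constant `WORD_WIDTH`, the subtype `word_t` and the function `parity` are all hypothetical and chosen only to show one item of each kind; they are not taken from the project source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the publicly visible constants, types and subprogram headers
package wddl_pkg is
  constant WORD_WIDTH : natural := 32;
  subtype word_t is std_logic_vector(WORD_WIDTH - 1 downto 0);
  function parity (w : word_t) return std_logic;
end package wddl_pkg;

-- Package body: the implementation of the subprograms declared above
package body wddl_pkg is
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    -- XOR all bits of the word together
    for i in w'range loop
      p := p xor w(i);
    end loop;
    return p;
  end function parity;
end package body wddl_pkg;
```

A design unit would make these items visible with `use work.wddl_pkg.all;`, assuming the package has been compiled into the default `work` library.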
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural-------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
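Since the language has no built-in register, a behavioral description such as the one below is commonly written instead. This is a sketch under stated assumptions: the entity name `reg8`, the 8-bit width and the asynchronous active-high reset are illustrative choices, not details given in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (clk, rst : in  std_logic;
        d        : in  std_logic_vector(7 downto 0);
        q        : out std_logic_vector(7 downto 0));
end entity reg8;

architecture behavioral of reg8 is
begin
  -- The process wakes up on clk or rst; q changes only on reset or a rising clock edge
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');      -- asynchronous reset (assumed)
    elsif rising_edge(clk) then
      q <= d;                    -- capture the input on the rising edge
    end if;
  end process;
end architecture behavioral;
```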
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, only the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
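The two alternatives in the general format can be compared side by side. In the sketch below, both processes model the same 2-input AND function, one with a sensitivity list and one with an explicit wait statement. The entity name `and2_proc` and the port and label names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2_proc is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity and2_proc;

architecture behavioral of and2_proc is
begin
  -- Form 1: sensitivity list; the process runs whenever a or b changes
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; an explicit wait suspends the process
  -- until a or b changes, giving the same behavior as form 1
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

Placing the `wait on` at the end of the second process mirrors the implicit wait that a sensitivity list provides, so both outputs follow the inputs identically in simulation.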
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) without revealing this
parameter to the outside world. The goal of the attacker, by contrast, is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output wires,
the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A further distinction is that of semi-invasive attacks. These attacks have the specificity that
they require de-packaging of the chip to get access to the chip surface, but do not tamper with
the passivation layer (they do not require electrical contact with the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one, by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment, plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment, plus a moderate investment of effort in designing an attack
on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks target the switching behavior of digital
complementary metal–oxide–semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is also provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
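The measurement described above is Ohm's law applied to the series resistor. As a worked illustration, assume a hypothetical reading of $V_R = 100\ \mathrm{mV}$ across the 50 ohm resistor and a hypothetical 5 V supply (neither value is from the text), and neglect the small voltage drop across the resistor when estimating power:

```latex
I = \frac{V_R}{R} = \frac{0.1\ \mathrm{V}}{50\ \Omega} = 2\ \mathrm{mA},
\qquad
P \approx V_{DD}\, I = 5\ \mathrm{V} \times 2\ \mathrm{mA} = 10\ \mathrm{mW}
```

Sampling $V_R$ over time therefore yields a trace proportional to the instantaneous supply current of the device.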
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is a technique that involves direct interpretation of power consumption
measurements collected during cryptographic operations, generally by looking at a visual
representation of the power consumption of a unit while an encryption operation is being
performed. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases, it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which a certain intermediate value $V$ is one and the average of the
traces for which $V$ is zero, where $V$ is the value $D(C_i, b, K_s)$ that the selection function $D$
predicts for target bit $b$ from ciphertext $C_i$ and key guess $K_s$. Thus $\Delta_D[j]$ is the average over
$C_{1..m}$ of the effect due to the value
represented by the selection function $D$ on the power consumption at point $j$.
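Written out, the difference of means described above can be expressed as follows. This is a reconstruction from the definitions in the text rather than a formula recovered from the source; $D_i$ is shorthand assumed here for the selection-function value $D(C_i, b, K_s) \in \{0, 1\}$ of trace $i$:

```latex
\Delta_D[j] \;=\;
\frac{\sum_{i=1}^{m} D_i \, T_i[j]}{\sum_{i=1}^{m} D_i}
\;-\;
\frac{\sum_{i=1}^{m} (1 - D_i)\, T_i[j]}{\sum_{i=1}^{m} (1 - D_i)},
\qquad D_i = D(C_i, b, K_s)
```

The first term is the mean of sample $j$ over the traces for which the predicted bit is one, and the second term is the mean over the traces for which it is zero.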
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus, if $K_s$ is incorrect, trace components uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$
incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to $D$ approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions
where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
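The asymmetry described above can be made concrete with the standard first-order model of a CMOS gate driving a load capacitance. The symbols $C_L$ (output load capacitance), $V_{DD}$ (supply voltage) and $i(t)$ (supply current) are notation assumed here, and short-circuit and leakage currents are neglected:

```latex
E_{\mathrm{supply}}(0 \to 1) = \int_0^{\infty} V_{DD}\, i(t)\, dt
  = C_L V_{DD} \int_0^{V_{DD}} dV_{out} = C_L V_{DD}^2,
\qquad
E_{\mathrm{supply}}(1 \to 0) = E_{\mathrm{supply}}(0 \to 0) = E_{\mathrm{supply}}(1 \to 1) \approx 0
```

Only the rising output transition draws a significant charge from the supply, so an observer of the supply current can tell it apart from the other three cases.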
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state, and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
In the inverter above, all four transitions can be distinguished by
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the
evaluation phase. The discharged output node is charged to 1 in the following pre-charge phase,
so that both outputs are again pre-charged to 1. In other words, every signal transition, including
the events in which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) using dynamic differential logic include the
following techniques:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated below.
Sense Amplifier Based Logic
AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow
No special design rules are involved in the interconnection of WDDL gates
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high), but let the
differential signal through during the evaluation phase (clk signal low).
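As a minimal sketch of the structure described above, the following VHDL models a WDDL AND gate together with a simple pre-charge operator. The entity name `wddl_and`, the port names and the particular form of the pre-charge operator (each rail ANDed with the inverted clock) are illustrative assumptions, not the project's actual source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wddl_and is
  port (a, b, clk : in  std_logic;   -- single-rail inputs and pre-charge clock
        y_t, y_f  : out std_logic);  -- true and false output rails
end entity wddl_and;

architecture dataflow of wddl_and is
  signal a_t, a_f, b_t, b_f : std_logic;  -- differential (true/false) input rails
begin
  -- Pre-charge operator (assumed form): both rails are forced to '0' while
  -- clk is high; while clk is low the true and false values pass through
  a_t <= a and not clk;
  a_f <= (not a) and not clk;
  b_t <= b and not clk;
  b_f <= (not b) and not clk;

  -- Positive gate on the true rails: AND
  y_t <= a_t and b_t;
  -- Dual gate on the false rails: OR (De Morgan complement of the AND)
  y_f <= a_f or b_f;
end architecture dataflow;
```

During pre-charge both outputs are '0'; during evaluation exactly one of `y_t` and `y_f` rises to '1' whatever the input values are, which is the one-switching-event-per-cycle property the text relies on.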
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false (negative) value. The OR, AND and XOR logic gates have been
implemented, as well as a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
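The Boolean equation above maps onto the two rails as sketched below, using only AND and OR gates on the differential signals so that an all-zero input still produces an all-zero output. The entity name `wddl_xor`, the port names and the form of the pre-charge operator are assumptions made for illustration and may differ from the project's actual code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wddl_xor is
  port (a, b, clk : in  std_logic;   -- single-rail inputs and pre-charge clock
        y_t, y_f  : out std_logic);  -- true and false output rails
end entity wddl_xor;

architecture dataflow of wddl_xor is
  signal a_t, a_f, b_t, b_f : std_logic;  -- differential input rails
begin
  -- Pre-charge operator (assumed form): all rails are '0' while clk is high
  a_t <= a and not clk;
  a_f <= (not a) and not clk;
  b_t <= b and not clk;
  b_f <= (not b) and not clk;

  -- True rail: A xor B = AB' + A'B, with the complements taken from the false rails
  y_t <= (a_t and b_f) or (a_f and b_t);
  -- False rail: the complement of XOR, AB + A'B'
  y_f <= (a_t and b_t) or (a_f and b_f);
end architecture dataflow;
```

No inverter acts on a differential signal here: every complemented literal is read from the opposite rail, which keeps both output functions positive as the pre-charge wave requires.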
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower-level module 32
times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
============================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
============================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
============================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
============================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
43
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to protect ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over the alternatives and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path.
Initially, the logic gates are implemented in WDDL, and the higher-level modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found to be simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package and synthesis using RTL Compiler. The backend of the design is done using SOC Encounter.
The desired functionality has been achieved according to the specifications. At the output, the number of transitions during the Evaluation phase is always the same, resulting in constant power dissipation. During synthesis it was observed that a single WDDL gate comprises several conventional gates. The area of the design has therefore grown nearly three-fold compared to the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply the DPA scheme to find the secret key used in the crypto processor. Security against DPA is thus incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 Introduction
Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.
The idea is to create digital circuit styles whose switching behavior is independent of the data, or the sequence of data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used to implement the basic logic gates that are used in cryptographic processors. The design flow runs from a normal design in a hardware description language such as VHDL to a Side Channel Attack (SCA) resistant layout.
Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, namely Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract, from the power consumption, the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style in which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means to halt it.
1.2 Objective of the Project
The main objectives of this dissertation are:
Study of constant-power logic styles
Description of WDDL gates
Implementation of WDDL logic gates
Verification of the functionality of WDDL logic gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: Design Entry (Specification, Architecture, RTL Coding and RTL Verification); Synthesis and post-synthesis verification; Backend (Floor Planning, Place and Route, Layout); and Tape Out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a Hardware Description Language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnection of the circuit blocks, and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out, and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.
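The cell substitution step described above can be sketched as a simple netlist rewrite. This is a minimal illustration only: the tuple-based netlist format, the `_t`/`_f` rail suffixes, and the function name are hypothetical, and the pairing used (true rail keeps the gate, false rail uses the dual gate on the complemented inputs) is assumed from the usual definition of WDDL AND/OR gates rather than taken from the tool flow of this project.

```python
# Hypothetical netlist format: (gate_type, output_net, input_nets).
# Assumed WDDL pairing: the false rail uses the dual gate on the false inputs.
WDDL_DUAL = {"AND": "OR", "OR": "AND"}

def substitute_cells(netlist):
    """Replace each single-rail AND/OR gate with a pair of dual-rail gates."""
    secure = []
    for gate, out, ins in netlist:
        if gate not in WDDL_DUAL:
            raise ValueError(f"no WDDL cell defined for {gate}")
        # True rail: same function applied to the true inputs.
        secure.append((gate, out + "_t", tuple(n + "_t" for n in ins)))
        # False rail: dual function applied to the complemented inputs.
        secure.append((WDDL_DUAL[gate], out + "_f", tuple(n + "_f" for n in ins)))
    return secure

netlist = [("AND", "n1", ("a", "b")), ("OR", "y", ("n1", "c"))]
secure_netlist = substitute_cells(netlist)
for cell in secure_netlist:
    print(cell)
```

Each conventional gate becomes two gates, which is consistent with the area increase reported for the WDDL implementation.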
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);  -- two 8-bit operands
        EQ   : out std_logic);                     -- '1' when A equals B
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed
Out: Assignments can be made to a port, but data from a port cannot be read. Multiple assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not recommended)
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';  -- concurrent conditional assignment
end Comparator1;
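For readers less familiar with VHDL, the behavior that the Comparator entity and its Comparator1 architecture describe can be mirrored in a few lines of Python. This is only a behavioral sketch: the function name and the integer encoding of the 8-bit vectors are assumptions made for illustration, and the std_logic values other than '0' and '1' are not modeled.

```python
def comparator(a: int, b: int, width: int = 8) -> int:
    """Return 1 when the two width-bit operands are equal, else 0."""
    mask = (1 << width) - 1          # keep only the bits that fit in the port
    return 1 if (a & mask) == (b & mask) else 0

print(comparator(0x3C, 0x3C))  # equal operands
print(comparator(0x3C, 0x3D))  # operands differ in one bit
```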
Configurations
Links entity declaration and architecture body together
The concept of default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2. Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design units. They consist of a declaration part and an optional body part.
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware
It is possible to describe a circuit in a number of ways, listed here from the lowest to the highest level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the system
The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers => you must write the behavioral description of registers
The behavioral description can be provided in the form of subprograms (functions or procedures)
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL) synthesis tools ignore detailed timing specifications
The actual timing results depend on the implementation technology and the efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
Process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
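The wake-up rule for a process with a sensitivity list can be illustrated with a very small event-driven model. This is a sketch of the idea only, not of a real VHDL simulator: the `Process` class, the `assign` helper, and the signal names are hypothetical, and delta cycles and signal resolution are deliberately left out.

```python
class Process:
    """A process body that runs only when a signal in its sensitivity list changes."""

    def __init__(self, sensitivity, body):
        self.sensitivity = set(sensitivity)
        self.body = body
        self.runs = 0

    def notify(self, changed_signal, signals):
        if changed_signal in self.sensitivity:
            self.runs += 1
            self.body(signals)      # statements inside execute sequentially

signals = {"a": 0, "b": 0, "c": 0, "y": 0}

def and_body(s):
    s["y"] = s["a"] & s["b"]

p = Process(["a", "b"], and_body)

def assign(name, value):
    """Update a signal and wake the process only on an actual change (an event)."""
    if signals[name] != value:
        signals[name] = value
        p.notify(name, signals)

assign("a", 1)   # event on a: process runs
assign("b", 1)   # event on b: process runs
assign("c", 1)   # c is not in the sensitivity list: process stays asleep
print(p.runs, signals["y"])
```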
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. The goal of the attacker, by contrast, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use, and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then see the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional) such as running time and power consumption.
A more recent distinction is the semi-invasive attack. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.
Note that these two axes are indeed orthogonal. An invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.
Invasive attacks involve a relatively high capital investment in lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.
The attacker can gain information through:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic analysis attacks
These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is also provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
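The measurement described above is a direct application of Ohm's law. The short sketch below works through it; the function names and the example voltages (a 5 V supply and a 0.25 V drop across the resistor) are assumptions chosen for illustration and are not measurements from this project.

```python
def supply_current(v_shunt: float, r_shunt: float = 50.0) -> float:
    """Current through the series resistor: I = V / R (amperes)."""
    return v_shunt / r_shunt

def device_power(v_supply: float, v_shunt: float, r_shunt: float = 50.0) -> float:
    """Power delivered to the card, i.e. the remaining voltage times the current."""
    return (v_supply - v_shunt) * supply_current(v_shunt, r_shunt)

i = supply_current(0.25)          # 0.25 V across 50 ohm
p = device_power(5.0, 0.25)       # assumed 5 V supply
print(f"current = {i * 1e3:.2f} mA, power = {p * 1e3:.2f} mW")
```

Sampling the voltage across the resistor over time therefore gives a trace that is proportional to the instantaneous supply current.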
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher which can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly non-random.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.
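The trace-length figure quoted above follows from multiplying the duration of the operation by the sampling rate. A minimal sketch, with a hypothetical helper name:

```python
def trace_length(duration_s: float, sample_rate_hz: float) -> int:
    """Number of samples captured during one operation."""
    return round(duration_s * sample_rate_hz)

print(trace_length(1e-3, 5e6))    # 1 ms operation sampled at 5 MHz
print(trace_length(1e-3, 20e6))   # same operation on a 20 MHz capture device
```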
Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration, with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
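The exponentiator case can be made concrete with a short sketch. It assumes an idealized attacker who can perfectly tell a squaring ("S") from a multiplication ("M") in the trace; the function names and the operation log are hypothetical stand-ins for what would be read off a real power trace.

```python
def modexp_with_trace(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply that also logs the operation sequence."""
    result, ops = 1, []
    for bit in bin(exponent)[2:]:           # scan exponent bits, MSB first
        result = (result * result) % modulus
        ops.append("S")                     # squaring happens in every iteration
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                 # extra multiply only for a 1 bit
    return result, ops

def recover_exponent(ops) -> int:
    """Rebuild the exponent: an S followed by M is a 1 bit, a lone S is a 0 bit."""
    bits = []
    for i, op in enumerate(ops):
        if op == "S":
            followed_by_multiply = i + 1 < len(ops) and ops[i + 1] == "M"
            bits.append("1" if followed_by_multiply else "0")
    return int("".join(bits), 2)

value, ops = modexp_with_trace(7, 0b101101, 1009)
print("".join(ops), recover_exponent(ops))
```

The recovered value equals the secret exponent, which is why implementations whose operation sequence depends on key bits are vulnerable to SPA.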
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular, writing $D_i = D(C_i, b, K_s)$ for the value of $V$ predicted for trace $i$:
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D_i \, T_i[j]}{\sum_{i=1}^{m} D_i} - \frac{\sum_{i=1}^{m} (1 - D_i) \, T_i[j]}{\sum_{i=1}^{m} (1 - D_i)}$$
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.
Thus, trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat. (The actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$.) If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
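The difference-of-means computation described above can be tried on a toy model. Everything in this sketch is an assumption made for illustration: a 4-bit substitution table stands in for a DES S box, the attacker uses known inputs rather than ciphertexts, each simulated trace has only three samples, and the leakage is noise-free (only the middle sample depends on the target bit). With real, noisy measurements many more traces would be needed.

```python
# Toy 4-bit substitution table, chosen only because it is nonlinear.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def target_bit(x: int, key_guess: int) -> int:
    """Selection function D: bit 1 of the table output under this key guess."""
    return (SBOX[x ^ key_guess] >> 1) & 1

def simulate_trace(x: int, secret_key: int):
    """Three-sample power trace; only sample 1 depends on the processed data."""
    return [10.0, 10.0 + target_bit(x, secret_key), 10.0]

def mean_trace(traces):
    return [sum(column) / len(traces) for column in zip(*traces)]

def differential_trace(inputs, traces, key_guess):
    """Average of traces with D = 1 minus average of traces with D = 0."""
    ones = [t for x, t in zip(inputs, traces) if target_bit(x, key_guess) == 1]
    zeros = [t for x, t in zip(inputs, traces) if target_bit(x, key_guess) == 0]
    return [a - b for a, b in zip(mean_trace(ones), mean_trace(zeros))]

def recover_key(inputs, traces) -> int:
    """Pick the key guess whose differential trace has the largest spike."""
    def spike(guess):
        return max(abs(v) for v in differential_trace(inputs, traces, guess))
    return max(range(16), key=spike)

secret = 0xB
inputs = list(range(16)) * 4
traces = [simulate_trace(x, secret) for x in inputs]
print(differential_trace(inputs, traces, secret), hex(recover_key(inputs, traces)))
```

For the correct guess the differential trace is flat except for a spike at the leaking sample; for the wrong guesses the spike is smaller, so the key guess stands out.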
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current is drawn from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy, or only a limited amount of energy (due to short circuit or leakage), is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
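The data dependence described above can be seen in an idealized model of a single CMOS output. The sketch assumes that a 0 to 1 transition draws C·Vdd² from the supply and that every other transition draws nothing (short-circuit and leakage currents are ignored); the function names and the normalized values of C and Vdd are illustrative.

```python
def supply_energy(prev: int, curr: int, c_load: float = 1.0, vdd: float = 1.0) -> float:
    """Idealized energy drawn from the supply for one output transition."""
    return c_load * vdd ** 2 if (prev, curr) == (0, 1) else 0.0

def energy_profile(bits):
    """Energy drawn at each step of an output bit sequence."""
    return [supply_energy(a, b) for a, b in zip(bits, bits[1:])]

print(energy_profile([0, 1, 1, 0, 1]))   # two charging events
print(energy_profile([0, 0, 0, 0, 0]))   # no charging events
```

The two sequences draw different amounts of energy at different times, which is the information that a power analysis attack exploits.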
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it burns current, which makes this logic style unacceptable for embedded battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the inverter above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, which gives a
differential output signal. The discharged output node is charged to 1 again in the following
pre-charge phase. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) apply dynamic differential logic through the
following techniques:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
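The "one switching event per cycle" property described above can be sketched with a small behavioural model. It assumes the pre-charge-to-1 convention used in this section; the function names are illustrative and do not come from the project code.

```python
def ddl_cycle(value: int):
    """Return the (true, false) rail pair after pre-charge and after evaluation."""
    precharged = (1, 1)                       # both outputs pre-charged to 1
    evaluated = (1, 0) if value else (0, 1)   # exactly one rail discharges
    return precharged, evaluated


def recharge_events(values):
    """Count the rails that must be recharged in the pre-charge phase after each value."""
    counts = []
    for value in values:
        _, (true_rail, false_rail) = ddl_cycle(value)
        counts.append((1 - true_rail) + (1 - false_rail))
    return counts


# The count is the same whether or not the data value changes between cycles.
print(recharge_events([0, 0, 1, 1, 0]))
```

Because exactly one rail is recharged in every cycle, the number of charging events no longer depends on the data sequence. The second condition, a constant capacitance, still has to be met by the circuit itself.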
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, which is common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
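The behaviour described above can be sketched as a small functional model. This is only an illustration of the logic, not the VHDL used in the project: a differential signal is modelled as a pair (true rail, false rail), and the two-gate tree `small_tree` is a hypothetical example.

```python
# Behavioural sketch of WDDL compound gates (not a circuit-level model).

def precharge_operator(bit, clk):
    """All-zero rail pair while clk is high, differential pair while clk is low."""
    if clk:
        return (0, 0)
    return (bit, 1 - bit)


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR, on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND, on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def small_tree(a, b, c, clk):
    """Hypothetical tree y = (a AND b) OR c with pre-charge operators at its inputs."""
    ra, rb, rc = (precharge_operator(x, clk) for x in (a, b, c))
    return wddl_or(wddl_and(ra, rb), rc)


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            pre = small_tree(a, b, c, clk=1)   # the 0-wave reaches the output
            ev = small_tree(a, b, c, clk=0)    # differential result
            print(a, b, c, pre, ev)
```

In this model the output is (0, 0) in every pre-charge phase and exactly one rail rises in every evaluation phase, independent of the input values. This matches the 100% switching factor mentioned in the text.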
Figure WDDL Pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. The OR, AND and XOR logic gates have been
implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
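A minimal functional sketch of an XOR built directly from AND and OR gates on the rails is given here. The wiring is an assumption, since the figure is not reproduced in the text. The complemented inputs A' and B' are taken from the false rails, so no inverter is needed, and the false output is formed as the complementary (XNOR) expression. Signals are modelled as (true rail, false rail) pairs.

```python
def wddl_xor(a, b):
    """XOR of two rail pairs (true, false) using only AND and OR operations."""
    a_true, a_false = a
    b_true, b_false = b
    true_rail = (a_true & b_false) | (a_false & b_true)    # A.B' + A'.B
    false_rail = (a_true & b_true) | (a_false & b_false)   # A.B + A'.B'
    return (true_rail, false_rail)


def rails(bit):
    """Differential encoding of a single bit during the evaluation phase."""
    return (bit, 1 - bit)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, wddl_xor(rails(a), rails(b)))

# All-zero (pre-charged) inputs give an all-zero output, so the 0-wave passes through.
print(wddl_xor((0, 0), (0, 0)))
```

Only positive gates are used, so the gate still pre-charges to 0 without its own pre-charge signal.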
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32
times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
LUT3                             : 2
IO Buffers                       : 5
IBUF                             : 3
OBUF                             : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
LUT3                             : 2
IO Buffers                       : 5
IBUF                             : 3
OBUF                             : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
LUT3                             : 2
IO Buffers                       : 5
IBUF                             : 3
OBUF                             : 2
==========================================================
Device utilization summary
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%
Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
LUT3                             : 2
IO Buffers                       : 5
IBUF                             : 3
OBUF                             : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
renders the power dissipation constant irrespective of the input combination. WDDL
has proved advantageous over the other styles and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are supplied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are set to zero, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found to be simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plain text of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
backend of the design is done using SOC Encounter.
According to the specifications, the
desired functionality has been achieved. In the output, during the Evaluation phase, there
is always the same number of transitions, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, a hacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto-processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow for
Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors – A Survey",
IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic
Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1,
June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key,
64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and
Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic
and Differential Logic with Signal Independent Power Consumption to withstand
Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education,
2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
consumption independently of signal transitions removes the foundation of DPA and is an
effective means to halt DPA.
1.2 Objective of the Project
The main objectives of this dissertation are:
Study of constant-power logic styles
Description of WDDL Gates
Implementation of WDDL Logic Gates
Verification of the functionality of WDDL Logic Gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis
CHAPTER 2 REVIEW
OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: Design Entry (Specification,
Architecture, RTL Coding and RTL Verification), Synthesis and post-synthesis
verification, Backend (Floor Planning, Place and Route, Layout), and Tape Out to the foundry to get
the end product. All modern digital designs start with a designer writing a hardware description
of the IC in a Hardware Description Language (HDL) such as Verilog or VHDL. A Verilog or
VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the
interconnect of the circuit blocks and the functionality. Various CAD tools are available to
synthesize a circuit based on the HDL description.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in the figure below. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells that are designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
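The idea of the cell substitution step can be sketched as a simple mapping over a gate-level netlist. Everything in this sketch is hypothetical: the netlist representation, the cell names and the replacement table are illustrative assumptions and do not reflect the format of any actual EDA tool or cell library.

```python
# Hypothetical table: conventional cell name -> constant-power compound cell name.
WDDL_CELLS = {
    "AND2": "WDDL_AND2",
    "OR2": "WDDL_OR2",
    "XOR2": "WDDL_XOR2",
}


def substitute_cells(netlist):
    """Replace every conventional cell by its constant-power counterpart.

    `netlist` is a list of (instance_name, cell_name, pin_connections) tuples.
    """
    secure_netlist = []
    for instance, cell, pins in netlist:
        if cell not in WDDL_CELLS:
            raise KeyError(f"no constant-power replacement for cell {cell!r}")
        secure_netlist.append((instance, WDDL_CELLS[cell], pins))
    return secure_netlist


example = [
    ("u1", "AND2", {"A": "a", "B": "b", "Y": "n1"}),
    ("u2", "OR2", {"A": "n1", "B": "c", "Y": "y"}),
]
print(substitute_cells(example))
```

The logic function and the connectivity are left untouched, which is why the substitution does not interfere with the logic design task.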
CHAPTER 3 HARDWARE DESCRIPTIVE
LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming nonstandard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read. Multiple
assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not
recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2. Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. A package consists of a declaration part and an optional body part.
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
Libraries
Collection of VHDL design units (database)
1. Packages
– package declaration
– package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware
It is possible to describe a circuit in a number of ways, listed here in order of increasing
level of abstraction
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no 3rd-party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
CHAPTER 4 SMART
CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, the role, use and position of which are standardized. In addition to the input/output
wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery.
The current they need is provided by the smart card reader. This makes the smart
card's power consumption fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the
attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device and that is
neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces cipher text output, and vice versa. Attacks were therefore based on either
knowing the cipher text (such as cipher text-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the cipher text.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example of this is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time and power consumption.
A further distinction is made for semi-invasive attacks. These attacks have the specificity that
they require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
In contrast, passive attacks simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks are performed during the switching of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
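The conversion from the measured voltage difference to supply current is Ohm's law, I = V / R. A minimal sketch follows, assuming the 50 ohm series resistor mentioned above; the voltage samples are made-up values used only for illustration.

```python
R_SHUNT_OHMS = 50.0   # series resistor value quoted in the text


def supply_current(v_drop_volts, r_ohms=R_SHUNT_OHMS):
    """Ohm's law: current through the series resistor, and hence through the card."""
    return v_drop_volts / r_ohms


def trace_to_current(voltage_samples, r_ohms=R_SHUNT_OHMS):
    """Convert a sampled voltage trace into a current trace."""
    return [supply_current(v, r_ohms) for v in voltage_samples]


samples = [0.10, 0.25, 0.15]   # hypothetical voltage drops in volts
print(trace_to_current(samples))
```

A voltage drop of 0.5 V across the 50 ohm resistor, for example, corresponds to a supply current of 10 mA.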
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally through some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed. Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
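The trace length quoted above is the product of the operation time and the sampling rate. A small worked sketch follows; the function name is illustrative.

```python
def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points in a trace: operation time multiplied by sampling rate."""
    return round(duration_s * sample_rate_hz)


# 1 millisecond operation sampled at 5 MHz
print(samples_in_trace(1e-3, 5e6))
```

For the values in the text this gives 5000 points. At the 20 MHz rate mentioned earlier, the same operation would give 20000 points.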
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration, with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
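The exponentiator case can be illustrated with a small sketch. It assumes an idealised attacker who can tell a squaring ("S") from a multiplication ("M") in the trace; the function names and the numbers used are illustrative only.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs each operation."""
    result = 1
    ops = []                        # 'S' = squaring, 'M' = multiplication
    for bit in bin(exponent)[2:]:   # scan exponent bits, most significant first
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":              # extra multiplication only for "1" bits
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops):
    """Rebuild the exponent from an observed S/M operation sequence."""
    bits = ops.replace("SM", "1").replace("S", "0")
    return int(bits, 2)


value, ops = square_and_multiply(7, 0b101101, 1009)
print(ops)
print(bin(recover_exponent(ops)))
```

Every "S" followed by an "M" marks a "1" bit and every lone "S" marks a "0" bit, so the operation sequence alone discloses the exponent in this simplified model.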
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces $T_{1 \ldots m}[1 \ldots k]$ containing k samples each. In addition, the attacker records the
ciphertexts $C_{1 \ldots m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a k-sample differential trace $\Delta_D[1 \ldots k]$ by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero, where V is the target bit b computed by the selection function
$D(C_i, b, K_s)$ from the ciphertext $C_i$ under the guess $K_s$. Thus $\Delta_D[j]$ is the average over
$C_{1 \ldots m}$ of the effect due to the value represented by the selection function D on the power
consumption at point j. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus trace components uncorrelated to D will diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with $K_s$
incorrect may have a weak correlation to D with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions
where D is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
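The difference-of-means step described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the traces are hand-made rather than measured, and the selection bits are supplied directly instead of being computed by a real selection function D from ciphertexts and a key guess. The name `differential_trace` is illustrative.

```python
def differential_trace(traces, selection_bits):
    """Average of the traces with selection bit 1 minus average of those with bit 0."""
    ones = [t for t, d in zip(traces, selection_bits) if d == 1]
    zeros = [t for t, d in zip(traces, selection_bits) if d == 0]

    def mean(group, j):
        return sum(t[j] for t in group) / len(group)

    k = len(traces[0])                      # samples per trace
    return [mean(ones, j) - mean(zeros, j) for j in range(k)]


# Toy data (assumed, not measured): m = 4 traces of k = 3 samples each.
# Sample 1 draws 2 extra units of power whenever the device's target bit is 1.
target_bits = [1, 0, 1, 0]
traces = [[1.0, 5.0, 2.0], [1.0, 3.0, 2.0], [1.0, 5.0, 2.0], [1.0, 3.0, 2.0]]

good = differential_trace(traces, target_bits)    # selection matches the device
bad = differential_trace(traces, [1, 1, 0, 0])    # selection unrelated to the device
print(good)
print(bad)
```

With the matching selection bits the differential trace shows a spike at the sample where the bit is processed; with the unrelated partition the two averages coincide and the trace is flat, which is the behaviour the text attributes to correct and incorrect key guesses.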
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
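The data dependence described above can be made concrete with a simplified model in which only a 0 to 1 output transition draws charge from the supply (short-circuit and leakage currents are ignored). The function name is illustrative.

```python
def supply_charging_events(output_sequence):
    """Count 0 -> 1 output transitions, the only ones that draw charge
    from the supply in this simplified static CMOS model."""
    pairs = zip(output_sequence, output_sequence[1:])
    return sum(1 for prev, cur in pairs if prev == 0 and cur == 1)


print(supply_charging_events([0, 1, 0, 1, 0]))   # output toggles every cycle
print(supply_charging_events([0, 0, 0, 0, 0]))   # output never changes
```

Two output sequences of the same length draw a different number of charging pulses, so an observer of the supply current learns something about the data.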
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
In the figure above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase in order to produce a
differential output signal. The discharged output node is charged to 1 in the following pre-charge
phase, so that both outputs are pre-charged to 1 again. In other words, every signal transition,
including the events in which the input signals remain constant, is represented by an actual
switching event in which the logic gate charges a capacitance. The logic families that have been
introduced to thwart differential power analysis (DPA) use dynamic differential logic. Two such
techniques are:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
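The first condition can be checked with a small behavioural model of a dual-rail gate that pre-charges both outputs to 1, as described above. It is a sketch under simplifying assumptions (ideal rails, no glitches, equal capacitances), and the function name is illustrative.

```python
def charging_events_per_cycle(data_bits):
    """Dual-rail, pre-charge-to-1 model: supply charging events in each cycle."""
    events = []
    true_rail, false_rail = 1, 1             # both outputs start pre-charged
    for bit in data_bits:
        # Evaluation phase: exactly one of the two rails is discharged to 0.
        true_rail, false_rail = (1, 0) if bit else (0, 1)
        # Pre-charge phase: the discharged rail is charged back to 1.
        recharged = (true_rail == 0) + (false_rail == 0)
        true_rail, false_rail = 1, 1
        events.append(recharged)
    return events


print(charging_events_per_cycle([0, 1, 1, 0, 0, 0, 1]))
```

Whatever the data sequence is, each cycle contains exactly one charging event, in contrast to single-rail static CMOS where the number of charging events follows the data.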
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, which is common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
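The construction described above can be mimicked with a short behavioural model. It is a sketch rather than the project's VHDL: each signal is represented as a (true rail, false rail) pair, and the helper names `encode`, `precharge`, `wddl_and` and `wddl_or` are illustrative.

```python
def encode(bit):
    """Differential encoding of a bit as (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


for a_bit in (0, 1):
    for b_bit in (0, 1):
        a, b = encode(a_bit), encode(b_bit)
        # Evaluation phase (clk low): the output is differential.
        out = wddl_and(precharge(a, 0), precharge(b, 0))
        # Pre-charge phase (clk high): the 0-wave reaches the output.
        idle = wddl_and(precharge(a, 1), precharge(b, 1))
        print(a_bit, b_bit, out, idle)
```

Because both halves are positive gates, all-zero inputs give an all-zero output, so the pre-charge value propagates without a pre-charge signal on each gate. In every evaluation phase exactly one of the two output rails rises, whatever the input data.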
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented first in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. The logic gates OR, AND and XOR have been
implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
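The Boolean equation above can be turned into a behavioural model of a dual-rail XOR. The sketch assumes that a complemented input is obtained by taking the opposite rail of the signal, which is the usual convention in WDDL, and that the false output is produced by the dual network; the actual gate-level wiring in the project's figure may differ. The name `wddl_xor` is illustrative.

```python
def wddl_xor(a, b):
    """a and b are (true rail, false rail) pairs; returns the output pair."""
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)     # A.B' + A'.B
    false_out = (a_f | b_t) & (a_t | b_f)    # dual network: (A' + B).(A + B')
    return (true_out, false_out)


for a_bit in (0, 1):
    for b_bit in (0, 1):
        out = wddl_xor((a_bit, 1 - a_bit), (b_bit, 1 - b_bit))
        print(a_bit, b_bit, out)

print(wddl_xor((0, 0), (0, 0)))              # pre-charge: all rails at 0
```

Only AND and OR operations appear in the model, so all-zero inputs still produce an all-zero output and the pre-charge wave passes through the XOR like through any other WDDL gate.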
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
easily be implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
gives constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the other styles and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation. The low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially the logic gates are implemented in
WDDL, and then the higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. During the Evaluation phase the output shows
the same number of transitions, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared with the
design implemented in conventional CMOS logic. This is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Because of the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 2 REVIEW
OF LITERATURE
2.1 Introduction to Digital Design Flow
A typical digital design flow for any IC is as follows: Design Entry (Specification,
Architecture, RTL Coding and RTL Verification), Synthesis and post-synthesis
verification, Backend (Floor Planning, Place and Route, Layout), and Tape Out to Foundry to get
the end product. All modern digital designs start with a designer writing a hardware description
of the IC in a Hardware Description Language (HDL) such as Verilog or VHDL. A Verilog or
VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the
interconnect of the circuit blocks and the functionality. Various CAD tools are available to
synthesize a circuit based on the HDL.
2.2 Secure Digital Design Flow
The secure digital design flow is depicted in Figure 2.1. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style
replace the conventional CMOS gates. This is what secures the IC against power
analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTION
LANGUAGE (VHDL)
Why (V)HDL
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming nonstandard
The IEEE 1164 standard was developed
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines the pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes the entity
E.g.
    -- std_logic and std_logic_vector come from the IEEE 1164 package
    library ieee;
    use ieee.std_logic_1164.all;
    entity Comparator is
        port (A, B : in  std_logic_vector(7 downto 0);
              EQ   : out std_logic);
    end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component, but cannot be assigned
Multiple reads of the port are allowed
Out: Assignments can be made to a port, but data from a port cannot be read. Multiple
assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not
recommended)
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
    architecture Comparator1 of Comparator is
    begin
        EQ <= '1' when (A = B) else '0';
    end Comparator1;
Configurations
Link the entity declaration and the architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. They consist of a declaration part and an optional body part
A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
Libraries
Collection of VHDL design units (database)
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware
It is possible to describe a circuit in a number of ways, listed here from lower to higher level
of abstraction
Structural
Dataflow
Behavioral
Structural VHDL description
The circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
The circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no 3rd-party models for registers => you must write the behavioral
description of the registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
Behavioral VHDL Description
The circuit is described in terms of its operation over time
The representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
    process [(sensitivity_list)]
        process_declarative_part
    begin
        process_statements
        [wait_statement]
    end process;
CHAPTER 4 SMART
CARD OVERVIEW
This section will very briefly introduce the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output wires,
the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the
attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device and that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes.
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A more recent distinction adds semi-invasive attacks. These attacks are specific in that
they require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock signal, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
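As a worked illustration of the measurement described above, the relations below convert a sampled voltage drop into supply current and power. The symbols $V_R$, $R$ and $V_{DD}$ and all numerical values are assumptions chosen for the example, not measurements from this project, and the small drop across the resistor is neglected in the power estimate.

```latex
I(t) = \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t)

% Example with assumed values R = 50\,\Omega,\; V_R = 5\,\mathrm{mV},\; V_{DD} = 5\,\mathrm{V}:
I = \frac{5\,\mathrm{mV}}{50\,\Omega} = 0.1\,\mathrm{mA}, \qquad
P \approx 5\,\mathrm{V} \times 0.1\,\mathrm{mA} = 0.5\,\mathrm{mW}
```

Repeating this conversion for every sample of the digitizer gives the power trace that SPA and DPA operate on.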
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is a technique that involves direct interpretation, typically by visual
inspection, of power consumption measurements collected during cryptographic operations.
SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure below shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.
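The trace length quoted in the example above follows directly from the sampling rate and the duration of the operation. The symbols $N$, $f_s$ and $t_{op}$ are introduced here only for illustration.

```latex
N = f_s \cdot t_{op} = (5 \times 10^{6}\,\mathrm{Hz}) \times (1 \times 10^{-3}\,\mathrm{s}) = 5000 \ \text{samples}
```

By the same relation, a digitizer running at 20 MHz would record 20000 points for the same 1 millisecond operation.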
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
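The exponentiator case above can be made concrete with a small behavioral sketch. The package name `spa_demo` and the function `mod_exp` are hypothetical and written only for illustration. They are a simulation model, not part of the project's design. The point to notice is the data-dependent branch: the multiplication is executed only for exponent bits equal to '1', which is exactly the kind of execution-path difference SPA can observe.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package spa_demo is
  -- Left-to-right square-and-multiply: returns (base ** exponent) mod modulus.
  -- Assumption: modulus is small enough (below 46341) that the intermediate
  -- products fit into a 32-bit VHDL integer.
  function mod_exp (base     : natural;
                    exponent : std_logic_vector;
                    modulus  : positive) return natural;
end package spa_demo;

package body spa_demo is
  function mod_exp (base     : natural;
                    exponent : std_logic_vector;
                    modulus  : positive) return natural is
    variable b      : natural := base mod modulus;
    variable result : natural := 1;
  begin
    -- 'range runs from the leftmost (most significant) bit to the rightmost.
    for i in exponent'range loop
      result := (result * result) mod modulus;  -- squaring: every iteration
      if exponent(i) = '1' then
        result := (result * b) mod modulus;     -- multiplication: only for '1' bits
      end if;
    end loop;
    return result mod modulus;
  end function mod_exp;
end package body spa_demo;
```

An attacker who can tell squarings from multiplications in a trace reads the exponent bit by bit: a squaring followed by a multiplication indicates a '1', a squaring alone indicates a '0'.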
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. A selection
function $D(C_i, b, K_s)$ predicts, from the ciphertext $C_i$ and the guess $K_s$, the value of a
target intermediate bit $b$. The attacker
computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which $D(C_i, b, K_s)$ is one and the average of the
traces for which it is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value
represented by the selection function $D$ on the power consumption at point $j$. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$
incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1.
The selection function is thus correlated to the value of the bit considered. The contributions of other
data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions
where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of $b$ correspond to each S box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
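A toy calculation may help to see the difference-of-means step described above at a single sample point $j$. The four trace values and the selection-function outputs below are invented for illustration only. A real attack averages over a large number of traces.

```latex
% Assumed measurements at point j (arbitrary units):
T_1[j] = 5.2,\quad T_2[j] = 4.8,\quad T_3[j] = 5.1,\quad T_4[j] = 4.9

% Key guess whose selection function outputs D = (1, 0, 1, 0):
\Delta_D[j] = \frac{5.2 + 5.1}{2} - \frac{4.8 + 4.9}{2} = 5.15 - 4.85 = 0.30

% Key guess whose selection function outputs D = (1, 1, 0, 0):
\Delta_D[j] = \frac{5.2 + 4.8}{2} - \frac{5.1 + 4.9}{2} = 5.00 - 5.00 = 0
```

In this toy data the first partition lines up with the higher-power traces and produces a spike, while the second partition averages the difference away, which is the behavior expected of a correct and an incorrect key guess respectively.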
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current flows from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount (due to
short-circuit or leakage currents) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is therefore a logic style with constant power consumption.
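The asymmetry described above can be quantified with the usual first-order model of a static CMOS gate driving a load capacitance. This is a textbook approximation that neglects short-circuit and leakage currents. The symbols $C_L$ and $V_{DD}$ are introduced here for illustration.

```latex
% Energy drawn from the supply for each kind of output event:
E_{0 \to 1} = \int V_{DD}\, i_{DD}(t)\, dt = C_L V_{DD}^{2}
\qquad
E_{1 \to 0} \approx E_{0 \to 0} \approx E_{1 \to 1} \approx 0
```

Of the energy $C_L V_{DD}^{2}$ drawn during a 0 to 1 transition, half is stored on the capacitance and half is dissipated in the pull-up network. An observer of the supply current therefore sees a pulse only when an output rises, which is what makes the supply current data dependent.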
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the figure above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a
differential output signal. The discharged output node is charged to 1 in the following pre-charge
phase, so that both outputs are again at 1. In other words, every signal transition, including the
events in which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) by using dynamic differential logic include the
following techniques:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
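The link between dynamic differential logic and the two conditions of Section 6.2 can be written down in a first-order model. The symbols $C_T$ and $C_F$, the capacitances at the true and false output nodes, are assumptions introduced for this illustration. Short-circuit and leakage currents are neglected.

```latex
% Exactly one output node is discharged per evaluation phase and recharged
% in the following pre-charge phase, so the energy drawn per cycle is
E_{cycle} =
\begin{cases}
C_T V_{DD}^{2} & \text{if the true node was discharged} \\
C_F V_{DD}^{2} & \text{if the false node was discharged}
\end{cases}

% The energy per cycle is independent of the data only if
C_T = C_F
```

Dynamic differential operation thus guarantees one charging event per cycle (the first condition), while matching the two output capacitances is what the second condition asks for.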
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the figure below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require a full-custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the outputs of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure below, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
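The gate and pre-charge operator described above can be sketched in VHDL as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the entity names `wddl_precharge` and `wddl_and` and all port names are hypothetical, and the pre-charge operator shown here also derives the false rail from a single-rail input, which is one possible arrangement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Pre-charge operator (hypothetical name): turns a single-rail signal d into
-- a differential pair and forces both rails to '0' while clk = '1'.
entity wddl_precharge is
  port (clk : in  std_logic;
        d   : in  std_logic;
        q_t : out std_logic;   -- true rail
        q_f : out std_logic);  -- false rail
end entity wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d       and (not clk);  -- pre-charge phase (clk high): '0'
  q_f <= (not d) and (not clk);  -- evaluation phase (clk low): complement of q_t
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND gate: an AND gate on the true rails in parallel with its dual,
-- an OR gate on the false rails. All-zero inputs give all-zero outputs.
entity wddl_and is
  port (a_t, a_f : in  std_logic;
        b_t, b_f : in  std_logic;
        y_t, y_f : out std_logic);
end entity wddl_and;

architecture dataflow of wddl_and is
begin
  y_t <= a_t and b_t;  -- true output from the true inputs
  y_f <= a_f or  b_f;  -- false output from the false inputs (dual gate)
end architecture dataflow;
```

Because both gates are positive, driving all rails to '0' at the inputs of a logic cone is enough to propagate the 0-wave through every `wddl_and` in it. No gate needs its own pre-charge signal. A WDDL OR gate is obtained by exchanging the two operators.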
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting the
signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation $A \oplus B = AB' + A'B$. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter approach is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
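The XOR construction described above can be sketched as follows. The entity name `wddl_xor_sketch`, the signal names and the port list are assumptions made for illustration. In particular, the three-input, two-output interface was chosen only because it is consistent with the five IOs listed in the synthesis reports of Chapter 8. The project's actual source is not reproduced in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wddl_xor_sketch is
  port (clk  : in  std_logic;   -- '1' = Precharge phase, '0' = Evaluation phase
        a, b : in  std_logic;   -- single-rail data inputs
        y_t  : out std_logic;   -- true output rail
        y_f  : out std_logic);  -- false output rail
end entity wddl_xor_sketch;

architecture dataflow of wddl_xor_sketch is
  signal a_t, a_f, b_t, b_f : std_logic;
begin
  -- Precharge operator: all four internal rails are '0' while clk = '1'.
  a_t <= a       and (not clk);
  a_f <= (not a) and (not clk);
  b_t <= b       and (not clk);
  b_f <= (not b) and (not clk);

  -- True rail: A xor B = AB' + A'B. A complemented input is simply the
  -- opposite rail, so only positive AND/OR gates are needed.
  y_t <= (a_t and b_f) or (a_f and b_t);

  -- False rail: the dual network (AND and OR exchanged, rails exchanged),
  -- which evaluates to A xnor B.
  y_f <= (a_t or b_f) and (a_f or b_t);
end architecture dataflow;
```

With clk = '1' both outputs are '0'. With clk = '0' exactly one output is '1': y_t when a and b differ, y_f when they are equal. Every Precharge/Evaluation cycle therefore produces one rising output transition regardless of the data.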
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower-level module 32
times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448    0%
Number of 4 input LUTs           : 2 out of 4896    0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172     2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448    0%
Number of 4 input LUTs           : 2 out of 4896    0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108     4%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656    0%
Number of 4 input LUTs           : 2 out of 9312    0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232     2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO
Design Statistics
IOs                              : 5
Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448    0%
Number of 4 input LUTs           : 2 out of 4896    0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172     2%
Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
provides constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the other such logic styles and is therefore of great significance. In this
work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher-level modules are designed by instantiating the WDDL gates to
form the entire module, resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This design approach
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. According to the specifications, the
desired functionality has been achieved. In the output during the Evaluation phase, there
is always the same number of transitions, resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic. This is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10 REFERENCES
10.1 Referred Technical Papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits",
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey",
IEEE Proceedings.
[4] Noohul Basheer Zain Ali and James M. Noras, "Optimal Data path Design for a Cryptographic
Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1,
June 2001, pp. 16-27.
[5] Dan Rinehimer and Derek Wilson, "Summary of B. Schneier's Description of New Variable
Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist
Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with
Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices",
Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
Figure 2.1 Secure Digital Design Flow
During the cell substitution step, cells designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987: standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple
assignments are allowed.
Inout: Bi-directional. Assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not
recommended).
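The four port directions listed above can be seen side by side in a small sketch. The entity `port_modes_demo` and its port names are hypothetical and serve only to show which operations each mode permits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity port_modes_demo is
  port (en  : in     std_logic;   -- in: may only be read
        d   : in     std_logic;
        q   : buffer std_logic;   -- buffer: driven here and read back below
        nq  : out    std_logic;   -- out: may only be assigned
        pad : inout  std_logic);  -- inout: may be driven and read
end entity port_modes_demo;

architecture rtl of port_modes_demo is
begin
  q   <= d and en;                  -- single assignment to the buffer port
  nq  <= not q;                     -- reading q is legal because it is a buffer port
  pad <= d when en = '1' else 'Z';  -- drive the pad, or release it to high impedance
end architecture rtl;
```

Declaring `q` as a plain `out` port instead would make the read in `nq <= not q` illegal under the rules stated above. The usual alternative to a buffer port is an internal signal that drives the output.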
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
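As a sketch of how a configuration selects between architectures, the Comparator entity from the earlier example is repeated below with two architectures. The second architecture `Comparator2` and the configuration name `Comparator_cfg` are hypothetical additions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow architecture, as in the example given earlier.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Behavioral architecture of the same entity (hypothetical).
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- Configuration declaration: binds the entity to one chosen architecture.
configuration Comparator_cfg of Comparator is
  for Comparator1
  end for;
end configuration Comparator_cfg;
```

Changing `for Comparator1` to `for Comparator2` and re-analyzing only the configuration switches the simulated implementation without touching the entity or architecture sources.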
Packages
Packages contain information common to many design units
1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. They consist of a declaration part and an optional body part.
A package declaration can contain:
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
A package body consists of:
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
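A minimal package with a declaration part and a body might look as follows. The package name `demo_types`, the constant, the subtype and the function are all hypothetical and chosen only to show where each kind of item goes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the items visible to every design unit that uses it.
package demo_types is
  constant WORD_WIDTH : positive := 32;
  subtype word_t is std_logic_vector(WORD_WIDTH - 1 downto 0);
  function parity (w : word_t) return std_logic;  -- body is given below
end package demo_types;

-- Package body: needed here only because the function requires a body.
package body demo_types is
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in w'range loop
      p := p xor w(i);   -- XOR of all bits of the word
    end loop;
    return p;
  end function parity;
end package body demo_types;
```

A design unit compiled into the same library gains access to these items with `use work.demo_types.all;`.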
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware
It is possible to describe a circuit in a number of ways, listed here in order of
increasing level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (eg transistor-level description) to a high level
description (eg block diagram)
For large circuits low-level descriptions quickly become impractical
Dataflow VHDL Description
The circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level, while the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers, you must write the behavioral
description of the registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
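Since the language has no built-in register, a behavioral description such as the following is typically written. This is a minimal sketch of a rising-edge-triggered register; the entity name reg8 and the 8-bit width are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end reg8;

architecture behavioral of reg8 is
begin
  -- The process wakes up on every clk event and stores d on the rising edge.
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end behavioral;
```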
Behavioral VHDL Description
The circuit is described in terms of its operation over time
The representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of
the synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
The basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
The process keyword in VHDL
A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals that wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
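The two alternatives in the general format can be compared side by side. The sketch below, with the hypothetical entity and2_demo, describes the same AND function twice: once with a sensitivity list and once with an explicit wait statement at the end of the process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2_demo is
  port (
    a, b   : in  std_logic;
    y1, y2 : out std_logic
  );
end and2_demo;

architecture rtl of and2_demo is
begin
  -- Form 1: sensitivity list; the process re-runs whenever a or b changes.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; the process suspends at the wait statement
  -- until a or b changes, then loops back to the first statement.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end rtl;
```

Both processes are concurrent with respect to each other, while the statements inside each one execute sequentially.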
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output
wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption fairly
easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can measure
the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as
known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then
seeing the results of the encryption (known as chosen-plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A further category, semi-invasive attacks, has also been distinguished. These attacks
require de-packaging of the chip to get access to the chip surface, but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
In contrast, passive attacks simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal: an invasive attack may completely avoid
disturbing the card's behavior, and a passive attack may require a preliminary de-packaging
for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is
completely undetectable: there is, for example, no way for a smart card to figure out that its
running time is currently being measured. Other countermeasures are therefore necessary.
From an economic point of view, invasive attacks are usually more expensive to deploy on a
large scale, since they require individual processing of each attacked device. In this sense,
non-invasive attacks constitute a bigger menace for the smart card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information through:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital complementary
metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is also provided by the terminal, and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
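The measurement just described can be written as a short worked equation. Here R is the series resistance and V_R(t) the sampled voltage across it; the symbol V_DD for the supply voltage and the 5 mV sample value are illustrative assumptions, not measurements from this project.

```latex
I(t) = \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t)

% Example with R = 50 ohm and an assumed sample V_R = 5 mV:
I = \frac{5\,\mathrm{mV}}{50\,\Omega} = 0.1\,\mathrm{mA}
```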
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the
power consumption of a unit while an encryption operation is being performed. It is a
technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
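The exponentiator case can be illustrated with a simulation-only VHDL sketch of left-to-right square-and-multiply on small naturals. The entity name sqmul_demo, the function mod_exp and the operand values are hypothetical; real implementations operate on much wider operands. The multiplication runs only for exponent bits equal to '1', which is the data-dependent operation sequence an SPA trace can reveal.

```vhdl
entity sqmul_demo is
end sqmul_demo;

architecture sim of sqmul_demo is
  -- Left-to-right square-and-multiply (illustrative operand sizes only).
  function mod_exp (base     : natural;
                    exponent : bit_vector;
                    modulus  : positive) return natural is
    variable result : natural := 1;
  begin
    -- A bit_vector literal is indexed left to right, so the most
    -- significant exponent bit is processed first.
    for i in exponent'range loop
      result := (result * result) mod modulus;   -- squaring in every iteration
      if exponent(i) = '1' then
        result := (result * base) mod modulus;   -- extra multiply only for '1' bits
      end if;
    end loop;
    return result;
  end function mod_exp;
begin
  check : process
  begin
    -- 3^11 mod 17 = 7, with exponent 11 written as "1011".
    assert mod_exp(3, "1011", 17) = 7
      report "unexpected mod_exp result" severity failure;
    wait;
  end process check;
end sim;
```

For the exponent "1011" the operation sequence is square, multiply, square, square, multiply, square, multiply, so an observer who can tell the two operations apart reads the exponent bits directly.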
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and
are sometimes overshadowed by measurement errors and other noise. In such cases it is still
often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces T1..m[1..k] containing k samples each. In addition, the attacker records the
ciphertexts C1..m. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess Ks is correct. The attacker
computes a k-sample differential trace ΔD[1..k] by finding the difference between the
average of the traces for which a certain intermediate value V, given by the selection function
D(Ci, b, Ks), is one and the average of the traces for which V is zero. Thus ΔD[j] is the average
over C1..m of the effect due to the value represented by the selection function D on the power
consumption at point j.
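Written out, the difference of averages described above takes the following form. This is a reconstruction from the verbal definition, with T_i[j] denoting sample j of trace i and D(C_i, b, K_s) taking the value 0 or 1; the subscripted notation is assumed here for readability.

```latex
\Delta_D[j] =
  \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}
       {\sum_{i=1}^{m} D(C_i, b, K_s)}
  \;-\;
  \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}
       {\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
```

The first term is the average of the traces for which the selection function is one, and the second is the average of the traces for which it is zero.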
If Ks is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts Ci. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus, trace components uncorrelated to D diminish with 1/√m, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with Ks
incorrect may have a weak correlation to D with the correct Ks). If Ks is correct, however, the
computed value for D(Ci, b, Ks) will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of ΔD will be flat with spikes in regions
where D is correlated to the values being processed. The correct value of Ks can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight Ks yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic
depends on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current flows from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
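The asymmetry can be made concrete with the usual first-order model of a CMOS gate driving a load capacitance. The symbols C_L (load capacitance) and V_DD (supply voltage) are introduced here for illustration, and short-circuit and leakage currents are neglected.

```latex
E_{0 \to 1} = C_L V_{DD}^{2}, \qquad
E_{1 \to 0} \approx E_{0 \to 0} \approx E_{1 \to 1} \approx 0
```

Only the 0 to 1 transition draws significant energy from the supply, so the supply current reveals which transition the output made.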
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state, and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
In the figure above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the
evaluation phase. The discharged output node is charged to 1 in the following pre-charge phase,
so that both outputs are again pre-charged to 1. In other words, every signal transition, including
the events in which the input signals remain constant, is represented by an actual switching
event in which the logic gate charges a capacitance. Logic families that have been introduced to
thwart differential power analysis (DPA) using dynamic differential logic include the
following:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does require, however, full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal, and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. the inputs of the
encryption module and the outputs of the registers. They produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
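The gate structure and pre-charge operator described above can be sketched in VHDL as follows. This is a minimal illustration, not the project's source: the entity name wddl_and, the signal names, and the choice of single-rail inputs with the pre-charge operator placed inside the gate are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wddl_and is
  port (
    clk      : in  std_logic;  -- high = pre-charge phase, low = evaluation phase
    a, b     : in  std_logic;  -- single-rail inputs
    y_t, y_f : out std_logic   -- differential output: true rail and false rail
  );
end wddl_and;

architecture rtl of wddl_and is
  signal a_t, a_f, b_t, b_f : std_logic;
begin
  -- Pre-charge operator: both rails of every input are forced to '0' while
  -- clk is high; during evaluation exactly one rail of each pair is '1'.
  a_t <= a and not clk;
  a_f <= (not a) and not clk;
  b_t <= b and not clk;
  b_f <= (not b) and not clk;

  -- Positive gate on the true rails, its dual on the false rails.
  y_t <= a_t and b_t;   -- true output:  A and B
  y_f <= a_f or b_f;    -- false output: not (A and B) = A' or B'
end rtl;
```

With clk high all four rails are '0', so both outputs are '0'; with clk low exactly one of y_t and y_f rises, giving one switching event per cycle regardless of the input values. A WDDL OR gate follows by exchanging the two output expressions (OR on the true rails, AND on the false rails).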
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented, along with a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting
the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter approach is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
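A possible VHDL sketch of the first method is shown below: the true output uses AB' + A'B directly on the differential rails, and the false output uses the dual network. The entity name wddl_xor and the signal names are hypothetical, and the internal pre-charge operator is an assumption for illustration rather than the project's actual code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wddl_xor is
  port (
    clk      : in  std_logic;  -- high = pre-charge phase, low = evaluation phase
    a, b     : in  std_logic;  -- single-rail inputs
    y_t, y_f : out std_logic   -- differential output: true rail and false rail
  );
end wddl_xor;

architecture rtl of wddl_xor is
  signal a_t, a_f, b_t, b_f : std_logic;
begin
  -- Pre-charge operator: all rails are '0' while clk is high.
  a_t <= a and not clk;
  a_f <= (not a) and not clk;
  b_t <= b and not clk;
  b_f <= (not b) and not clk;

  -- True output: A xor B = A.B' + A'.B, using the false rails as complements,
  -- so only positive (AND/OR) gates are needed.
  y_t <= (a_t and b_f) or (a_f and b_t);

  -- False output: the dual network, i.e. XNOR = (A + B').(A' + B).
  y_f <= (a_t or b_f) and (a_f or b_t);
end rtl;
```

Because no inverter appears after the pre-charge operator, an all-zero input still produces an all-zero output, so the 0-wave propagates through the gate.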
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to provide security to ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
gives constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the alternatives and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. In this implementation, a bottom-up approach is used. The low-level entities
are designed and later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are supplied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase
and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. Functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. In the output during the evaluation phase, the
number of transitions is always the same, thus resulting in constant power dissipation. During
synthesis it was observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated
into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the DPA scheme to find the
secret key that is being used in the crypto-processor. Thus security against DPA is
incorporated into the IC at hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V) HDL
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
Brief History
VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993
Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE in response
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
library ieee;
use ieee.std_logic_1164.all;
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name direction and a type
An entity may have NO port declaration
Port directions
In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed
Out: Assignments can be made to the port but data from the port cannot be read. Multiple
assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (buffer ports
are not recommended)
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate level architecture and a behavioral architecture
Are always optional
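As an illustration of how a configuration declaration binds an architecture to an entity, the sketch below selects the Comparator1 architecture of the Comparator entity used in the earlier examples. The testbench entity Comparator_tb, its architecture test, the instance label DUT, the stimulus values and the configuration name Comparator_tb_cfg are hypothetical names introduced only for this example; VHDL-93 syntax is assumed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: an entity with no port declaration
entity Comparator_tb is
end Comparator_tb;

architecture test of Comparator_tb is
  component Comparator
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
  end component;
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  DUT : Comparator port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1' report "equal inputs should give EQ = 1" severity error;
    B <= x"3D";
    wait for 10 ns;
    assert EQ = '0' report "different inputs should give EQ = 0" severity error;
    wait;  -- suspend the process forever
  end process stimulus;
end test;

-- Configuration declaration: bind instance DUT to architecture Comparator1
configuration Comparator_tb_cfg of Comparator_tb is
  for test
    for DUT : Comparator
      use entity work.Comparator(Comparator1);
    end for;
  end for;
end Comparator_tb_cfg;
```

If a second architecture of Comparator were added (for example a gate level one), only the "use entity" line of the configuration would need to change to simulate it; the testbench source would stay untouched.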
Packages
Packages contain information common to many design units
1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2. Package body
– Is not always needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. A package consists of a declaration part and an optional body part
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
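A minimal sketch of a package declaration with its body may make the split clearer. The package name wddl_types, the constant DATA_WIDTH, the subtype word_t and the function parity are hypothetical and chosen only for illustration; they are not taken from the project sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the public part shared by other design units
package wddl_types is
  constant DATA_WIDTH : natural := 32;                          -- constant declaration
  subtype word_t is std_logic_vector(DATA_WIDTH - 1 downto 0);  -- subtype declaration
  function parity (w : word_t) return std_logic;                -- function declaration
end wddl_types;

-- Package body: required here because the declaration contains a function
package body wddl_types is
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    -- XOR all bits of the word together
    for i in w'range loop
      p := p xor w(i);
    end loop;
    return p;
  end parity;
end wddl_types;
```

A design unit compiled into the same library would make these names visible with "use work.wddl_types.all;" placed before its entity declaration.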
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually directory in UNIX file system
Can be also any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural -> Dataflow -> Behavioral (towards a higher level of abstraction)
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower level description
– or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
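Since the language has no built-in register, a register is usually written as a small clocked process. The following is a minimal sketch of such a behavioral description for a single-bit register; the entity name dff, the port names and the asynchronous active-high reset are assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (clk, rst, d : in  std_logic;
        q           : out std_logic);
end dff;

architecture behavioral of dff is
begin
  process (clk, rst)
  begin
    if rst = '1' then              -- asynchronous reset (an assumption)
      q <= '0';
    elsif rising_edge(clk) then    -- the input is stored on the rising clock edge
      q <= d;
    end if;
  end process;
end behavioral;
```

Synthesis tools generally recognize this template and map it to a flip-flop of the target technology.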
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on implementation technology and efficiency of
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
Process statement is a concurrent statement
Statements inside process statements are sequential statements
Process must contain either sensitivity list or wait statement(s) but NOT both
Sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
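The two permitted forms of a process can be compared side by side. The sketch below describes the same two-input AND function twice, once with a sensitivity list and once with an explicit wait statement; the entity name and2_proc and the port and label names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2_proc is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end and2_proc;

architecture behavioral of and2_proc is
begin
  -- Form 1: sensitivity list; the process wakes up on any event on a or b
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; an explicit wait statement as the last statement
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end behavioral;
```

Both processes run concurrently with each other, while the statements inside each one execute sequentially; y1 and y2 always carry the same value.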
CHAPTER 4 SMART CARD OVERVIEW
This section briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires whose role, use and position are standardized. In addition to the input/output
wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A more recent distinction is that of semi-invasive attacks. These attacks
require de-packaging of the chip to get access to the chip surface, but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of
defeating these mechanisms). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis
attack is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
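The measurement described above is Ohm's law applied to the series resistor. As a worked sketch, the symbols below are assumptions introduced here: R is the series resistance, V_R(t) the sampled voltage across it, V_DD the supply voltage, and the 5 mV reading is a purely hypothetical value chosen to show the arithmetic.

```latex
\begin{align*}
I(t) &= \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t) \\
% hypothetical reading, for illustration only
I &= \frac{5\ \text{mV}}{50\ \Omega} = 0.1\ \text{mA}
\end{align*}
```

The approximation for P(t) holds as long as the voltage drop across the resistor is small compared with the supply voltage.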
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. From a general, somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly non-random.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the
power consumption of a unit while an encryption operation is being performed. It
is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and
are sometimes overshadowed by measurement errors and other noise. In such cases it is still
often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which a certain intermediate value $V$ is one and the average of the
traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the
value represented by the selection function $D$ on the power consumption at point $j$. In particular,
if $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
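The differential trace defined in words above can be written as a difference of two means. The formula below is a sketch that follows that verbal definition; it assumes the selection function is written as D(C_i, b, K_s) and takes the value 0 or 1 (the predicted value of the intermediate bit for ciphertext C_i, target bit b and key block guess K_s), and that T_i[j] denotes sample j of trace i.

```latex
\[
\Delta_D[j] \;=\;
\frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}
     {\sum_{i=1}^{m} D(C_i, b, K_s)}
\;-\;
\frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}
     {\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)},
\qquad j = 1, \dots, k
\]
```

The first fraction is the average of the traces for which the predicted bit is one, the second the average of those for which it is zero, so an uninformative selection function drives both averages towards the same value.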
Thus trace components uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$
incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to $D$ approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions
where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
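The asymmetry between the transitions can be made explicit with the standard first-order model of a CMOS gate. The symbols are assumptions introduced here: C_L is the output load capacitance, V_DD the supply voltage and i(t) the supply current during the transition; short-circuit and leakage currents are neglected.

```latex
\begin{align*}
E_{0 \to 1} &= \int V_{DD}\, i(t)\, dt = C_L V_{DD}^{2} \\
E_{1 \to 0},\; E_{0 \to 0},\; E_{1 \to 1} &\approx 0
\end{align*}
```

Of the energy drawn during a 0 to 1 transition, half ends up stored on the capacitance and half is dissipated in the pull-up network; the stored half is dissipated during the later 1 to 0 transition without drawing further energy from the supply.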
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
In the figure above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the
evaluation phase. The discharged output node is charged to 1 in the following pre-charge phase
to pre-charge both outputs to 1. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) by using dynamic differential logic include the
following:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal, and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow
No special design rules are involved in the interconnection of WDDL gates
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
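The gate structure and the pre-charge operator described above can be sketched in VHDL as follows. This is an illustrative model under stated assumptions, not the project's own source: the entity names wddl_precharge, wddl_and2 and wddl_or2, the rail suffixes _t and _f, and the choice of a pre-charge operator that takes a single-ended input are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Pre-charge operator: turns a single-ended input into a differential pair
-- and forces both rails to '0' while clk = '1' (pre-charge phase).
entity wddl_precharge is
  port (clk : in  std_logic;
        d   : in  std_logic;
        d_t : out std_logic;    -- true rail
        d_f : out std_logic);   -- false rail
end wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  d_t <= d and (not clk);        -- '0' in pre-charge, d in evaluation
  d_f <= (not d) and (not clk);  -- '0' in pre-charge, not d in evaluation
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND gate: AND on the true inputs, its dual (OR) on the false inputs.
entity wddl_and2 is
  port (a_t, a_f, b_t, b_f : in  std_logic;
        y_t, y_f           : out std_logic);
end wddl_and2;

architecture dataflow of wddl_and2 is
begin
  y_t <= a_t and b_t;
  y_f <= a_f or b_f;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL OR gate: OR on the true inputs, its dual (AND) on the false inputs.
entity wddl_or2 is
  port (a_t, a_f, b_t, b_f : in  std_logic;
        y_t, y_f           : out std_logic);
end wddl_or2;

architecture dataflow of wddl_or2 is
begin
  y_t <= a_t or b_t;
  y_f <= a_f and b_f;
end dataflow;
```

With all inputs at '0' both outputs of each gate are '0', so the 0-wave propagates without a dedicated pre-charge signal at the gate. During evaluation exactly one of y_t and y_f rises to '1', which gives one switching event per clock cycle regardless of the data.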
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation $A \oplus B = AB' + A'B$. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
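One possible way to express the XOR equation above in the dual-rail style is sketched below. It is an assumption-laden illustration rather than the netlist used in the project: the entity name wddl_xor2 and the rail suffixes _t and _f are hypothetical, the inputs are assumed to be already differential and pre-charged, and the complemented literals A' and B' are taken from the false rails instead of inverters.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wddl_xor2 is
  port (a_t, a_f, b_t, b_f : in  std_logic;
        y_t, y_f           : out std_logic);
end wddl_xor2;

architecture dataflow of wddl_xor2 is
begin
  -- True output: A.B' + A'.B, with B' and A' read from the false rails
  y_t <= (a_t and b_f) or (a_f and b_t);
  -- False output: the dual expression (AND and OR swapped) on the opposite rails
  y_f <= (a_f or b_t) and (a_t or b_f);
end dataflow;
```

Both expressions use only AND and OR gates, so an all-zero input still produces an all-zero output and the pre-charge wave passes through the gate unchanged.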
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 108    4%
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices       : 1 out of 4656   0%
Number of 4 input LUTs : 2 out of 9312   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 232    2%
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE

Figure: Simulation Result
Figure: Synthesis Result

Synthesis Report
===========================================================
                       Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs        : 5

Cell Usage
BELS       : 2
  LUT3     : 2
IO Buffers : 5
  IBUF     : 3
  OBUF     : 2
===========================================================
Device utilization summary
---------------------------
Selected Device        : 3s250eft256-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over the other such logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL; higher modules are then designed by instantiating the WDDL gates to form the entire module, which results in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

The desired functionality has been achieved according to the specifications. At the output, the same number of transitions occurs in every Evaluation phase, which results in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the same design implemented in conventional CMOS logic; this is the price paid for the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply the DPA scheme to find the secret key used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10
REFERENCES

10.1 Referred Technical Papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference Books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE Standard 1076-1993.

Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was then developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
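The relationship between std_logic (IEEE 1164) and the numeric types (IEEE 1076.3) can be illustrated with a small adder. This is only a sketch: the entity name add8 and its ports are hypothetical and are not part of the project design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- std_logic, std_logic_vector (IEEE 1164)
use ieee.numeric_std.all;     -- signed, unsigned (IEEE 1076.3)

-- Hypothetical 8-bit adder used only to show the type conversions.
entity add8 is
  port (a, b : in  std_logic_vector(7 downto 0);
        sum  : out std_logic_vector(7 downto 0));
end entity add8;

architecture rtl of add8 is
begin
  -- std_logic_vector carries no numeric meaning, so the operands are
  -- interpreted as unsigned before adding and converted back afterwards.
  sum <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture rtl;
```

The carry out of bit 7 is discarded here, because the result of adding two 8-bit unsigned values in numeric_std is again 8 bits wide.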
VHDL Environment
Design Units
Segments of VHDL code can be compiled separately and stored in a library.

Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines the pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
    library ieee;
    use ieee.std_logic_1164.all;

    entity Comparator is
      port (A, B : in  std_logic_vector(7 downto 0);
            EQ   : out std_logic);
    end Comparator;
Ports
Provide channels of communication between the component and its environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions:
In: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
    architecture Comparator1 of Comparator is
    begin
      EQ <= '1' when (A = B) else '0';
    end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral architecture.
Are always optional.
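A minimal sketch of an explicit configuration declaration is given here, reusing the Comparator entity and Comparator1 architecture from the examples in this chapter. The configuration name Comparator_cfg is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Binds architecture Comparator1 to entity Comparator explicitly,
-- instead of relying on the default (last analyzed) architecture.
configuration Comparator_cfg of Comparator is
  for Comparator1
  end for;
end Comparator_cfg;
```

If a second architecture were written for the same entity, only the name in the for clause would have to change to select it.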
Packages
Packages contain information common to many design units.
1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.
A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
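The split between package declaration and package body can be sketched as follows. The package name wddl_pkg, the constant WORD_WIDTH, the subtype word_t and the function parity are all hypothetical names chosen for illustration; they do not come from the project sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the publicly visible part.
package wddl_pkg is
  constant WORD_WIDTH : natural := 32;
  subtype word_t is std_logic_vector(WORD_WIDTH - 1 downto 0);
  function parity (w : word_t) return std_logic;
end package wddl_pkg;

-- Package body: needed here only because a function was declared.
package body wddl_pkg is
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    -- XOR all bits of the word together.
    for i in w'range loop
      p := p xor w(i);
    end loop;
    return p;
  end function parity;
end package body wddl_pkg;
```

A design unit would make these declarations visible with "use work.wddl_pkg.all;", assuming the package has been compiled into the working library.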
Libraries
A library is a collection of VHDL design units (a database):
1. Packages
   - package declaration
   - package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
A library is usually a directory in a UNIX file system, but it can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -> Dataflow -> Behavioral (increasing level of abstraction)

Structural VHDL Description
The circuit is described in terms of its components.
The description can range from low level (e.g. a transistor-level description) to high level (e.g. a block diagram).
For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in the system.
The combinational logic is described at a relatively high level, while the placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral description of the registers.
The behavioral description can be provided in the form of subprograms (functions or procedures).
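Since VHDL has no built-in register, a behavioral description such as the following has to be written or obtained. This is a generic sketch of a D flip-flop with asynchronous reset; the entity name dff and its port names are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-bit register (D flip-flop).
entity dff is
  port (clk : in  std_logic;
        rst : in  std_logic;   -- asynchronous reset, active high
        d   : in  std_logic;
        q   : out std_logic);
end entity dff;

architecture behavioral of dff is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= '0';                 -- reset overrides the clock
    elsif rising_edge(clk) then
      q <= d;                   -- capture d on the rising clock edge
    end if;
  end process;
end architecture behavioral;
```

Synthesis tools generally recognize this pattern and map it to a flip-flop of the target technology.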
Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).
If no actual delays are used, only the order of the sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.
The actual timing results depend on the implementation technology and the efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential

Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a UNIX process.

Process keyword in VHDL
The process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals that wake the process up.
General Format
    process [(sensitivity_list)]
        process_declarative_part
    begin
        process_statements
        [wait_statement]
    end process;
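The two permitted forms of a process, with a sensitivity list or with a wait statement, can be compared side by side. The sketch below uses a hypothetical entity and2_demo; both processes describe the same AND function and behave identically in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2_demo is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity and2_demo;

architecture rtl of and2_demo is
begin
  -- Form 1: sensitivity list. The process wakes on any event on a or b.
  p_sens : process (a, b)
  begin
    y1 <= a and b;
  end process p_sens;

  -- Form 2: no sensitivity list. An explicit wait at the end of the
  -- process suspends it until an event occurs on a or b.
  p_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process p_wait;
end architecture rtl;
```

The sensitivity-list form is the one most commonly accepted by synthesis tools; support for wait statements in synthesis varies from tool to tool.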
CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) without revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts of most interest here are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces cipher text output, and vice versa. Attacks were therefore based on knowing the cipher text (cipher text-only attacks), on knowing both (known plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the cipher text.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as the supply voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time and power consumption.
A more recent distinction is the semi-invasive attack. These attacks require de-packaging the chip to get access to its surface but do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal. An invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.
5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
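The measurement principle described above is simply Ohm's law applied to the series resistor. As a worked illustration, the sampled voltage of 10 mV and the supply voltage of 5 V below are assumed example values, not measurements from this project; only the 50 ohm resistor comes from the text.

```latex
\[
  I(t) = \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t)
\]
\[
  R = 50\ \Omega,\quad V_R = 10\ \text{mV}
  \;\Rightarrow\; I = \frac{0.010\ \text{V}}{50\ \Omega} = 0.2\ \text{mA},
  \qquad P \approx 5\ \text{V} \times 0.2\ \text{mA} = 1\ \text{mW}
\]
```

Each sample of the voltage across the resistor therefore gives one point of the power trace used in the attacks described next.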
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states or about the operations used in the transition of the internal state from one round to another. The type of side channel will of course determine what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally through some statistical property of the cipher that makes that checkable value slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring of a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the cipher texts $C_{1..m}$. No knowledge of the plain text is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero. The value of $V$ for trace $i$ is predicted by the selection function $D(C_i, b, K_s)$, where $b$ is the target bit. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the cipher texts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Contributions from other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round sub-key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the cipher text, and attacking the next DES key. DPA can use known plaintext or known cipher text and can find encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
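The asymmetry between the transitions can be stated with the standard first-order model of a static CMOS gate driving a load capacitance. The symbols $C_L$ (load capacitance) and $V_{DD}$ (supply voltage) are introduced here for illustration; short-circuit and leakage currents are neglected.

```latex
\[
  E_{\text{supply}}(0 \rightarrow 1) = C_L V_{DD}^{2}, \qquad
  E_{\text{supply}}(1 \rightarrow 0) = E_{\text{supply}}(0 \rightarrow 0)
  = E_{\text{supply}}(1 \rightarrow 1) \approx 0
\]
```

Of the energy drawn on a 0 to 1 transition, half ($\tfrac{1}{2} C_L V_{DD}^{2}$) is stored on the capacitance and half is dissipated in the pull-up network; the stored half is dissipated on the following 1 to 0 transition without drawing anything from the supply. An observer of the supply current can therefore tell a rising output from the other three cases.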
6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

As shown above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase in order to produce a differential output signal. The discharged output node is charged to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic are the following:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
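The behavior described above can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions, not the VHDL used in the project: each signal is represented as a (true rail, false rail) pair, and the function names encode, precharge_operator, wddl_and, wddl_or and run_cycle are illustrative choices that do not appear in the report.

```python
from itertools import product

def encode(bit):
    """Differential encoding of one bit as (true rail, false rail)."""
    return (bit, 1 - bit)

def precharge_operator(signal, clk):
    """All-zero output while clk is high, differential pass-through while clk is low."""
    return (0, 0) if clk else signal

def wddl_and(a, b):
    """True rail: AND of the true inputs. False rail: OR of the false inputs (dual gate)."""
    return (a[0] & b[0], a[1] | b[1])

def wddl_or(a, b):
    """True rail: OR of the true inputs. False rail: AND of the false inputs (dual gate)."""
    return (a[0] | b[0], a[1] & b[1])

def run_cycle(gate, a_bit, b_bit):
    """One clock cycle: pre-charge phase (clk = 1) followed by evaluation phase (clk = 0)."""
    outputs = []
    for clk in (1, 0):
        a = precharge_operator(encode(a_bit), clk)
        b = precharge_operator(encode(b_bit), clk)
        outputs.append(gate(a, b))
    return outputs  # [pre-charge output, evaluation output]

for a_bit, b_bit in product((0, 1), repeat=2):
    pre, ev = run_cycle(wddl_and, a_bit, b_bit)
    print(a_bit, b_bit, "pre-charge:", pre, "evaluation:", ev)
```

In this model the gate output is (0, 0) in every pre-charge phase and exactly one rail rises in every evaluation phase, whatever the input data, which is the property the 100% switching factor refers to.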
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is the bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. The OR, AND and XOR logic gates have been
implemented, as well as larger modules such as a Full Adder and a 32-bit XOR.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 41 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting
the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 42 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 43 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 44 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 45. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
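The first method can be sketched as a small Python model. It is an illustration only, not the project's VHDL: signals are assumed to be (true rail, false rail) pairs, the complemented literals A' and B' are assumed to be taken from the false rails, and the names wddl_xor and encode are hypothetical.

```python
def encode(bit):
    """Differential encoding of one bit as (true rail, false rail)."""
    return (bit, 1 - bit)

def wddl_xor(a, b):
    """Dual-rail XOR built from AND and OR operations only."""
    a_t, a_f = a
    b_t, b_f = b
    true_rail = (a_t & b_f) | (a_f & b_t)    # A.B' + A'.B
    false_rail = (a_t & b_t) | (a_f & b_f)   # A.B + A'.B', the complement of XOR
    return (true_rail, false_rail)

for a_bit in (0, 1):
    for b_bit in (0, 1):
        print(a_bit, b_bit, wddl_xor(encode(a_bit), encode(b_bit)))

# With all rails at 0 (pre-charge phase) both output rails are 0 as well.
print("pre-charge:", wddl_xor((0, 0), (0, 0)))
```

Because only AND and OR operations are used on the rails, an all-zero input still produces an all-zero output, so the pre-charge wave passes through this gate like through the basic gates.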
Figure 45 WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can
be easily implemented by instantiating the corresponding lower-level module 32
times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name  : wddlor.ngr
Top Level Output File Name      : wddlor
Output Format                   : NGC
Optimization Goal               : Speed
Keep Hierarchy                  : NO
Design Statistics
IOs                             : 5
Cell Usage
BELS                            : 2
  LUT3                          : 2
IO Buffers                      : 5
  IBUF                          : 3
  OBUF                          : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                 : 3s250eft256-4
Number of Slices                : 1 out of 2448   0%
Number of 4 input LUTs          : 2 out of 4896   0%
Number of IOs                   : 5
Number of bonded IOBs           : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                     : -4
Maximum combinational path delay: 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name  : wddlgates.ngr
Top Level Output File Name      : wddlgates
Output Format                   : NGC
Optimization Goal               : Speed
Keep Hierarchy                  : NO
Design Statistics
IOs                             : 5
Cell Usage
BELS                            : 2
  LUT3                          : 2
IO Buffers                      : 5
  IBUF                          : 3
  OBUF                          : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                 : 3s250etq144-4
Number of Slices                : 1 out of 2448   0%
Number of 4 input LUTs          : 2 out of 4896   0%
Number of IOs                   : 5
Number of bonded IOBs           : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                     : -4
Maximum combinational path delay: 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name  : wddlnand1.ngr
Top Level Output File Name      : wddlnand1
Output Format                   : NGC
Optimization Goal               : Speed
Keep Hierarchy                  : NO
Design Statistics
IOs                             : 5
Cell Usage
BELS                            : 2
  LUT3                          : 2
IO Buffers                      : 5
  IBUF                          : 3
  OBUF                          : 2
==========================================================
Device utilization summary
Selected Device                 : 3s500efg320-4
Number of Slices                : 1 out of 4656   0%
Number of 4 input LUTs          : 2 out of 9312   0%
Number of IOs                   : 5
Number of bonded IOBs           : 5 out of 232    2%
Timing Summary
Speed Grade                     : -4
Maximum combinational path delay: 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name  : wddlxorgate.ngr
Top Level Output File Name      : wddlxorgate
Output Format                   : NGC
Optimization Goal               : Speed
Keep Hierarchy                  : NO
Design Statistics
IOs                             : 5
Cell Usage
BELS                            : 2
  LUT3                          : 2
IO Buffers                      : 5
  IBUF                          : 3
  OBUF                          : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                 : 3s250eft256-4
Number of Slices                : 1 out of 2448   0%
Number of 4 input LUTs          : 2 out of 4896   0%
Number of IOs                   : 5
Number of bonded IOBs           : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                     : -4
Maximum combinational path delay: 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
renders constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over other such logic styles and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are supplied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher-level modules are designed by instantiating the WDDL gates to
form the entire module, resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plain text of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. During the Evaluation phase, the output shows
the same number of transitions for every input, resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic. This is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at hardware level by implementing the design in WDDL style,
which is quite simple and effective.
Design Units
Segments of VHDL code can be compiled separately and stored in a library.
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines the pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes the entity
E.g.
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple
assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not
recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together
The concept of default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations
2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. They consist of a declaration part and an optional body part.
A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations
A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
Libraries
Collection of VHDL design units (database)
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here from the lowest to the
highest level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
Ranges from a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers, you must write the behavioral
description of the registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. The goal of the attacker, conversely, is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output wires,
the parts we are most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption fairly
easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces cipher text output, and vice versa. Attacks were therefore based on either
knowing the cipher text (such as cipher-text-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the cipher text.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A more recent distinction is that of semi-invasive attacks. These attacks
require de-packaging of the chip to get access to the chip surface, but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are, of course, not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks are performed on the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
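The measurement arithmetic can be sketched as follows. This is a minimal illustration, not a description of any particular measurement setup: the 50 ohm value comes from the text, while the 5 V supply voltage, the sample values and the function names shunt_current and power_trace are assumptions made for the example.

```python
def shunt_current(v_drop, r_shunt=50.0):
    """Ohm's law: current (A) through the series resistor from the voltage across it."""
    return v_drop / r_shunt

def power_trace(v_drops, v_supply=5.0, r_shunt=50.0):
    """Instantaneous power (W) drawn by the card for each sampled voltage drop.

    The card sees the supply voltage minus the drop across the resistor
    (v_supply = 5.0 V is an assumed value).
    """
    trace = []
    for v in v_drops:
        current = shunt_current(v, r_shunt)
        trace.append((v_supply - v) * current)
    return trace

samples = [0.10, 0.25, 0.05]   # hypothetical sampled voltage drops in volts
print([shunt_current(v) for v in samples])
print(power_trace(samples))
```

Each sampled voltage difference therefore maps directly to one point of the power trace that the attacks below work on.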
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack
SPA attacks on smartcards typically take a few seconds per card, while DPA attacks
can take several hours. From a general, somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the
power consumption of a unit while an encryption operation is being performed. It
is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
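The exponentiator case can be made concrete with a short Python sketch. It assumes, as an idealization, that an SPA trace lets the attacker tell squarings ("S") from multiplications ("M") perfectly; the function names and the example numbers are illustrative and do not come from the report.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs the operation sequence."""
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:            # most significant bit first
        result = (result * result) % modulus
        operations.append("S")               # squaring happens in every iteration
        if bit == "1":
            result = (result * base) % modulus
            operations.append("M")           # extra multiplication only for "1" bits
    return result, "".join(operations)

def recover_exponent(operations):
    """Read the exponent back from the S/M sequence an SPA trace would reveal."""
    bits = operations.replace("SM", "1").replace("S", "0")
    return int(bits, 2)

value, ops = square_and_multiply(7, 0b101101, 1009)
print(ops)
print(recover_exponent(ops) == 0b101101)
```

Every "SM" pair in the sequence corresponds to a "1" bit and every lone "S" to a "0" bit, so the secret exponent can be read off a single trace.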
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
cipher texts $C_{1..m}$. No knowledge of the plain text is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which the selection function $D(C_i, b, K_s)$, which computes a certain
intermediate bit value V, is one and the average of the traces for which it is zero. Thus
$\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value
represented by the selection function $D$ on the power consumption at point $j$. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half
of the cipher texts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus trace components that are uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$
incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of the target bit $b$ with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to $D$ approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions
where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of $b$ correspond to each S box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
sub key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the cipher text, and attacking the next DES key. DPA can use
known plaintext or known cipher text and can find encryption or decryption keys.
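The difference-of-means procedure described above can be illustrated with a small simulation. It is only a sketch under loud assumptions and not an attack on DES: the traces are synthetic (Gaussian noise plus a one-unit bump when the target bit is 1), the key block is 4 bits wide, the 4-bit S-box of the PRESENT cipher is used merely as a convenient nonlinear table, the attacker is assumed to know the input nibble of each operation, and all names (selection, capture_trace, differential_trace, recover_key, LEAK_INDEX) are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a stand-in nonlinear table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

K_SAMPLES = 8     # samples per trace (k)
LEAK_INDEX = 3    # assumed sample at which the target bit influences the power

def selection(known_input, guess):
    """Selection function D: predicted most significant bit of SBOX[input XOR guess]."""
    return (SBOX[known_input ^ guess] >> 3) & 1

def capture_trace(known_input, key, rng):
    """Simulated power trace: noise everywhere, plus a bump when the target bit is 1."""
    trace = [rng.gauss(0.0, 1.0) for _ in range(K_SAMPLES)]
    trace[LEAK_INDEX] += selection(known_input, key)
    return trace

def differential_trace(inputs, traces, guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0, per sample."""
    sums = [[0.0] * K_SAMPLES, [0.0] * K_SAMPLES]
    counts = [0, 0]
    for known_input, trace in zip(inputs, traces):
        d = selection(known_input, guess)
        counts[d] += 1
        for j in range(K_SAMPLES):
            sums[d][j] += trace[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(K_SAMPLES)]

def recover_key(inputs, traces):
    """Return the guess whose differential trace shows the largest spike."""
    peaks = {}
    for guess in range(16):
        delta = differential_trace(inputs, traces, guess)
        peaks[guess] = max(abs(v) for v in delta)
    return max(peaks, key=peaks.get), peaks

rng = random.Random(1)          # fixed seed so the run is repeatable
SECRET = 0xB                    # key block the simulated device uses
inputs = [rng.randrange(16) for _ in range(4000)]
traces = [capture_trace(p, SECRET, rng) for p in inputs]
best_guess, peaks = recover_key(inputs, traces)
print(hex(best_guess), round(peaks[best_guess], 2))
```

For the correct guess the differential trace shows a spike of about one unit at LEAK_INDEX, while wrong guesses give clearly smaller values, which is the behavior the text describes for a correct and an incorrect $K_s$.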
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
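A minimal Python sketch of this data dependence is given below. It assumes the usual first-order model in which each 0 to 1 output transition draws C * Vdd^2 from the supply and everything else is neglected; the load capacitance of 10 fF, the supply voltage of 1.8 V and the function name supply_energy are hypothetical values chosen for illustration.

```python
def supply_energy(outputs, c_load=10e-15, v_dd=1.8):
    """Energy (J) drawn from the supply for a sequence of gate output values.

    Only 0 -> 1 transitions charge the load; each draws C * Vdd^2 from the supply.
    Short-circuit and leakage currents are ignored in this model.
    """
    rising = sum(1 for prev, cur in zip(outputs, outputs[1:]) if prev == 0 and cur == 1)
    return rising * c_load * v_dd ** 2

quiet = [0, 0, 0, 0, 0, 0]   # output never changes
busy = [0, 1, 0, 1, 0, 1]    # output toggles every cycle
print(supply_energy(quiet), supply_energy(busy))
```

The two sequences draw different amounts of energy from the supply, so an observer of the supply current learns something about the processed data.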
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded,
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the figure above, all four transitions of the CMOS inverter can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by means of dynamic differential logic include the following:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
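The "exactly one switching event per cycle" property can be checked with a small behavioural model. The Python sketch below is an idealisation written for this explanation (the function names are hypothetical); it counts how many nodes must be charged per cycle for a dual-rail pre-charged output and, for comparison, for a single-rail static CMOS output.

```python
from typing import List, Tuple


def evaluate(value: int) -> Tuple[int, int]:
    """Return (true_rail, false_rail) after evaluation.

    Both rails start pre-charged to 1; exactly one of them discharges to 0.
    """
    return (1, 0) if value else (0, 1)


def dual_rail_charge_events(data: List[int]) -> List[int]:
    """Rails that must be recharged in the pre-charge phase after each value."""
    events = []
    for value in data:
        true_rail, false_rail = evaluate(value)
        events.append((1 - true_rail) + (1 - false_rail))
    return events


def static_cmos_charge_events(data: List[int], initial: int = 0) -> List[int]:
    """0 -> 1 output transitions of a single-rail static CMOS node."""
    events = []
    previous = initial
    for value in data:
        events.append(1 if (previous == 0 and value == 1) else 0)
        previous = value
    return events


if __name__ == "__main__":
    sequence = [0, 0, 1, 1, 0]
    print("dual rail  :", dual_rail_charge_events(sequence))
    print("static CMOS:", static_cmos_charge_events(sequence))
```

The dual-rail count is the same for every data value, whereas the static CMOS count follows the data.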
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure below, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
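The behaviour described above can be modelled in a few lines. The following Python sketch is a purely behavioural illustration, not the VHDL used in the project; the function names and the (true, false) tuple encoding are assumptions made for this example. It models the pre-charge operator and the WDDL AND and OR gates as pairs of dual positive gates.

```python
from typing import Tuple

Rail = Tuple[int, int]  # (true rail, false rail)


def encode(bit: int) -> Rail:
    """Differential encoding of a single bit."""
    return (bit, 1 - bit)


def precharge(signal: Rail, clk: int) -> Rail:
    """Pre-charge operator: clk high forces both rails to 0, clk low passes."""
    enable = 1 - clk
    return (signal[0] & enable, signal[1] & enable)


def wddl_and(a: Rail, b: Rail) -> Rail:
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a: Rail, b: Rail) -> Rail:
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


if __name__ == "__main__":
    for clk in (1, 0):
        phase = "pre-charge" if clk else "evaluation"
        for a in (0, 1):
            for b in (0, 1):
                ra, rb = precharge(encode(a), clk), precharge(encode(b), clk)
                print(phase, a, b, "AND", wddl_and(ra, rb), "OR", wddl_or(ra, rb))
```

Because both halves are positive gates, all-zero inputs give an all-zero output, which is how the 0-wave propagates without a pre-charge signal at each gate. In the evaluation phase exactly one output rail is 1 for every input combination.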
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented first in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, along with a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND gates and WDDL OR gates, but the number of gates involved in the latter method is greater than in the former. The first method of implementation is therefore followed rather than the second.
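As a rough behavioural check of the equation A ⊕ B = AB' + A'B in differential form, the Python sketch below computes the true rail from that equation and the false rail from its dual (AND and OR exchanged, true and false inputs exchanged). The function names and the tuple encoding are assumptions for this illustration; the gate-level netlist of the project itself is not reproduced here.

```python
from typing import Tuple

Rail = Tuple[int, int]  # (true rail, false rail)


def encode(bit: int) -> Rail:
    """Differential encoding of a single bit."""
    return (bit, 1 - bit)


def wddl_xor(a: Rail, b: Rail) -> Rail:
    """XOR built only from AND and OR operations on the rails.

    A complemented input is obtained by using the opposite rail, so no
    inverter is needed and an all-zero input still gives an all-zero output.
    """
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)    # AB' + A'B
    false_out = (a_f | b_t) & (a_t | b_f)   # dual of the expression above
    return (true_out, false_out)


def wddl_xor32(x: int, y: int) -> int:
    """32-bit XOR formed by applying the 1-bit gate to each bit position."""
    result = 0
    for i in range(32):
        true_out, _ = wddl_xor(encode((x >> i) & 1), encode((y >> i) & 1))
        result |= true_out << i
    return result


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, wddl_xor(encode(a), encode(b)))
    print(hex(wddl_xor32(0xDEADBEEF, 0x12345678)))
```

The 32-bit helper mirrors the idea of instantiating the lower module once per bit.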
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm a 32-bit XOR gate and a 32-bit adder circuit are required. They can easily be implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name    : wddlor.ngr
Top Level Output File Name        : wddlor
Output Format                     : NGC
Optimization Goal                 : Speed
Keep Hierarchy                    : NO
Design Statistics
IOs                               : 5
Cell Usage
BELS                              : 2
  LUT3                            : 2
IO Buffers                        : 5
  IBUF                            : 3
  OBUF                            : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                   : 3s250eft256-4
Number of Slices                  : 1 out of 2448   0%
Number of 4 input LUTs            : 2 out of 4896   0%
Number of IOs                     : 5
Number of bonded IOBs             : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                       : -4
Maximum combinational path delay  : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name    : wddlgates.ngr
Top Level Output File Name        : wddlgates
Output Format                     : NGC
Optimization Goal                 : Speed
Keep Hierarchy                    : NO
Design Statistics
IOs                               : 5
Cell Usage
BELS                              : 2
  LUT3                            : 2
IO Buffers                        : 5
  IBUF                            : 3
  OBUF                            : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                   : 3s250etq144-4
Number of Slices                  : 1 out of 2448   0%
Number of 4 input LUTs            : 2 out of 4896   0%
Number of IOs                     : 5
Number of bonded IOBs             : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                       : -4
Maximum combinational path delay  : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name    : wddlnand1.ngr
Top Level Output File Name        : wddlnand1
Output Format                     : NGC
Optimization Goal                 : Speed
Keep Hierarchy                    : NO
Design Statistics
IOs                               : 5
Cell Usage
BELS                              : 2
  LUT3                            : 2
IO Buffers                        : 5
  IBUF                            : 3
  OBUF                            : 2
=========================================================
Device utilization summary
Selected Device                   : 3s500efg320-4
Number of Slices                  : 1 out of 4656   0%
Number of 4 input LUTs            : 2 out of 9312   0%
Number of IOs                     : 5
Number of bonded IOBs             : 5 out of 232    2%
Timing Summary
Speed Grade                       : -4
Maximum combinational path delay  : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Synthesis Report
=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name    : wddlxorgate.ngr
Top Level Output File Name        : wddlxorgate
Output Format                     : NGC
Optimization Goal                 : Speed
Keep Hierarchy                    : NO
Design Statistics
IOs                               : 5
Cell Usage
BELS                              : 2
  LUT3                            : 2
IO Buffers                        : 5
  IBUF                            : 3
  OBUF                            : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                   : 3s250eft256-4
Number of Slices                  : 1 out of 2448   0%
Number of 4 input LUTs            : 2 out of 4896   0%
Number of IOs                     : 5
Number of bonded IOBs             : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                       : -4
Maximum combinational path delay  : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that gives constant power dissipation irrespective of the input combination. WDDL has proved advantageous over the other styles and is therefore of great significance. In this dissertation work an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially the logic gates are implemented in WDDL, and the higher modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plaintext size of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design has been done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, the same number of transitions occurs in every evaluation phase, resulting in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared with the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because of the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style, which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali and James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer and Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
Ports
Provide channels of communication between the component and its environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed
Out: Assignments can be made to a port, but data from a port cannot be read. Multiple assignments are allowed
Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed
Buffer: An out port with read capability. May have at most one assignment (buffer ports are not recommended)
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
    EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
The concept of default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1 Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2 Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design units. They consist of a declaration part and an optional body part
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually directory in UNIX file system
Can be also any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways
– Structural
– Dataflow
– Behavioral
(listed in order of increasing level of abstraction)
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)
For large circuits low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the system
The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no 3rd-party models for registers, you must write the behavioral description of the registers
The behavioral description can be provided in the form of subprograms (functions or procedures)
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL) synthesis tools ignore detailed timing specifications
The actual timing results depend on the implementation technology and the efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
The process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both the plaintext and the ciphertext (such as known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.
Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.
A more recent distinction adds semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact with the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.
Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures will therefore be necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
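The measurement arithmetic is simply Ohm's law applied to each sampled voltage. A minimal Python sketch follows; the 50 ohm value is the example from the text, while the function names and the sample voltages are made up for illustration.

```python
from typing import List


def supply_current(v_drop: float, r_shunt: float = 50.0) -> float:
    """Current in amperes through the series resistor, from its voltage drop."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_drop / r_shunt


def current_trace(voltage_samples: List[float], r_shunt: float = 50.0) -> List[float]:
    """Convert sampled voltage drops into a current trace."""
    return [supply_current(v, r_shunt) for v in voltage_samples]


if __name__ == "__main__":
    samples = [0.10, 0.12, 0.25, 0.11]  # hypothetical voltage drops in volts
    for v, i in zip(samples, current_trace(samples)):
        print(f"{v:.2f} V across 50 ohm -> {i * 1e3:.2f} mA")
```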
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher which can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
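The exponentiator case in the list above lends itself to a short demonstration. The Python sketch below assumes an idealised attacker who can tell squarings ("S") from multiplications ("M") in a trace; the function names are hypothetical and the operation log stands in for the power trace. It implements left-to-right square-and-multiply and then recovers the exponent from the operation sequence alone.

```python
from typing import Tuple


def modexp_with_trace(base: int, exponent: int, modulus: int) -> Tuple[int, str]:
    """Left-to-right square-and-multiply that also logs the operations.

    Every exponent bit costs one squaring ("S"); a 1 bit adds a
    multiplication ("M"). The log models what SPA may reveal.
    """
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        operations.append("S")
        if bit == "1":
            result = (result * base) % modulus
            operations.append("M")
    return result, "".join(operations)


def recover_exponent(operations: str) -> int:
    """Rebuild the exponent from the S/M sequence: "SM" is a 1, a lone "S" a 0."""
    bits = []
    i = 0
    while i < len(operations):
        if i + 1 < len(operations) and operations[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


if __name__ == "__main__":
    value, ops = modexp_with_trace(7, 0b101101, 1009)
    print("result    :", value)
    print("operations:", ops)
    print("recovered :", bin(recover_exponent(ops)))
```

A common countermeasure direction is to make the operation sequence independent of the exponent bits, which this naive version deliberately does not do.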
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures power traces T_1..m[1..k] containing k samples each. In addition, the attacker records the ciphertexts C_1..m. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess K_s is correct. A selection function D(C_i, b, K_s) computes the value V of a certain intermediate bit b from the ciphertext C_i under the guess K_s. The attacker computes a k-sample differential trace Δ_D[1..k] by finding the difference between the average of the traces for which V is one and the average of the traces for which V is zero. Thus Δ_D[j] is the average over C_1..m of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.
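The last statement can be checked numerically. The Python sketch below uses synthetic Gaussian samples in place of measured power values (an assumption made for this illustration; the function name is hypothetical). It splits m samples into two groups at random and reports the difference of the group averages, which shrinks roughly in proportion to one over the square root of m.

```python
import random


def random_split_difference(m: int, seed: int = 0) -> float:
    """Difference of the averages of two random subsets of m noisy samples."""
    rng = random.Random(seed)
    sums = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(m):
        sample = rng.gauss(1.0, 1.0)   # synthetic power sample, unrelated to the split
        group = rng.getrandbits(1)     # random selection bit
        sums[group] += sample
        counts[group] += 1
    if 0 in counts:
        raise ValueError("one subset is empty; use a larger m")
    return sums[1] / counts[1] - sums[0] / counts[0]


if __name__ == "__main__":
    for m in (100, 10_000, 100_000):
        print(f"m = {m:>7}: difference = {random_split_difference(m):+.4f}")
```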
Thus because trace components uncorrelated to D will diminish with 1 pm causing the
differential trace to become at (the actual trace may not be completely at as D with Ks
incorrect may have a weak correlation to D with the correct Ks) If Ks is correct however the
computed value for D (Ci bKs) will equal the actual value of target bit b with probability 1
The selection function is thus correlated to the value of the bit considered Other data values
measurement errors etc that are not correlated to D approach zero Because power
consumption is correlated to data bit values the plot of centD will be degat with spikes in regions
where D is correlated to the values being processed The correct value of Ks can thus be
identified from the spikes in its differential trace Four values of b correspond to each S box
providing confirmation of key block guesses Finding all eight Ks yields the entire 48-bit round
sub key The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round Triple DES keys can be found by analyzing an outer DES operation first
using the resulting key to decrypt the cipher text and attacking the next DES key DPA can use
known plaintext or known cipher text and can find encryption or decryption keys
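The differential trace described in this section can be written as a difference of means. The block below is a sketch in the notation used above, assuming the selection function D returns 0 or 1 for each cipher text; the summation index i is introduced here for illustration.

```latex
% T_i[j]: sample j of trace i;  C_i: cipher text i;  K_s: key block guess;  b: target bit
\Delta_D[j] \;=\;
\frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)}
\;-\;
\frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
```

The first term is the average of the traces for which D predicts a one, the second the average of those for which it predicts a zero. For a wrong K_s both averages converge to the same value and the difference tends to zero; for the correct K_s a spike remains at the sample points j where the target bit is processed.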
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
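The data dependence described above can be summarized with the usual first-order model of a static CMOS gate. This is a sketch only: the symbols C_L (output load capacitance) and V_DD (supply voltage) are introduced here, and short-circuit and leakage currents are neglected.

```latex
% Energy drawn from the power supply for one output event of a static CMOS gate
E_{\text{supply}} =
\begin{cases}
C_L \, V_{DD}^{2} & \text{output } 0 \to 1 \\[4pt]
0 & \text{output } 1 \to 0,\; 0 \to 0,\; 1 \to 1
\end{cases}
```

On the 0 to 1 transition half of this energy ends up stored on the capacitance and half is dissipated in the pull-up network; the stored half is dissipated later, on the 1 to 0 transition, without drawing anything further from the supply. An observer of the supply current therefore sees which transition took place.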
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device, as illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
For the inverter above, all four transitions can be distinguished when monitoring the power
supply.
Two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a
differential output signal. The discharged output node is charged to 1 again in the following
pre-charge phase, so that both outputs are once more at 1. In other words, every signal
transition, including the events in which the input signals remain constant, is represented with
an actual switching event in which the logic gate charges a capacitance. Two logic families
that use dynamic differential logic to thwart differential power analysis (DPA) are considered
here:
1 Sense Amplifier Based Logic (SABL) and
2 Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the figure below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require a full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
– It can be incorporated in the common Electronic Design Automation (EDA) tool flow.
– No special design rules are involved in the interconnection of WDDL gates.
– The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
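The gate construction and the pre-charge operator described above can be sketched in VHDL as follows. This is a minimal illustration, not the code of this project: the entity names wddl_precharge and wddl_and2 and the port names with the _t (true rail) and _f (false rail) suffixes are hypothetical, and clk high is taken as the pre-charge phase, as stated in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Pre-charge operator (hypothetical name): converts a single-ended signal
-- into a differential pair and forces both rails to '0' while clk is high.
entity wddl_precharge is
  port (
    clk      : in  std_logic;  -- '1' = pre-charge phase, '0' = evaluation phase
    d        : in  std_logic;  -- single-ended data input
    q_t, q_f : out std_logic   -- true rail and false rail
  );
end entity wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d and not clk;          -- true rail: d during evaluation, '0' otherwise
  q_f <= (not d) and (not clk);  -- false rail: not d during evaluation, '0' otherwise
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND gate (hypothetical name): an AND gate on the true rails in
-- parallel with its dual, an OR gate, on the false rails.
entity wddl_and2 is
  port (
    a_t, a_f : in  std_logic;  -- differential input A
    b_t, b_f : in  std_logic;  -- differential input B
    y_t, y_f : out std_logic   -- differential output
  );
end entity wddl_and2;

architecture dataflow of wddl_and2 is
begin
  y_t <= a_t and b_t;  -- true output from the true inputs
  y_f <= a_f or  b_f;  -- false output from the false inputs (De Morgan dual)
end architecture dataflow;
```

Both gates are positive, so all-zero inputs give y_t = y_f = '0' and the pre-charge value propagates without a clock connection to the gate itself. In evaluation exactly one of y_t and y_f rises to '1'. A WDDL OR gate follows the same pattern with the two operators exchanged. A synthesis tool may merge or simplify the two rails, so in practice the hierarchy usually has to be preserved; that setting is tool-specific and not shown here.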
Figure: WDDL pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is the bottom-up approach. Lower
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented first in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting
signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A XOR B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
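The XOR equation above can be written directly on differential rails. The sketch below is illustrative only: the entity names wddl_xor2 and wddl_xor32 and the _t/_f port names are hypothetical, and the generate-based wrapper is one possible way to replicate a one-bit gate into the 32-bit XOR needed for Blowfish.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One-bit WDDL XOR (hypothetical name). The complemented literals A' and B'
-- are taken from the false rails, so only AND and OR gates are used.
entity wddl_xor2 is
  port (
    a_t, a_f : in  std_logic;
    b_t, b_f : in  std_logic;
    y_t, y_f : out std_logic
  );
end entity wddl_xor2;

architecture dataflow of wddl_xor2 is
begin
  y_t <= (a_t and b_f) or  (a_f and b_t);  -- A.B' + A'.B
  y_f <= (a_f or  b_t) and (a_t or  b_f);  -- dual: (A' + B).(A + B') = XNOR
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- 32-bit WDDL XOR built by instantiating the one-bit gate 32 times.
entity wddl_xor32 is
  port (
    a_t, a_f : in  std_logic_vector(31 downto 0);
    b_t, b_f : in  std_logic_vector(31 downto 0);
    y_t, y_f : out std_logic_vector(31 downto 0)
  );
end entity wddl_xor32;

architecture structural of wddl_xor32 is
begin
  gen_bits : for i in 0 to 31 generate
    -- direct entity instantiation (VHDL-93 and later)
    u_xor : entity work.wddl_xor2
      port map (
        a_t => a_t(i), a_f => a_f(i),
        b_t => b_t(i), b_f => b_f(i),
        y_t => y_t(i), y_f => y_f(i)
      );
  end generate gen_bits;
end architecture structural;
```

With all inputs at '0' both outputs are '0', so the gate passes the pre-charge wave; with valid differential inputs exactly one output is '1'. For example, A = 1, B = 0 gives a_t = b_f = '1', hence y_t = '1' and y_f = '0'.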
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
yields constant power dissipation irrespective of the input combination. WDDL has
been shown to have advantages over the other such logic styles and is therefore of great
significance. In this dissertation work, an architecture for the Blowfish algorithm is designed
and implemented in WDDL style. In this implementation the bottom-up approach is used. The
low-level entities are designed first and later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found to be simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plain text size of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. In the output during the evaluation phase there
is the same number of transitions in every cycle, thus resulting in constant power dissipation.
During synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold when compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated
into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits",
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7,
July 2006.
[2] Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors –
A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic
Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1,
June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable
Length Key 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist
Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic
with Signal Independent Power Consumption to withstand Differential Power Analysis on
Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices",
Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral architecture
Are always optional
Packages
Packages contain information common to many design units
1 Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations
2 Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies
Packages are meant for encapsulating data which can be shared globally among several design
units. They consist of a declaration part and an optional body part.
Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations
Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations
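A small example may make the split between package declaration and package body concrete. The sketch below is illustrative: the package name wddl_pkg, the constant WORD_WIDTH, the subtype word_t and the function parity are hypothetical and do not come from the project code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: what other design units can see.
package wddl_pkg is
  constant WORD_WIDTH : natural := 32;                            -- constant declaration
  subtype  word_t is std_logic_vector(WORD_WIDTH - 1 downto 0);   -- subtype declaration
  function parity (w : word_t) return std_logic;                  -- function declaration
end package wddl_pkg;

-- Package body: the implementation of the subprograms declared above.
package body wddl_pkg is
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in w'range loop
      p := p xor w(i);  -- XOR of all bits in the word
    end loop;
    return p;
  end function parity;
end package body wddl_pkg;
```

A design unit compiled into the same library would make these names visible with the clause use work.wddl_pkg.all; placed before its entity declaration. A package that declares only constants, types and components needs no body at all.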
Libraries
Collection of VHDL design units (database)
1 Packages
– package declaration
– package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the HW
It is possible to describe a circuit in a number of ways, listed here in order of increasing
level of abstraction:
1 Structural
2 Dataflow
3 Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical
Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in the
system
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower level description
– or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of subprograms (functions or
procedures)
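Since the language has no built-in register, a behavioral description has to be supplied. The following is a common way to write one as a clocked process; it is a sketch, and the entity name dff and the asynchronous active-high reset are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One-bit register (D flip-flop) described behaviorally.
entity dff is
  port (
    clk : in  std_logic;  -- clock
    rst : in  std_logic;  -- asynchronous reset, active high (assumed here)
    d   : in  std_logic;  -- data input
    q   : out std_logic   -- registered output
  );
end entity dff;

architecture behavioral of dff is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= '0';            -- reset takes priority over the clock
    elsif rising_edge(clk) then
      q <= d;              -- capture d on the rising clock edge
    end if;
  end process;
end architecture behavioral;
```

Synthesis tools generally recognize this template and map it to a flip-flop of the target technology, so a dataflow description can instantiate dff wherever a register is needed.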
Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and algorithmic
descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL) synthesis tools ignore detailed timing
specifications
The actual timing results depend on the implementation technology and the efficiency of
the synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process
Process keyword in VHDL
Process statement is a concurrent statement
Statements inside process statements are sequential statements
Process must contain either a sensitivity list or wait statement(s), but NOT both
Sensitivity list or wait statement(s) contain the signals which wake the process up
General Format
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
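The two forms of the general format can be compared side by side. In the sketch below the entity name and_demo and its ports are hypothetical; the two processes describe the same behavior, one with a sensitivity list and one with an explicit wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_demo is
  port (
    a, b   : in  std_logic;
    y1, y2 : out std_logic
  );
end entity and_demo;

architecture behavioral of and_demo is
begin
  -- Form 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y1 <= a and b;   -- sequential statement, runs on every event on a or b
  end process with_list;

  -- Form 2: no sensitivity list, explicit wait at the end of the body.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;    -- suspend until a or b changes, then loop back to the top
  end process with_wait;
end architecture behavioral;
```

Both processes run once at the start of simulation and are then woken up by events on a or b. The two process statements themselves are concurrent with respect to each other, while the statements inside each one execute in sequence.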
CHAPTER 4 SMART
CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, the role, use and position of which are standardized. In addition to the
input/output wires, the parts we are most interested in are the following:
1 Power supply: smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.
2 Clock: similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces cipher text output, and vice versa. Attacks were therefore based on either
knowing the cipher text (such as cipher text-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the cipher text.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1 Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A further category is that of semi-invasive attacks. These attacks
require de-packaging of the chip to get access to the chip surface, but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2 Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by
1 Probing attacks
2 Fault induction attacks
3 Timing attacks
4 Power analysis attacks and
5 Electromagnetic timing attacks
These attacks are performed during the switching behavior of digital
complementary metal–oxide–semiconductor (CMOS) gates. Of all these, the power analysis
attack is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
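The measurement described above is an application of Ohm's law. As a worked illustration, assume a hypothetical reading of 25 mV across the 50 ohm resistor and a hypothetical 5 V supply; neither value comes from a measurement in this report.

```latex
% V_R: voltage measured across the series resistor R; V_DD: supply voltage
I = \frac{V_R}{R} = \frac{25\ \text{mV}}{50\ \Omega} = 0.5\ \text{mA},
\qquad
P \approx V_{DD} \cdot I = 5\ \text{V} \times 0.5\ \text{mA} = 2.5\ \text{mW}
```

Sampling V_R over time therefore gives a trace that is proportional to the instantaneous supply current of the card, which is the quantity the power analysis attacks below operate on.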
Power analysis attacks are of two types:
1 Simple Power Analysis attack and
2 Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed. Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: the DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: string or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: a simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces T_1..m[1..k] containing k samples each. In addition, the attacker records the
ciphertexts C_1..m. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess K_s is correct. The attacker
computes a k-sample differential trace Δ_D[1..k] by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero, where V = D(C_i, b, K_s) is the bit predicted by the selection
function D. Thus Δ_D[j] is the average over C_1..m of the effect due to the value
represented by the selection function D on the power consumption at point j. In particular,
Δ_D[j] = mean{ T_i[j] : D(C_i, b, K_s) = 1 } - mean{ T_i[j] : D(C_i, b, K_s) = 0 }.
If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Trace components uncorrelated to D thus diminish with 1/√m, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with K_s
incorrect may have a weak correlation to D with the correct K_s). If K_s is correct, however, the
computed value for D(C_i, b, K_s) will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of Δ_D will be flat with spikes in regions
where D is correlated to the values being processed. The correct value of K_s can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight K_s yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
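The difference-of-means procedure can be tried on simulated data. The following Python sketch is a toy under loudly stated assumptions: the target is a 4-bit S-box lookup (the PRESENT cipher S-box, chosen only because it is small) rather than DES, the "device" leaks the selected bit at a single sample on top of Gaussian noise, and all function names and parameters are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small nonlinear toy target
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def selection(c, key_guess):
    """Toy selection function D: most significant bit of SBOX[c XOR key_guess]."""
    return (SBOX[c ^ key_guess] >> 3) & 1


def differential_trace(traces, data, key_guess):
    """Mean of the traces where D = 1 minus mean of the traces where D = 0, per sample."""
    k = len(traces[0])
    sums = {0: [0.0] * k, 1: [0.0] * k}
    counts = {0: 0, 1: 0}
    for trace, c in zip(traces, data):
        d = selection(c, key_guess)
        counts[d] += 1
        for j in range(k):
            sums[d][j] += trace[j]
    # Both subsets are non-empty here because the data values are cycled evenly
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def simulate_traces(secret_key, m, k=8, leak_at=5, noise=0.5, seed=1):
    """Simulated device: noise at every sample, plus the target bit at one sample."""
    rng = random.Random(seed)
    data = [i % 16 for i in range(m)]  # known data values, all 16 used equally often
    traces = []
    for c in data:
        trace = [rng.gauss(0.0, noise) for _ in range(k)]
        trace[leak_at] += selection(c, secret_key)  # data-dependent power contribution
        traces.append(trace)
    return traces, data


def recover_key(traces, data):
    """Choose the key guess whose differential trace shows the largest spike."""
    return max(range(16), key=lambda g: max(differential_trace(traces, data, g)))


traces, data = simulate_traces(secret_key=0xB, m=3200)
print(hex(recover_key(traces, data)))
```

For the correct guess the differential trace is close to flat except at the leaking sample, where it approaches the size of the leak; wrong guesses give smaller spikes because their predicted bit only partly agrees with the bit the device actually computed.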
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
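The data dependence described above can be written down as a tiny idealised model in Python. The function names and the unit values for the load capacitance and supply voltage are assumptions for illustration, and short-circuit and leakage currents are ignored.

```python
def supply_energy(previous_output, new_output, load_capacitance=1.0, vdd=1.0):
    """Energy drawn from the supply for one output transition (idealised)."""
    if previous_output == 0 and new_output == 1:
        # Charging the output capacitance draws C * Vdd^2 from the supply
        return load_capacitance * vdd ** 2
    # 1->0, 0->0 and 1->1 draw (ideally) nothing from the supply
    return 0.0


def energy_profile(output_sequence):
    """Supply energy for each consecutive pair of output values."""
    return [supply_energy(a, b) for a, b in zip(output_sequence, output_sequence[1:])]


print(energy_profile([0, 1, 1, 0, 0, 1]))
```

Only the 0 to 1 steps show up in the profile, so an observer of the supply learns when the output rose, which is exactly the information a constant-power logic style has to hide.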
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
A standard CMOS inverter satisfies neither condition: all four of its transitions can be
distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a
differential output signal. The discharged output node is charged to 1 again in the following
pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal
transition, including the events in which the input signals remain constant, is represented
by an actual switching event in which the logic gate charges a capacitance. The logic families
that have been introduced to thwart differential power analysis (DPA) use dynamic differential
logic. Two such techniques are:
1 Sense Amplifier Based Logic (SABL) and
2 Wave Dynamic Differential Logic (WDDL) gates
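The "exactly one switching event per cycle" property of dual rail with pre-charge logic can be checked with a small Python model. This is only a behavioural sketch; the function names are hypothetical and each rail is treated as an ideal node that is either at 0 or at 1.

```python
def single_rail_charges(values):
    """Count 0->1 output transitions per step for an ordinary single-rail signal."""
    charges = []
    previous = 0
    for v in values:
        charges.append(1 if (previous == 0 and v == 1) else 0)
        previous = v
    return charges


def dual_rail_precharged_charges(values):
    """Count rails recharged per cycle for a dual-rail signal pre-charged to 1."""
    charges = []
    for v in values:
        true_rail, false_rail = 1, 1      # pre-charge phase: both rails at 1
        if v == 1:
            false_rail = 0                # evaluation: the false rail discharges
        else:
            true_rail = 0                 # evaluation: the true rail discharges
        # The following pre-charge phase recharges every rail that was discharged
        charges.append((1 - true_rail) + (1 - false_rail))
    return charges


data = [0, 1, 1, 0, 0]
print(single_rail_charges(data))
print(dual_rail_precharged_charges(data))
```

The single-rail count follows the data, while the dual-rail count is one charging event in every cycle regardless of the value carried. Whether each event also charges the same capacitance is the second condition, which this model does not address.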
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, which is common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
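The 0-wave behaviour can be checked with a small behavioural model in Python. This is a sketch, not the gate-level design of the report: the function names are hypothetical, each differential signal is modelled as a (true, false) pair, and the example network (a AND b) OR c is an arbitrary choice.

```python
def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: both rails forced to 0 while clk is high."""
    t, f = signal
    return (t & (1 - clk), f & (1 - clk))


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def evaluate(a_bit, b_bit, c_bit, clk):
    """(a AND b) OR c built from WDDL gates behind pre-charge operators."""
    a, b, c = (precharge(encode(x), clk) for x in (a_bit, b_bit, c_bit))
    return wddl_or(wddl_and(a, b), c)


print(evaluate(1, 1, 0, clk=1))  # pre-charge phase
print(evaluate(1, 1, 0, clk=0))  # evaluation phase
```

Because only positive gates are used, zeroing the inputs is enough to zero every internal node and the output without routing clk to each gate, and in the evaluation phase exactly one rail of each differential signal rises.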
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed first and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented in WDDL style first. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND, and XOR have been
implemented, as well as a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting the
signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
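A behavioural Python sketch of such an XOR cell follows. It is an assumption-laden illustration, not the VHDL of the project: the names are hypothetical, a differential signal is a (true, false) pair, and inversion is modelled by using the opposite rail, so only AND and OR operations appear.

```python
def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR from AND/OR only: A'B + AB' on the true output, its dual on the false output."""
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)     # AB' + A'B
    false_out = (a_f | b_t) & (a_t | b_f)    # dual network, equals AB + A'B'
    return (true_out, false_out)


def wddl_xor_word(a_bits, b_bits):
    """Wider XOR by repeating the one-bit cell once per bit position."""
    return [wddl_xor(a, b) for a, b in zip(a_bits, b_bits)]


print(wddl_xor(encode(1), encode(0)))
print(wddl_xor((0, 0), (0, 0)))  # pre-charged inputs give a pre-charged output
```

Both output networks are positive, so all-zero inputs still produce an all-zero output, and a 32-bit version is the same cell instantiated 32 times.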
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
gives constant power dissipation irrespective of the input combination. WDDL has been
shown to have advantages over the other styles and is therefore of great significance. In this
work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are supplied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. At the output, during the evaluation phase, there
is the same number of transitions in every cycle, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. The area of the design has therefore grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
Libraries
A library is a collection of VHDL design units (a database):
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
A library is usually a directory in the UNIX file system,
but it can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here from the lowest to the
highest level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL Description
The circuit is described in terms of its components.
The description can range from a low-level description (e.g. transistor level) to a high-level
description (e.g. block diagram).
For large circuits, low-level descriptions quickly become impractical.
Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in the
system.
The combinational logic is described at a relatively high level, while the placement and
operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language, so either a lower-level description
or a behavioral description of the sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the behavioral
description of the registers yourself.
The behavioral description can be provided in the form of subprograms (functions or
procedures).
Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams, and algorithmic
descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).
If no actual delays are used, only the order of the sequential operations is defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications.
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent Vs Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
A VHDL process is quite similar to a UNIX process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals that wake the process up.
General Format
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use, and position are standardized. In addition to the input/output wires,
the parts we are most interested in are the following:
1 Power supply: Smart cards do not have an internal battery.
The current they need is provided by the smart card reader. This makes the smart
card's power consumption pretty easy for the attacker to measure.
2 Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the
attacker can measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs and often additional inputs which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1 Invasive vs non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example of this is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A further category, semi-invasive attacks, has also been distinguished. These attacks have the
specificity that they require de-packaging of the chip to get access to the chip surface but do not
tamper with the passivation layer (they do not require electrical contact to the metal surface).
2 Active vs passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment in lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1 Probing attacks
2 Fault induction attacks
3 Timing attacks
4 Power analysis attacks and
5 Electromagnetic timing attacks
These attacks exploit the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is also provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
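The measurement arithmetic is simply Ohm's law, as the following Python sketch shows. The function names, the sample voltages, and the 5 V supply value are hypothetical; only the 50 ohm resistor value comes from the text.

```python
def supply_current(voltage_drop, resistance=50.0):
    """Ohm's law: current through the series resistor, in amperes."""
    return voltage_drop / resistance


def power_trace(voltage_samples, resistance=50.0, vdd=5.0):
    """Convert sampled voltage drops into instantaneous supply power (vdd is assumed)."""
    return [vdd * supply_current(v, resistance) for v in voltage_samples]


# A 0.1 V drop across 50 ohm corresponds to 2 mA of supply current
print(supply_current(0.1))
print(power_trace([0.10, 0.15, 0.05]))
```

Since the supply voltage and the resistance are constant, the sampled voltage drop is proportional to the power drawn, so attackers usually work directly with the raw voltage samples.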
Power analysis attacks are of two types:
1 Simple Power Analysis attack and
2 Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher, which can be learned both by guessing part of the key and checking the value directly
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed. Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
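The trace length quoted above follows directly from the duration and the sampling rate. A minimal Python check, with a hypothetical function name:

```python
def samples_per_trace(duration_seconds, sample_rate_hz):
    """Number of points in a trace: operation duration times sampling rate."""
    return round(duration_seconds * sample_rate_hz)


# 1 millisecond sampled at 5 MHz
print(samples_per_trace(1e-3, 5e6))
```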
Figure SPA monitoring from a single DES operation performed by a typical smart card The
upper trace shows the entire encryption operation including the initial permutation the 16
DES rounds and the final permutation The lower trace is a detailed view of the second and
third rounds
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration, with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
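The exponentiator case above can be made concrete with a minimal, simulation-only VHDL sketch. The entity name modexp_demo, the function name mod_exp and the small operands are assumptions chosen for illustration; they are not taken from the report. The point is only that the multiplication is executed for "1" exponent bits and skipped for "0" bits, which is the data-dependent execution path that SPA exploits.

```vhdl
-- Simulation-only sketch (hypothetical names); not part of the report's design.
entity modexp_demo is
end entity modexp_demo;

architecture sim of modexp_demo is

  -- Left-to-right square-and-multiply. Operands are kept small so that
  -- every intermediate product fits in a VHDL integer.
  function mod_exp (g        : natural;
                    exponent : bit_vector;
                    modulus  : positive) return natural is
    variable result : natural := 1;
  begin
    -- 'range runs from the leftmost (most significant) bit to the rightmost.
    for i in exponent'range loop
      result := (result * result) mod modulus;   -- squaring: every iteration
      if exponent(i) = '1' then
        result := (result * g) mod modulus;      -- multiply: only for '1' bits
      end if;
    end loop;
    return result;
  end function mod_exp;

begin

  check : process
  begin
    -- 5^11 mod 23 = 22, with exponent 11 written as "1011"
    assert mod_exp(5, "1011", 23) = 22
      report "mod_exp returned an unexpected value" severity error;
    wait;
  end process check;

end architecture sim;
```

For the exponent "1011" the loop performs four squarings but only three multiplications; an observer who can tell the two operations apart in a trace can read the exponent bits off directly.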
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and
are sometimes overshadowed by measurement errors and other noise. In such cases it is still
often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. A selection
function $D(C_i, b, K_s)$ computes, from ciphertext $C_i$ and the guess $K_s$, the predicted
value of the target bit $b$ of a certain intermediate value $V$. The attacker computes a
$k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of
the traces for which $D$ is one and the average of the traces for which $D$ is zero. Thus
$\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the
selection function $D$ on the power consumption at point $j$. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for
about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to
what was actually computed by the target device. If a random function is used to divide a set
into two subsets, the difference in the averages of the subsets should approach zero as the
subset sizes approach infinity.
Trace components uncorrelated to $D$ therefore diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with
$K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct,
however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$
with probability 1. The selection function is thus correlated to the value of the bit
considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach
zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be
flat with spikes in regions where $D$ is correlated to the values being processed. The correct
value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of
$b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight
$K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using
exhaustive search or by analyzing one additional round. Triple DES keys can be found by
analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and
attacking the next DES key. DPA can use known plaintext or known ciphertext and can find
encryption or decryption keys.
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy
(due to short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is therefore a logic style with constant power consumption.
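The data dependence described above can be written down for a single gate output. The following is a first-order sketch, assuming an output load capacitance $C_L$ and a supply voltage $V_{DD}$ (both symbols are introduced here and are not taken from the original text) and neglecting short-circuit and leakage currents:

```latex
E_{0 \to 1} = \int_{0}^{\infty} V_{DD}\, i_{DD}(t)\, dt
            = V_{DD} \int_{0}^{V_{DD}} C_L \, dv_{out}
            = C_L V_{DD}^{2},
\qquad
E_{1 \to 0} \approx E_{0 \to 0} \approx E_{1 \to 1} \approx 0
```

Only the 0 to 1 transition draws this energy from the supply, so the supply current reveals which transition a gate output made. A constant-power logic style aims to make the energy drawn per clock cycle the same for every input combination.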
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it continues
to draw current, which makes this logic style unacceptable for embedded
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
For the standard CMOS inverter shown above, all four transitions can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a
differential output signal. The discharged output node is charged to 1 again in the following
pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal
transition, including the events in which the input signals remain constant, is represented by
an actual switching event in which the logic gate charges a capacitance. The logic families
that have been introduced to thwart differential power analysis (DPA) using dynamic
differential logic include the following:
1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal, and with the low
power consumption and the high noise margins of static CMOS.
The advantages of the WDDL logic style are as follows:
A major advantage of the logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the outputs of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
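The gate construction and the pre-charge operator described above can be sketched in VHDL as follows. This is a minimal dataflow sketch under stated assumptions: the entity names (wddl_precharge, wddl_and2, wddl_or2) and the port names with _t and _f suffixes for the true and false rails are hypothetical and are not taken from the report's source code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pre-charge operator: turns a single-ended signal into a
-- differential pair and forces both rails to '0' while clk = '1'.
entity wddl_precharge is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q_t : out std_logic;  -- true rail
    q_f : out std_logic   -- false rail
  );
end entity wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d and (not clk);        -- '0' in pre-charge, d in evaluation
  q_f <= (not d) and (not clk);  -- '0' in pre-charge, not d in evaluation
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND gate: AND on the true rails, its dual (OR) on the false rails.
entity wddl_and2 is
  port (
    a_t, a_f : in  std_logic;
    b_t, b_f : in  std_logic;
    y_t, y_f : out std_logic
  );
end entity wddl_and2;

architecture dataflow of wddl_and2 is
begin
  y_t <= a_t and b_t;
  y_f <= a_f or b_f;
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL OR gate: OR on the true rails, its dual (AND) on the false rails.
entity wddl_or2 is
  port (
    a_t, a_f : in  std_logic;
    b_t, b_f : in  std_logic;
    y_t, y_f : out std_logic
  );
end entity wddl_or2;

architecture dataflow of wddl_or2 is
begin
  y_t <= a_t or b_t;
  y_f <= a_f and b_f;
end architecture dataflow;
```

Because only AND and OR (positive gates) are used, driving all four input rails to '0' forces both output rails to '0', so the 0-wave propagates through cascaded gates without a dedicated pre-charge input on each gate.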
Figure: WDDL pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is a bottom-up approach. Lower
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a Full Adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1: WDDL OR Gate
The NAND and NOT gates act as the Precharge operator, injecting the
signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2: WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a
conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure. The gate pair is the same as in the WDDL AND gate; the
inversion corresponds to exchanging the true and false outputs.
Figure 4.3: WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a
conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure. As with the NAND gate, the inversion corresponds to
exchanging the true and false outputs.
Figure 4.4: WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation $A \oplus B = AB' + A'B$. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
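The XOR construction above can be sketched directly from the Boolean equation. In the sketch below the complemented literals A' and B' are taken from the false rails, so no inverter is needed and the gate still outputs all zeros for all-zero inputs. The entity names wddl_xor2 and wddl_xor2_tb, the port names and the 10 ns phase length are assumptions for illustration, not the report's actual code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical WDDL XOR gate built only from AND and OR operators.
entity wddl_xor2 is
  port (
    a_t, a_f : in  std_logic;
    b_t, b_f : in  std_logic;
    y_t, y_f : out std_logic
  );
end entity wddl_xor2;

architecture dataflow of wddl_xor2 is
begin
  y_t <= (a_t and b_f) or (a_f and b_t);  -- A.B' + A'.B  (XOR)
  y_f <= (a_t and b_t) or (a_f and b_f);  -- A.B  + A'.B' (XNOR)
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- Self-checking testbench: alternates pre-charge and evaluation phases.
entity wddl_xor2_tb is
end entity wddl_xor2_tb;

architecture sim of wddl_xor2_tb is
  signal a_t, a_f, b_t, b_f : std_logic := '0';
  signal y_t, y_f           : std_logic;
begin
  dut : entity work.wddl_xor2
    port map (a_t => a_t, a_f => a_f, b_t => b_t, b_f => b_f,
              y_t => y_t, y_f => y_f);

  stimulus : process
    type pattern_array is array (natural range <>) of std_logic_vector(1 downto 0);
    constant patterns : pattern_array := ("00", "01", "10", "11");
  begin
    for i in patterns'range loop
      -- Pre-charge phase: every rail at '0'
      a_t <= '0'; a_f <= '0'; b_t <= '0'; b_f <= '0';
      wait for 10 ns;
      assert (y_t = '0') and (y_f = '0')
        report "outputs are not pre-charged to 0" severity error;
      -- Evaluation phase: apply a differential input pair
      a_t <= patterns(i)(1); a_f <= not patterns(i)(1);
      b_t <= patterns(i)(0); b_f <= not patterns(i)(0);
      wait for 10 ns;
      assert y_t = (patterns(i)(1) xor patterns(i)(0))
        report "true rail does not match XOR" severity error;
      assert y_f = (not y_t)
        report "output rails are not complementary" severity error;
    end loop;
    wait;  -- stop the stimulus process
  end process stimulus;
end architecture sim;
```

In every evaluation phase exactly one of y_t and y_f rises from 0 to 1, whatever the input pattern, which is the property the WDDL style relies on.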
Figure 4.5: WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been
designed by instantiating the WDDL gates designed above. The implementation of
the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit Adder circuit. They can
easily be implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
============================================================
Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style
that yields constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the other styles and is therefore of great significance. In this
work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase
and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. At the output, the same number of transitions
has been observed in every Evaluation phase, resulting in constant power dissipation. During
synthesis it was observed that a simple WDDL gate comprises many conventional
gates. The area of the design has therefore grown nearly three-fold compared to the
same design implemented in conventional CMOS logic; this is the cost of the security
incorporated into the IC against Differential Power Analysis (DPA). Because of the constant
power dissipation at the output, an attacker cannot apply Differential Power Analysis (DPA)
to find the secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow
for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A
Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic
Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1,
June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length
Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist
Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with
Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson
Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
In the dataflow style you describe how information flows between registers in the
system.
The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- Either a lower level description
- or a behavioral description of sequential elements is needed.
The lower level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the behavioral
description of the registers yourself.
The behavioral description can be provided in the form of subprograms (functions or
procedures).
Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and algorithmic
descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).
If no actual delays are used, only the order of sequential operations is defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications.
The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool.
There are a few tools for behavioral synthesis.
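The delay notation mentioned above can be shown in a complete, if trivial, design unit. The entity name delay_buffer and the port names are assumptions made for this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical buffer whose output follows its input after 10 ns.
entity delay_buffer is
  port (
    b : in  std_logic;
    a : out std_logic
  );
end entity delay_buffer;

architecture behavioral of delay_buffer is
begin
  -- The 'after' clause is honoured by the simulator only.
  a <= b after 10 ns;
end architecture behavioral;
```

In simulation the output changes 10 ns after the input. As the text notes, synthesis tools ignore such timing specifications, so the delay of the synthesized circuit depends on the target technology instead.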
Concurrent vs. Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
A VHDL process is quite similar to a UNIX process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals that wake the process up.
General Format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
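The two permitted forms of a process can be compared on a simple register, which also shows a behavioral description of a sequential element. The entity name dff_demo and the port names are assumptions made for this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical D flip-flop written twice: once with a sensitivity list,
-- once with a wait statement. Both outputs behave identically.
entity dff_demo is
  port (
    clk    : in  std_logic;
    d      : in  std_logic;
    q_sens : out std_logic;
    q_wait : out std_logic
  );
end entity dff_demo;

architecture behavioral of dff_demo is
begin

  -- Form 1: sensitivity list, no wait statement.
  with_sensitivity_list : process (clk)
  begin
    if rising_edge(clk) then
      q_sens <= d;
    end if;
  end process with_sensitivity_list;

  -- Form 2: wait statement, no sensitivity list.
  with_wait : process
  begin
    wait until rising_edge(clk);
    q_wait <= d;
  end process with_wait;

end architecture behavioral;
```

The two process statements run concurrently with each other, while the statements inside each one execute sequentially every time the process is woken by an event on clk.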
CHAPTER 4 SMART
CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output
wires, the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption fairly
easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes.
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A more recent distinction adds semi-invasive attacks. These attacks
require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
In contrast, passive attacks simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal: an invasive attack may completely avoid
disturbing the card's behavior, and a passive attack may require a preliminary de-packaging
for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of
defeating these mechanisms, as will be illustrated below). On the other hand, it is worth
pointing out that a non-invasive attack is completely undetectable: there is, for example, no
way for a smart card to figure out that its running time is currently being measured. Other
countermeasures will therefore be necessary. From an economic point of view, invasive attacks
are usually more expensive to deploy on a large scale, since they require individual processing
of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the
smart card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks, and
5. Electromagnetic timing attacks
These attacks are performed during the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
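The measurement described above is an application of Ohm's law. As a worked sketch, assuming a hypothetical reading of 5 mV across the 50 ohm resistor (the reading and the symbols $V_R$, $R$ and $I$ are illustrative assumptions, not values from the original text):

```latex
I = \frac{V_R}{R} = \frac{5\ \text{mV}}{50\ \Omega} = 0.1\ \text{mA}
```

Sampling $V_R$ over time therefore gives the supply current waveform up to the constant factor $1/R$, and this waveform is what the power traces in the following sections record.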
Power analysis attacks are of two types:
1. Simple Power Analysis attack, and
2. Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will, of course, determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by exploiting some statistical property of the cipher that makes that checkable
value slightly nonrandom.
521 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations SPA can yield information about a devices
operation as well as key material
A trace refers to a set of power consumption measurements taken across a
cryptographic operation For example a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points Figure for example shows an SPA trace from a smart card
performing a DES operation
Figure SPA monitoring from a single DES operation performed by a typical smart card The
upper trace shows the entire encryption operation including the initial permutation the 16
DES rounds and the final permutation The lower trace is a detailed view of the second and
third rounds
Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
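The exponentiator case can be made concrete with a small sketch. It assumes a left-to-right square-and-multiply routine and an idealised attacker who can tell squarings ("S") from multiplications ("M") in the trace; the function names and the "S"/"M" labels are illustrative assumptions, not taken from any particular smart card implementation.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs the operation
    sequence an SPA attacker is assumed to be able to observe."""
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:          # most significant bit first
        result = (result * result) % modulus
        operations.append("S")             # squaring happens for every bit
        if bit == "1":
            result = (result * base) % modulus
            operations.append("M")         # extra multiply only for 1 bits
    return result, operations


def exponent_from_operations(operations):
    """Rebuild the exponent from the observed S/M sequence."""
    bits = []
    i = 0
    while i < len(operations):
        # Every iteration starts with "S"; a following "M" marks a 1 bit.
        if i + 1 < len(operations) and operations[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, ops = square_and_multiply(7, 0b101101, 1009)
print("".join(ops), exponent_from_operations(ops))
```

In this idealised setting the secret exponent is read directly from a single trace, which is why the text singles out implementations whose operation sequence depends on the key.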
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value V, predicted by the selection function $D(C_i, b, K_s)$, is one and the average of the traces for which V is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as D with an incorrect $K_s$ may have a weak correlation to D with the correct $K_s$).
If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of the target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where D is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
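The difference-of-means step can be sketched on simulated data. The sketch below is a toy model, not the DES attack described above. It assumes a known-plaintext setting, an intermediate value equal to plaintext XOR key, and a single trace point whose power is the Hamming weight of that value plus Gaussian noise. All names, the 8-bit key, the number of traces and the noise level are illustrative assumptions. The selection function predicts one bit of the intermediate value for each guess of the corresponding key bit, and the guess whose differential trace shows the positive spike is kept.

```python
import random


def hamming_weight(value):
    return bin(value).count("1")


def simulate_trace(plaintext, key, rng, n_samples=8, leak_index=3, noise=1.0):
    """Toy trace: Gaussian noise at every point plus, at one point, a term
    equal to the Hamming weight of (plaintext XOR key)."""
    trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
    trace[leak_index] += hamming_weight(plaintext ^ key)
    return trace


def mean_trace(traces):
    """Point-by-point average of a list of equally long traces."""
    count = len(traces)
    return [sum(column) / count for column in zip(*traces)]


def differential_trace(traces, selection_bits):
    """Average of traces with selection bit 1 minus average with bit 0."""
    ones = [t for t, d in zip(traces, selection_bits) if d == 1]
    zeros = [t for t, d in zip(traces, selection_bits) if d == 0]
    return [a - b for a, b in zip(mean_trace(ones), mean_trace(zeros))]


def recover_key(plaintexts, traces, key_bits=8):
    """Guess each key bit separately and keep the guess with the larger spike."""
    key = 0
    for b in range(key_bits):
        peaks = []
        for guess in (0, 1):
            # Selection function: predicted bit b of (plaintext XOR key guess).
            selection = [((p >> b) & 1) ^ guess for p in plaintexts]
            peaks.append(max(differential_trace(traces, selection)))
        if peaks[1] > peaks[0]:
            key |= 1 << b
    return key


rng = random.Random(2024)
secret_key = 0xA7
plaintexts = [rng.randrange(256) for _ in range(2000)]
traces = [simulate_trace(p, secret_key, rng) for p in plaintexts]
recovered = recover_key(plaintexts, traces)
print(f"recovered key: {recovered:#04x}")
```

With a correct guess the differential trace is close to zero everywhere except at the leaking point. With a wrong guess the prediction is inverted, so the positive spike disappears.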
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage current) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit will only consume power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device, and it is illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the inverter shown above, all four transitions can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are once more pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic are:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
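The difference between static CMOS and a pre-charged dual-rail gate can be illustrated with a deliberately simplified count of supply charging events. The sketch assumes that only a 0 to 1 transition of an output node draws charge from the supply, and it ignores short-circuit and leakage currents as well as any difference in node capacitance. Both function names are illustrative.

```python
def static_cmos_charging_events(outputs, initial=0):
    """Count 0->1 output transitions, the only events in this simplified
    model that draw charge from the supply."""
    events = 0
    previous = initial
    for value in outputs:
        if previous == 0 and value == 1:
            events += 1
        previous = value
    return events


def precharged_dual_rail_charging_events(outputs):
    """Both rails are pre-charged to 1. In evaluation exactly one rail is
    discharged, and the next pre-charge phase recharges that one rail."""
    events = 0
    for value in outputs:
        true_rail, false_rail = (1, 0) if value else (0, 1)  # after evaluation
        events += (1 - true_rail) + (1 - false_rail)         # rails recharged
    return events


for data in ([0, 0, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1]):
    print(data,
          static_cmos_charging_events(data),
          precharged_dual_rail_charging_events(data))
```

In this model the static CMOS count varies with the data sequence, while the dual-rail count equals the number of cycles whatever the data, which is the property the first condition asks for.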
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require a full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
1. A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
3. The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.
In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the outputs of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
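The behaviour described above can be checked with a small behavioural model. This is only a sketch in Python, not the VHDL of the project. A dual-rail signal is represented as a (true, false) tuple, and the names `precharge_operator`, `wddl_and` and `wddl_or` are illustrative assumptions.

```python
def precharge_operator(bit, clk):
    """Return the (true, false) rail pair driven into the logic.
    clk = 1 is the pre-charge phase, clk = 0 the evaluation phase."""
    if clk == 1:
        return (0, 0)              # both rails forced to 0
    return (bit, 1 - bit)          # differential signal passed through


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


for clk in (1, 0):
    for x in (0, 1):
        for y in (0, 1):
            a = precharge_operator(x, clk)
            b = precharge_operator(y, clk)
            print(clk, x, y, wddl_and(a, b), wddl_or(a, b))
```

Because both AND and OR are positive gates, an all-zero input pair yields an all-zero output pair, so the pre-charge value propagates through cascaded gates without a separate pre-charge signal at each gate. In the evaluation phase exactly one rail of each output is 1.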
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, along with larger blocks such as a full adder and a 32-bit XOR.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
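The XOR equation above, together with the larger blocks that this chapter builds from the basic gates, can be sketched behaviourally. This is a Python model under stated assumptions, not the project's VHDL. Signals are (true, false) tuples, the false rail of the XOR is taken as the dual of AB' + A'B, and the full-adder decomposition (carry = AB + (A ⊕ B)Cin) and all function names are illustrative choices that the report does not specify.

```python
def encode(bit):
    """Differential (true, false) encoding used in the evaluation phase."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor(a, b):
    """True rail: AB' + A'B. False rail: the dual, (A' + B)(A + B')."""
    true_rail = (a[0] & b[1]) | (a[1] & b[0])
    false_rail = (a[1] | b[0]) & (a[0] | b[1])
    return (true_rail, false_rail)


def wddl_full_adder(a, b, carry_in):
    """Full adder assembled only from the dual-rail gates above."""
    partial = wddl_xor(a, b)
    total = wddl_xor(partial, carry_in)
    carry_out = wddl_or(wddl_and(a, b), wddl_and(partial, carry_in))
    return total, carry_out


def wddl_add(x, y, width=32):
    """Ripple-carry adder: one full adder per bit, final carry discarded."""
    carry = encode(0)
    result = 0
    for i in range(width):
        a = encode((x >> i) & 1)
        b = encode((y >> i) & 1)
        total, carry = wddl_full_adder(a, b, carry)
        assert total[0] != total[1]      # output stays differential
        result |= total[0] << i
    return result


print(hex(wddl_add(0x12345678, 0x9ABCDEF0)))
print(wddl_full_adder((0, 0), (0, 0), (0, 0)))   # pre-charge inputs
```

The last line shows that all-zero (pre-charge) inputs give all-zero outputs, so the 0-wave also passes through the composed blocks.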
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can easily be implemented by instantiating the corresponding lower-level module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name: wddlor.ngr
Top Level Output File Name: wddlor
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
IOs: 5
Cell Usage
BELS: 2
LUT3: 2
IO Buffers: 5
IBUF: 3
OBUF: 2
Device utilization summary
Selected Device: 3s250eft256-4
Number of Slices: 1 out of 2448 (0%)
Number of 4 input LUTs: 2 out of 4896 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 172 (2%)
Timing Summary
Speed Grade: -4
Maximum combinational path delay: 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name: wddlgates.ngr
Top Level Output File Name: wddlgates
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
IOs: 5
Cell Usage
BELS: 2
LUT3: 2
IO Buffers: 5
IBUF: 3
OBUF: 2
Device utilization summary
Selected Device: 3s250etq144-4
Number of Slices: 1 out of 2448 (0%)
Number of 4 input LUTs: 2 out of 4896 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 108 (4%)
Timing Summary
Speed Grade: -4
Maximum combinational path delay: 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name: wddlnand1.ngr
Top Level Output File Name: wddlnand1
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
IOs: 5
Cell Usage
BELS: 2
LUT3: 2
IO Buffers: 5
IBUF: 3
OBUF: 2
Device utilization summary
Selected Device: 3s500efg320-4
Number of Slices: 1 out of 4656 (0%)
Number of 4 input LUTs: 2 out of 9312 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 232 (2%)
Timing Summary
Speed Grade: -4
Maximum combinational path delay: 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name: wddlxorgate.ngr
Top Level Output File Name: wddlxorgate
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
IOs: 5
Cell Usage
BELS: 2
LUT3: 2
IO Buffers: 5
IBUF: 3
OBUF: 2
Device utilization summary
Selected Device: 3s250eft256-4
Number of Slices: 1 out of 2448 (0%)
Number of 4 input LUTs: 2 out of 4896 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 172 (2%)
Timing Summary
Speed Grade: -4
Maximum combinational path delay: 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that yields constant power dissipation irrespective of the input combination. WDDL has proved advantageous compared to the alternatives and is therefore of great significance. In this project work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found to be simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design has been done using SoC Encounter. The desired functionality has been achieved according to the specifications. In the output, there is the same number of transitions during every Evaluation phase, resulting in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
side-channel will of course determine what information is available to the attacker about these
states The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly
23
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom
521 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations SPA can yield information about a devices
operation as well as key material
A trace refers to a set of power consumption measurements taken across a
cryptographic operation For example a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points Figure for example shows an SPA trace from a smart card
performing a DES operation
Figure SPA monitoring from a single DES operation performed by a typical smart card The
upper trace shows the entire encryption operation including the initial permutation the 16
DES rounds and the final permutation The lower trace is a detailed view of the second and
third rounds
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
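The exponentiator case can be illustrated with a small behavioral sketch. It is not taken from the report: the function names, the base 7, the modulus 1009 and the secret exponent are all hypothetical, and the list of "S"/"M" labels merely stands in for what an SPA observer might read off a trace if squarings and multiplications looked different.

```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply that also logs each operation.

    The log is a stand-in for the operation sequence visible in a power
    trace when squarings ("S") and multiplications ("M") are distinguishable.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # most significant bit first
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":                     # data-dependent branch: the leak
            result = (result * base) % modulus
            ops.append("M")
    return result, ops


def exponent_from_ops(ops):
    """Rebuild the exponent from the S/M sequence alone."""
    bits = []
    for i, op in enumerate(ops):
        if op == "S":
            # A squaring directly followed by a multiplication encodes a 1 bit.
            followed_by_m = i + 1 < len(ops) and ops[i + 1] == "M"
            bits.append("1" if followed_by_m else "0")
    return int("".join(bits), 2)


secret = 0b1011001                          # hypothetical secret exponent
value, ops = modexp_with_trace(7, secret, 1009)
recovered = exponent_from_ops(ops)
```

In this sketch the observer never sees the exponent itself, only the order of operations, yet `recovered` equals `secret`.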
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces T1..m[1..k] containing k samples each. In addition, the attacker records the
ciphertexts C1..m. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess Ks is correct. A selection
function D(Ci, b, Ks) predicts the value of a target bit b from the ciphertext Ci under the
guess Ks. The attacker computes a k-sample differential trace ΔD[1..k] by finding the
difference between the average of the traces for which this predicted intermediate value V is
one and the average of the traces for which V is zero. Thus ΔD[j] is the average over C1..m of
the effect due to the value represented by the selection function D on the power consumption
at point j.
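The difference of means described above is usually written as follows. This is the standard form from the DPA literature, given here as an aid rather than as the report's own notation; the symbols follow the text (m traces T_i, ciphertexts C_i, selection function D, target bit b, key guess K_s).

```latex
\Delta_D[j] \;=\;
\frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)}
\;-\;
\frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
```

The first fraction is the mean of the traces for which the predicted bit is one, and the second is the mean of the traces for which it is zero.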
If Ks is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts Ci. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Trace components that are uncorrelated to D therefore diminish with 1/√m, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with an
incorrect Ks may have a weak correlation to D with the correct Ks). If Ks is correct, however, the
computed value for D(Ci, b, Ks) will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. The contributions
of other data values, measurement errors, etc. that are not correlated to D approach zero. Because
power consumption is correlated to data bit values, the plot of ΔD will be flat with spikes in
regions where D is correlated to the values being processed. The correct value of Ks can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight Ks yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
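The procedure above can be tried on synthetic data. The following is a toy sketch under loudly stated assumptions: the 8-bit table is a seeded random permutation (not a DES S-box), the "device" leaks exactly one predicted bit at one sample on top of Gaussian noise, and every name and constant (`TABLE`, `SECRET_KEY`, `LEAK_AT`, the seed) is hypothetical. It only shows the difference-of-means statistic separating the correct key guess from the wrong ones.

```python
import random

rng = random.Random(2006)                 # fixed seed so the run is repeatable

# Hypothetical 8-bit substitution table (a random permutation, NOT a DES S-box).
TABLE = list(range(256))
rng.shuffle(TABLE)

SECRET_KEY = 0xA7                         # hypothetical key byte inside the "device"
NUM_TRACES = 1000                         # m in the text
SAMPLES = 5                               # k in the text
LEAK_AT = 2                               # sample index where the target bit leaks


def selection(c, key_guess):
    """D(C, b, Ks): predicted target bit (bit 0) for ciphertext byte c."""
    return TABLE[c ^ key_guess] & 1


def measure(c):
    """One simulated trace: Gaussian noise plus a data-dependent bump."""
    trace = [rng.gauss(0.0, 1.0) for _ in range(SAMPLES)]
    trace[LEAK_AT] += selection(c, SECRET_KEY)
    return trace


ciphertexts = [rng.randrange(256) for _ in range(NUM_TRACES)]
traces = [measure(c) for c in ciphertexts]


def differential_trace(key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample."""
    sums = [[0.0] * SAMPLES, [0.0] * SAMPLES]
    counts = [0, 0]
    for c, t in zip(ciphertexts, traces):
        d = selection(c, key_guess)
        counts[d] += 1
        for j in range(SAMPLES):
            sums[d][j] += t[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(SAMPLES)]


def peak(key_guess):
    """Largest absolute value in the differential trace for this guess."""
    return max(abs(x) for x in differential_trace(key_guess))


best_guess = max(range(256), key=peak)    # guess whose trace shows the biggest spike
best_trace = differential_trace(best_guess)
spike_index = max(range(SAMPLES), key=lambda j: abs(best_trace[j]))
```

With these settings the differential trace for the correct guess has a spike of roughly one unit at `LEAK_AT`, while the traces for wrong guesses stay close to flat, which is the behavior the text describes.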
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount of energy
(due to short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
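A minimal sketch of this data dependence, assuming the simplest textbook model: a 0 to 1 output transition draws C·Vdd² from the supply and every other transition draws nothing (short-circuit and leakage currents are ignored). The load capacitance and supply voltage below are hypothetical values chosen only for illustration.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, vdd=1.8):
    """Energy in joules drawn from the supply for one output transition.

    Simplified model: only a 0 -> 1 transition charges the load through the
    pull-up network, drawing C * Vdd^2 from the supply. Short-circuit and
    leakage currents are ignored.
    """
    if prev_out == 0 and next_out == 1:
        return c_load * vdd * vdd
    return 0.0


def trace_energy(output_bits):
    """Per-cycle supply energy for a sequence of output values."""
    return [supply_energy(a, b) for a, b in zip(output_bits, output_bits[1:])]


# Output sequence 0, 1, 1, 0, 0, 1 gives the transitions 0->1, 1->1, 1->0, 0->0, 0->1.
profile = trace_energy([0, 1, 1, 0, 0, 1])
```

Only the first and last entries of `profile` are non-zero, so an observer of the supply can tell when the output rose; this is the signal activity dependence that a constant-power logic style has to remove.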
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data it burns the
current, which makes this logic style unacceptable for embedded battery-
operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the standard CMOS inverter shown above, all four transitions can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a
differential output signal. The discharged output node is charged to 1 in the following
pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal
transition, including the events in which the input signals remain constant, is represented
by an actual switching event in which the logic gate charges a capacitance. The logic families
introduced to thwart differential power analysis (DPA) by means of dynamic differential logic
include the following:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
1. A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
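The pre-charge and evaluation behavior described above can be checked with a small behavioral model. This is a sketch, not the report's VHDL: the function names are hypothetical, each signal is modeled as a pair of rails (true, false), and the pre-charge operator is modeled as forcing both rails to 0 while clk is high.

```python
from itertools import product


def precharge(clk, a, a_bar):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    if clk == 1:
        return 0, 0
    return a, a_bar


def wddl_and(a, a_bar, b, b_bar):
    """WDDL AND: true rail is AND of true inputs, false rail is OR of false inputs."""
    return a & b, a_bar | b_bar


def wddl_or(a, a_bar, b, b_bar):
    """WDDL OR: true rail is OR of true inputs, false rail is AND of false inputs."""
    return a | b, a_bar & b_bar


def rising_edges_per_cycle(gate, a, b):
    """Count 0 -> 1 output transitions over one pre-charge/evaluate cycle."""
    # Pre-charge phase (clk high): all gate inputs are forced to 0.
    pa, pa_bar = precharge(1, a, 1 - a)
    pb, pb_bar = precharge(1, b, 1 - b)
    pre = gate(pa, pa_bar, pb, pb_bar)
    # Evaluation phase (clk low): the differential inputs pass through.
    ea, ea_bar = precharge(0, a, 1 - a)
    eb, eb_bar = precharge(0, b, 1 - b)
    ev = gate(ea, ea_bar, eb, eb_bar)
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


edge_counts = {
    (gate.__name__, a, b): rising_edges_per_cycle(gate, a, b)
    for gate in (wddl_and, wddl_or)
    for a, b in product((0, 1), repeat=2)
}
```

For every input combination of both gates, exactly one output rail rises per cycle, which is the 100% switching factor stated in the text. The model says nothing about whether the two rails see equal capacitance; that remains a layout matter.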
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed first and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented first in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a full adder, a 32-bit XOR and other modules have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting
the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
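One possible dual-rail reading of the equation A ⊕ B = AB' + A'B is sketched below. It is an assumption, not the circuit in the report's figure: the true rail uses the equation directly with the complemented inputs taken from the false rails, and the false rail uses the dual expression (A' + B)(A + B'), so that only AND and OR operations appear on either rail. The function name is hypothetical.

```python
from itertools import product


def wddl_xor(a, a_bar, b, b_bar):
    """Dual-rail XOR built only from AND/OR operations on the input rails."""
    true_rail = (a & b_bar) | (a_bar & b)      # A B' + A' B
    false_rail = (a_bar | b) & (a | b_bar)     # (A' + B)(A + B'), i.e. XNOR
    return true_rail, false_rail


# Evaluation phase: each input arrives as a complementary pair of rails.
evaluated = {
    (a, b): wddl_xor(a, 1 - a, b, 1 - b)
    for a, b in product((0, 1), repeat=2)
}

# Pre-charge phase: all input rails are 0, so both output rails must be 0.
precharged = wddl_xor(0, 0, 0, 0)
```

Because no inverter appears on either rail, the all-zero pre-charge input propagates to an all-zero output, which is what allows the 0-wave to pass through the gate.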
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
easily be implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
gives constant power dissipation irrespective of the input combination. WDDL offers
advantages over the other logic styles considered and is therefore of great significance. In this
work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation: the low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially the logic gates are implemented in
WDDL, and the higher modules are then designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase and
the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This kind of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. During the Evaluation phase the output shows the
same number of transitions in every cycle, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Because the power dissipation at
the output is constant, an attacker cannot apply the Differential Power Analysis (DPA) scheme
to find the secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals that wake the process up.
General Format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
CHAPTER 4 SMART CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires whose role, use and position are standardized. In addition to the input/output wires,
the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device and that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes.
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is the connection of a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional) such as running time or power consumption.
A further class, semi-invasive attacks, has also been distinguished. These attacks
require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by means of:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
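The measurement arithmetic can be sketched as follows. The 50 ohm resistor comes from the text; the 0.25 V drop, the 3.3 V supply and the function names are hypothetical values chosen only to make the example concrete. The second helper shows how the length of a captured trace follows from the duration of the operation and the sampling rate.

```python
def supply_current(v_drop, r_shunt=50.0):
    """Current in amperes through the series resistor, by Ohm's law."""
    return v_drop / r_shunt


def instantaneous_power(v_drop, v_supply, r_shunt=50.0):
    """Power in watts delivered to the card: current times the voltage left for the card."""
    return supply_current(v_drop, r_shunt) * (v_supply - v_drop)


def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points captured when sampling for the given duration."""
    return round(duration_s * sample_rate_hz)


current_a = supply_current(0.25)            # hypothetical 0.25 V across 50 ohm
power_w = instantaneous_power(0.25, 3.3)    # hypothetical 3.3 V supply
n_points = samples_in_trace(1e-3, 5e6)      # a 1 ms operation sampled at 5 MHz
```

Each sampled voltage difference converts to one current value, so a captured trace is simply this conversion applied to every sample.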
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher which can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed. Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure SPA monitoring from a single DES operation performed by a typical smart card The
upper trace shows the entire encryption operation including the initial permutation the 16
DES rounds and the final permutation The lower trace is a detailed view of the second and
third rounds
Because SPA can reveal the sequence of instructions executed it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed For example
DES key schedule the DES key schedule computation involves rotating 28-bit key registers
A conditional branch is commonly used to check the bit shifted off the end so that ldquo1 bits can
24
be wrapped around The resulting power consumption traces for a ldquo1 bit and a ldquo0 bit will
contain different SPA features if the execution paths take different branches for each
DES permutations DES implementations perform a variety of bit permutations Conditional
branching in software or microcode can cause significant power consumption differences for
ldquo0 and ldquo1 bits
Comparisons String or memory comparison operations typically perform a conditional
branch when a mismatch is found This conditional branching causes large SPA (and
sometimes timing) characteristics
Multipliers Modular multiplication circuits tend to leak a great deal of information about the
data they process The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights
Exponentiators A simple modular exponentiation function scans across the exponent
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to ldquo1 The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics take different
amounts of time or are separated by different code Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases, it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. A selection
function $D(C_i, b, K_s)$ predicts, from ciphertext $C_i$ and the guess $K_s$, the value of a certain
intermediate bit V (target bit b). The attacker
computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which V is one and the average of the
traces for which V is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value
represented by the selection function D on the power consumption at point j. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus, trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with an
incorrect $K_s$ may have a weak correlation to D with the correct $K_s$). If $K_s$ is correct, however, the
computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions
where D is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use
known plaintext or known ciphertext and can find encryption or decryption keys.
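The difference-of-means procedure described above can be tried on simulated data. The sketch below is an illustration under stated assumptions, not the attack setup of the source: it replaces DES with a toy 4-bit target (the S-box of the PRESENT cipher), uses a hypothetical leakage model in which a single sample of each trace leaks one S-box output bit plus Gaussian noise, and all function names are made up for the example.

```python
import random

# 4-bit S-box borrowed from the PRESENT cipher; a toy stand-in for a DES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def selection(c, key_guess):
    """Selection function D: predicted target bit (bit 1 of the S-box output)."""
    return (SBOX[c ^ key_guess] >> 1) & 1


def simulate_traces(key, m, k, leak_at, rng, noise=0.5):
    """Simulate m traces of k samples; only sample `leak_at` depends on the target bit."""
    ciphertexts, traces = [], []
    for _ in range(m):
        c = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(k)]
        trace[leak_at] += selection(c, key)   # data-dependent power contribution
        ciphertexts.append(c)
        traces.append(trace)
    return ciphertexts, traces


def differential_trace(ciphertexts, traces, key_guess):
    """Average of the traces with D = 1 minus average of the traces with D = 0."""
    k = len(traces[0])
    sums = ([0.0] * k, [0.0] * k)
    counts = [0, 0]
    for c, trace in zip(ciphertexts, traces):
        d = selection(c, key_guess)
        counts[d] += 1
        for j, sample in enumerate(trace):
            sums[d][j] += sample
    return [sums[1][j] / max(counts[1], 1) - sums[0][j] / max(counts[0], 1)
            for j in range(k)]


def recover_key(ciphertexts, traces):
    """Return the key guess whose differential trace shows the largest spike."""
    def peak(guess):
        return max(abs(x) for x in differential_trace(ciphertexts, traces, guess))
    return max(range(16), key=peak)


rng = random.Random(2024)
cts, trs = simulate_traces(key=0xB, m=2000, k=16, leak_at=5, rng=rng)
print(hex(recover_key(cts, trs)))
```

In this model the correct guess produces a spike of roughly the full leakage amplitude at the leaking sample, while wrong guesses produce at most smaller spikes, matching the remark that D under a wrong key may still be weakly correlated with D under the correct one.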
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
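A minimal sketch of this data dependence, assuming an idealised gate in which only 0 to 1 output transitions draw charge (C times Vdd squared per event) and short-circuit and leakage currents are ignored. The function name, load capacitance and supply voltage are hypothetical values chosen for illustration.

```python
def supply_energy(output_sequence, c_load=10e-15, vdd=1.8):
    """Energy drawn from the supply by a static CMOS gate output (idealised).

    Only 0 -> 1 transitions charge the load; each one draws C * Vdd^2.
    """
    rising = sum(1 for prev, cur in zip(output_sequence, output_sequence[1:])
                 if prev == 0 and cur == 1)
    return rising * c_load * vdd ** 2


quiet = supply_energy([0, 0, 0, 0, 0])   # output never rises
busy = supply_energy([0, 1, 0, 1, 0])    # output rises twice
print(quiet, busy)
```

Two output sequences of the same length draw different energy, so the supply current carries information about the data being processed.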
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns the
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the inverter shown above, all four transitions can be distinguished when
monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, giving a differential
output signal. The discharged output node is charged to 1 in the following pre-charge phase,
so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) use dynamic differential logic. Two such
techniques are:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
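The first condition can be checked with a small behavioural model. The sketch below is an assumption-laden illustration (hypothetical function names, unit-less "charging events", no analog effects): it counts, per clock cycle, how many output nodes must be charged from the supply for a single-rail static CMOS output and for a dual-rail output that is pre-charged to 1.

```python
def static_cmos_charge_events(values):
    """Per-cycle charging events of a single-rail output: 1 only on a 0 -> 1 change."""
    events, prev = [], 0
    for v in values:
        events.append(1 if (prev == 0 and v == 1) else 0)
        prev = v
    return events


def dual_rail_charge_events(values):
    """Per-cycle charging events of a dual-rail output that is pre-charged to 1."""
    events = []
    for v in values:
        # Evaluation phase: exactly one of the two pre-charged rails discharges to 0.
        out_true, out_false = (1, 0) if v else (0, 1)
        # Following pre-charge phase: every discharged rail is charged back to 1.
        events.append((out_true == 0) + (out_false == 0))
    return events


data = [0, 1, 1, 0, 0, 0, 1]
print(static_cmos_charge_events(data))
print(dual_rail_charge_events(data))
```

The single-rail count follows the data, while the dual-rail count is one per cycle whatever the data is. The second condition, equal capacitance on both rails, is a layout matter that this model does not capture.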
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
1. A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
3. The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence WDDL. There are several ways
to launch the pre-charge wave. In the figure below, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. The operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
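The pre-charge wave can be illustrated with a small behavioural model in Python. This is a sketch under assumptions, not the VHDL of the project: the function names and the toy module y = (a AND b) OR c are hypothetical, and each signal is modelled as a (true rail, false rail) pair.

```python
from itertools import product


def wddl_and(a, a_n, b, b_n):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    return a & b, a_n | b_n


def wddl_or(a, a_n, b, b_n):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    return a | b, a_n & b_n


def precharge(x, x_n, clk):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    return (0, 0) if clk else (x, x_n)


def evaluate(a, b, c, clk):
    """Toy WDDL module y = (a AND b) OR c; returns the rails of every gate output."""
    ins = [precharge(x, 1 - x, clk) for x in (a, b, c)]
    g1 = wddl_and(*ins[0], *ins[1])
    g2 = wddl_or(*g1, *ins[2])
    return [g1, g2]


for a, b, c in product((0, 1), repeat=3):
    pre = evaluate(a, b, c, clk=1)    # pre-charge phase: the 0-wave reaches every gate
    ev = evaluate(a, b, c, clk=0)     # evaluation phase: differential outputs
    assert all(rails == (0, 0) for rails in pre)
    assert all(t + f == 1 for t, f in ev)
print("every gate output: (0, 0) in pre-charge, exactly one rail high in evaluation")
```

Because only positive gates are used, the all-zero input vector propagates through the module without any pre-charge signal being routed to the individual gates, and in every evaluation phase exactly one rail of each gate output rises, whatever the data.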
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed first and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented first in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a full adder, a 32-bit XOR and other modules have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the pre-charge operator, injecting
the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
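A behavioural sketch of the XOR construction follows. It is an illustration only: the source gives the true-rail equation AB' + A'B, while the false-rail expression used here (AB + A'B', the XNOR built from the same positive gates) and all function names are assumptions made for the example, not taken from the project's VHDL.

```python
def wddl_xor(a, a_n, b, b_n):
    """1-bit WDDL XOR built only from AND/OR gates on the differential inputs."""
    true_rail = (a & b_n) | (a_n & b)     # A.B' + A'.B
    false_rail = (a & b) | (a_n & b_n)    # assumed false rail: XNOR from positive gates
    return true_rail, false_rail


def wddl_xor_word(x, y, width=32):
    """Word-wide XOR obtained by instantiating the 1-bit gate `width` times."""
    result = 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        t, f = wddl_xor(a, 1 - a, b, 1 - b)
        assert t + f == 1                 # outputs stay differential in evaluation
        result |= t << i
    return result


print(wddl_xor(0, 0, 0, 0))               # all-zero (pre-charge) input
print(hex(wddl_xor_word(0xDEADBEEF, 0x12345678)))
```

No inverters are needed because the complemented inputs are already available on the false rails, and an all-zero input still yields an all-zero output, so the gate passes the pre-charge wave.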
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172 (2%)
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108 (4%)
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656 (0%)
Number of 4 input LUTs         : 2 out of 9312 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232 (2%)
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172 (2%)
Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
yields constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the other styles and is therefore of great significance. In this
work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. A bottom-up approach is used in this implementation. The low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase, all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter.
The desired functionality has been achieved according to the specifications. At the output, during the evaluation phase, there
is the same number of transitions in every cycle, thus resulting in constant power dissipation. During
synthesis, it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
same design implemented in conventional CMOS logic. This is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the
secret key that is being used in the crypto processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 4 SMART
CARD OVERVIEW
This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, the role, use and position of which are standardized. In addition to the input/output wires,
the parts we will be most interested in are the following:
1. Power supply: Smart cards do not have an internal battery.
The current they need is provided by the smart card reader. This makes the smart
card's power consumption fairly easy for the attacker to measure.
2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, this allows the
attacker to measure the card's running time with very good precision.
Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device and that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known
plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing
the results of the encryption (known as chosen plaintext attacks). Today, it is known that
encryption devices have additional outputs, and often additional inputs, which are not the
plaintext or the ciphertext.
Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes:
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example of this is the connection of a wire to a data bus to see the data transfers.
A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A more recent distinction introduces semi-invasive attacks. These attacks have the specificity that
they require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Conversely, passive attacks simply observe the card's behavior during its
processing, without disturbing it.
Note that these two axes are indeed orthogonal.
An invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures will
therefore be necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks therefore constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device. Thereafter the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital
complementary metal–oxide–semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is also provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
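As a worked example of the measurement described above, the sketch below converts sampled voltage drops across the series resistor into supply currents using I = V / R. The function name and the sample readings are hypothetical; only the 50 ohm value comes from the text.

```python
def current_trace(voltage_drops, r_shunt=50.0):
    """Convert sampled voltage drops (volts) across the series resistor to currents (amperes)."""
    return [v / r_shunt for v in voltage_drops]


readings = [0.050, 0.120, 0.075]       # hypothetical oscilloscope samples in volts
for v, i in zip(readings, current_trace(readings)):
    print(f"{v:.3f} V across 50 ohm -> {i * 1000:.2f} mA")
```

A sequence of such current samples taken during one cryptographic operation forms the power trace that the attacks in this chapter analyse.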
Power analysis attacks are of two types:
1. Simple Power Analysis attack and
2. Differential Power Analysis attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is generally based on looking at the visual representation of the
power consumption of a unit while an encryption operation is being performed. Simple Power
Analysis is a technique that involves direct interpretation of power consumption measurements
collected during cryptographic operations. SPA can yield information about a device's
operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure below, for example, shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration, with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
522Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence there are
effects correlated to data values being manipulated These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm
To implement the DPA attack an attacker first observes m encryption operations and captures
power traces T1 m [1 k] containing k samples each In addition the attacker records the
cipher text C1 m No knowledge of the plain text is required DPA analysis uses power
consumption measurements to determine whether a key block guess Ks is correct The attacker
computes a k-sample differential trace centD [1 k] by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero Thus cent D[j) is the average over C1m of the effect due to the value
represented by the selection function D on the power consumption at point j In particular25
If Ks is incorrect the bit computed using D will differ from the actual target bit for about half
of the ciphertext Ci The selection function is thus effectively uncorrelated to what was actually
computed by the target device If a random function is used to divide a set into two subsets the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity
Thus because trace components uncorrelated to D will diminish with 1 pm causing the
differential trace to become at (the actual trace may not be completely at as D with Ks
incorrect may have a weak correlation to D with the correct Ks) If Ks is correct however the
computed value for D (Ci bKs) will equal the actual value of target bit b with probability 1
The selection function is thus correlated to the value of the bit considered Other data values
measurement errors etc that are not correlated to D approach zero Because power
consumption is correlated to data bit values the plot of centD will be degat with spikes in regions
where D is correlated to the values being processed The correct value of Ks can thus be
identified from the spikes in its differential trace Four values of b correspond to each S box
providing confirmation of key block guesses Finding all eight Ks yields the entire 48-bit round
sub key The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round Triple DES keys can be found by analyzing an outer DES operation first
using the resulting key to decrypt the cipher text and attacking the next DES key DPA can use
known plaintext or known cipher text and can find encryption or decryption keys
26
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current flows from the power supply and charges the output capacitance. When the output instead makes a 1-to-0, a 0-to-0 or a 1-to-1 transition, no energy, or only a limited amount (due to short-circuit or leakage currents), is drawn from the power supply. This asymmetry is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
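The data dependence described above can be made concrete with a small model. The following is a minimal sketch, assuming that only a 0-to-1 output transition draws energy C·Vdd² from the supply and that short-circuit and leakage currents are ignored; the capacitance and supply voltage are hypothetical values chosen for illustration, not figures from this project.

```python
# Toy supply-energy model of one static CMOS gate output.
# Assumptions (not from the report): only a 0->1 transition draws
# C * Vdd^2 from the supply; short-circuit and leakage are ignored.

C_LOAD = 10e-15   # hypothetical load capacitance in farads
VDD = 1.8         # hypothetical supply voltage in volts


def supply_energy(prev_out: int, next_out: int) -> float:
    """Energy drawn from the supply for one output transition."""
    if prev_out == 0 and next_out == 1:
        return C_LOAD * VDD ** 2
    return 0.0


def energy_trace(outputs):
    """Energy drawn in each cycle for a sequence of output values."""
    return [supply_energy(a, b) for a, b in zip(outputs, outputs[1:])]


if __name__ == "__main__":
    # Two different data sequences give two different energy profiles,
    # which is exactly what a power attack exploits.
    for name, seq in (("data A", [0, 1, 0, 1, 1]), ("data B", [0, 0, 0, 1, 0])):
        print(name, energy_trace(seq))
```

Because the two profiles differ, an observer of the supply current can tell the data sequences apart; a constant-power logic style aims to make them identical.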
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating such a constant current, special circuit techniques that minimize channel-length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it keeps drawing this current, which makes the logic style unacceptable for embedded, battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, draws a current from the supply only to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance is charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for battery-operated and low-power devices, as the figure below illustrates for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
The standard CMOS inverter above does not meet these conditions: all four of its transitions can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. Since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase, which yields a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include:
1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. A Sense Amplifier Based Logic gate is illustrated in the figure below.
Figure: Sense Amplifier Based Logic AND-NAND gate
This circuit style does, however, require full-custom characterization and layout. It also suffers from the high clock load common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library, and the design flow is fully supported by accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.
The advantages of the WDDL logic style are as follows:
- It can be incorporated into the common Electronic Design Automation (EDA) tool flow.
- No special design rules are involved in the interconnection of WDDL gates.
- The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input; the AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.
In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value on to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure below, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
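The behaviour of the WDDL AND and OR gates and of the pre-charge operator can be sketched as a small behavioural model. This is only an illustration under stated assumptions: each signal is represented as a (true rail, false rail) pair, and the function names are hypothetical, not taken from the project's HDL code.

```python
from itertools import product


def precharge_operator(bit: int, clk: int):
    """Return the (true, false) rail pair fed into the combinatorial logic.

    clk high -> pre-charge phase, both rails forced to 0
    clk low  -> evaluation phase, differential encoding of `bit`
    """
    if clk == 1:
        return (0, 0)
    return (bit, 1 - bit)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, dual OR gate on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, dual AND gate on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def rising_rails(gate, a_bit: int, b_bit: int) -> int:
    """Count output rails that rise between pre-charge and evaluation."""
    pre = gate(precharge_operator(a_bit, 1), precharge_operator(b_bit, 1))
    ev = gate(precharge_operator(a_bit, 0), precharge_operator(b_bit, 0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


if __name__ == "__main__":
    for gate in (wddl_and, wddl_or):
        for a_bit, b_bit in product((0, 1), repeat=2):
            print(gate.__name__, a_bit, b_bit, rising_rails(gate, a_bit, b_bit))
```

In this model an all-zero input propagates to an all-zero output, and in every evaluation phase exactly one of the two output rails rises regardless of the data, which is the 100% switching factor mentioned above. The model says nothing about capacitance balance, which still has to be ensured in the layout.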
Figure: WDDL pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is the bottom-up approach: lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The OR, AND and XOR logic gates have been implemented, along with a full adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND gates and WDDL OR gates, but the number of gates involved in the latter approach is greater than in the former. The first method of implementation is therefore followed rather than the second.
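The Boolean equation above can be checked with a short behavioural model on (true rail, false rail) pairs. This sketch composes the XOR from modelled WDDL AND and OR gates, so it only demonstrates the logic function and the all-zero pre-charge behaviour, not the gate count of either implementation method. Obtaining the complement by exchanging the two rails is an assumption of this sketch, and all function names are hypothetical.

```python
from itertools import product


def encode(bit: int):
    """Differential encoding used in the evaluation phase."""
    return (bit, 1 - bit)


def wddl_not(a):
    """Complement of a dual-rail signal: exchange the two rails (assumption)."""
    return (a[1], a[0])


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor(a, b):
    """A xor B = A.B' + A'.B on dual-rail signals."""
    return wddl_or(wddl_and(a, wddl_not(b)), wddl_and(wddl_not(a), b))


def encode_word(value: int, width: int = 32):
    """Encode an integer as a list of dual-rail bits, least significant first."""
    return [encode((value >> i) & 1) for i in range(width)]


def decode_word(rails) -> int:
    """Read the true rails back into an integer."""
    return sum(t << i for i, (t, _f) in enumerate(rails))


def wddl_xor_word(xs, ys):
    """Word-wide XOR: one 1-bit XOR instance per bit position."""
    return [wddl_xor(x, y) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    for a_bit, b_bit in product((0, 1), repeat=2):
        print(a_bit, b_bit, wddl_xor(encode(a_bit), encode(b_bit)))
```

The word-wide helper mirrors the idea of building a 32-bit XOR by instantiating the 1-bit module once per bit position.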
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by instantiating the corresponding lower-level module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108     4%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656    0%
Number of 4 input LUTs             : 2 out of 9312    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232     2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has advantages over the alternative logic styles and is therefore of great significance. In this work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data to be encrypted with that key, and they are supplied in reverse order for the decryption data path. Logic gates are first implemented in WDDL, and the higher-level modules are then designed by instantiating the WDDL gates to form the entire module, which results in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, the same number of transitions occurs in every evaluation phase, which results in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because of the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto processor. Security against DPA is thus incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical Papers
[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks," October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey," IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm," Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)."
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's."
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks."
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards."
Reference Books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices," Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis," Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
CHAPTER 5 SIDE
CHANNEL ATTACKS
"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.
In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both the plaintext and the ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.
Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as the supply voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.
Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.
5.1 Classification of side channel attacks
The literature usually classifies side channel attacks along two orthogonal axes.
1. Invasive vs. non-invasive
Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.
A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.
A more recent distinction is the class of semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.
Passive attacks, in contrast, simply observe the card's behavior during its processing, without disturbing it.
Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive. An invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.
The attacker can gain information through:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks, and
5. Electromagnetic analysis attacks
These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock signal, the card's energy is supplied by the terminal, and its consumption can therefore easily be measured. To measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack, and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of the internal state from one round to the next. The type of side channel will of course determine what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally through some statistical property of the cipher that makes that checkable value slightly non-random.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is a technique that involves direct interpretation, generally by visual inspection, of power consumption measurements collected while a cryptographic operation is being performed. SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure below shows an SPA trace from a smart card performing a DES operation.
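The arithmetic behind the shunt-resistor measurement of Section 5.2 and the trace length quoted above can be written out as a few helper functions. This is a minimal sketch; the function names, the default 50 ohm value and the example voltages are illustrative assumptions.

```python
def shunt_current(v_drop_volts: float, r_shunt_ohms: float = 50.0) -> float:
    """Current through the series resistor: voltage drop divided by resistance."""
    return v_drop_volts / r_shunt_ohms


def device_power(v_supply_volts: float, v_drop_volts: float,
                 r_shunt_ohms: float = 50.0) -> float:
    """Power drawn by the device: voltage left after the shunt times the current."""
    current = shunt_current(v_drop_volts, r_shunt_ohms)
    return (v_supply_volts - v_drop_volts) * current


def trace_length(duration_s: float, sample_rate_hz: float) -> int:
    """Number of samples in a trace covering `duration_s` seconds."""
    return round(duration_s * sample_rate_hz)


if __name__ == "__main__":
    # Hypothetical reading: 0.1 V across a 50 ohm shunt on a 5 V supply.
    print("current [A]:", shunt_current(0.1))
    print("power   [W]:", device_power(5.0, 0.1))
    # The example from the text: 1 ms sampled at 5 MHz.
    print("samples    :", trace_length(1e-3, 5e6))
```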
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration, with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
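The exponentiator case can be illustrated with a short sketch. It assumes an idealized attacker who can tell squarings ("S") from multiplications ("M") in the trace; the operation log stands in for that observation, and the function names are hypothetical.

```python
def modexp_with_trace(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply, logging each operation performed."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # scan the exponent, MSB first
        result = (result * result) % modulus
        ops.append("S")                    # a squaring in every iteration
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                # extra multiplication for a 1 bit
    return result, "".join(ops)


def recover_exponent(ops: str) -> int:
    """Rebuild the exponent from the observed S/M operation sequence."""
    bits = []
    i = 0
    while i < len(ops):
        # Each iteration starts with "S"; a following "M" marks a 1 bit.
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


if __name__ == "__main__":
    secret = 0b101101
    value, ops = modexp_with_trace(7, secret, 1009)
    print(ops, bin(recover_exponent(ops)))
```

A single trace is enough in this idealized setting, which is why implementations typically avoid exponent-dependent operation sequences.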
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses the power consumption measurements to determine whether a key block guess $K_s$ is correct. For this purpose a selection function $D(C, b, K_s)$ predicts, from a ciphertext C and the guess $K_s$, the value of a target bit b of an intermediate value V processed by the device. The attacker computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which D is one and the average of the traces for which D is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.
Thus, if $K_s$ is incorrect, trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as D with an incorrect $K_s$ may have a weak correlation to D with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where D is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
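The difference-of-means procedure can be tried out on simulated data. The sketch below is not an attack on DES: as a stated simplification it uses the 4-bit S-box of the PRESENT cipher as a stand-in, a 4-bit key block, and synthetic traces in which one sample leaks the Hamming weight of the S-box output plus Gaussian noise. All names and parameters are hypothetical, and the four bit-wise differential traces are simply added to combine the confirmation that the four target bits provide.

```python
import random

# 4-bit S-box of the PRESENT cipher, used only as a small stand-in (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
K_SAMPLES = 8   # samples per trace (k)
LEAK_AT = 3     # sample index at which the S-box output is processed


def selection(c: int, b: int, ks: int) -> int:
    """Selection function D(C, b, Ks): bit b of the predicted S-box output."""
    return (SBOX[c ^ ks] >> b) & 1


def simulate(secret: int, m: int, rng: random.Random, noise: float = 1.0):
    """Generate m ciphertext nibbles and noisy synthetic power traces."""
    ciphertexts, traces = [], []
    for _ in range(m):
        c = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(K_SAMPLES)]
        trace[LEAK_AT] += bin(SBOX[c ^ secret]).count("1")  # Hamming-weight leak
        ciphertexts.append(c)
        traces.append(trace)
    return ciphertexts, traces


def differential_trace(ciphertexts, traces, b: int, ks: int):
    """Average of traces with D = 1 minus average of traces with D = 0."""
    ones = [t for c, t in zip(ciphertexts, traces) if selection(c, b, ks) == 1]
    zeros = [t for c, t in zip(ciphertexts, traces) if selection(c, b, ks) == 0]
    return [sum(t[j] for t in ones) / len(ones)
            - sum(t[j] for t in zeros) / len(zeros)
            for j in range(K_SAMPLES)]


def recover_key(ciphertexts, traces) -> int:
    """Return the key guess whose combined differential trace has the highest spike."""
    def score(ks: int) -> float:
        per_bit = [differential_trace(ciphertexts, traces, b, ks) for b in range(4)]
        combined = [sum(d[j] for d in per_bit) for j in range(K_SAMPLES)]
        return max(combined)
    return max(range(16), key=score)


if __name__ == "__main__":
    rng = random.Random(2010)
    cts, trs = simulate(secret=0xB, m=2000, rng=rng)
    print("recovered key block:", hex(recover_key(cts, trs)))
```

For the correct guess the differential trace shows a spike at the leaking sample, while wrong guesses give a much flatter combined trace. A real attack would replace the stand-in S-box and synthetic traces with the DES round function and measured traces.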
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current flows from the power supply and charges the output capacitance. When the output instead makes a 1-to-0, a 0-to-0 or a 1-to-1 transition, no energy, or only a limited amount (due to short-circuit or leakage currents), is drawn from the power supply. This asymmetry is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating such a constant current, special circuit techniques that minimize channel-length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it keeps drawing this current, which makes the logic style unacceptable for embedded, battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, draws a current from the supply only to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance is charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for battery-operated and low-power devices, as the figure below illustrates for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
The standard CMOS inverter above does not meet these conditions: all four of its transitions can be distinguished when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. Since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase, which yields a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include:
1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
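The one-switching-event-per-cycle property of dynamic differential logic described above can be contrasted with a conventional single-rail output in a few lines. This is a minimal counting sketch under the text's assumption that both rails are pre-charged to 1; the function names are hypothetical and capacitances are not modelled.

```python
def single_rail_events(bits):
    """Charging events (0->1 transitions) of a conventional static gate output."""
    return [1 if (a, b) == (0, 1) else 0 for a, b in zip(bits, bits[1:])]


def dual_rail_events(bits):
    """Charging events per cycle when both rails are pre-charged to 1.

    In each evaluation phase exactly one rail (true or false) discharges
    to 0, so exactly one rail must be charged again in the next pre-charge.
    """
    events = []
    for bit in bits:
        evaluated = (bit, 1 - bit)   # (true rail, false rail) after evaluation
        events.append(sum(1 for rail in evaluated if rail == 0))
    return events


if __name__ == "__main__":
    data = [0, 0, 1, 1, 0]
    print("single rail:", single_rail_events(data))
    print("dual rail  :", dual_rail_events(data))
```

The single-rail count depends on the data sequence, whereas the dual-rail count is the same in every cycle, which satisfies the first of the two conditions; the second condition (constant capacitance) is a matter of circuit and layout balance.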
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. A Sense Amplifier Based Logic gate is illustrated in the figure below.
Figure: Sense Amplifier Based Logic AND-NAND gate
This circuit style does, however, require full-custom characterization and layout. It also suffers from the high clock load common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic (WDDL) Gates
WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Further advantages of the WDDL logic style are as follows:
1. It can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input;
the AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0, which puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure below, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and at the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
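The behaviour of the pre-charge wave can be sketched in a few lines of Python. This is a behavioural model under stated assumptions, not the VHDL of this project: a differential signal is represented as a (true, false) pair, and the names `encode`, `precharge`, `wddl_and`, `wddl_or` and the example tree `circuit` are hypothetical.

```python
def encode(bit):
    """Differential encoding of a single bit as a (true, false) pair."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails, its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails, its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def circuit(a, b, c, clk):
    """Example tree (a AND b) OR c with pre-charge operators only at its inputs."""
    a, b, c = (precharge(encode(x), clk) for x in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


print(circuit(1, 1, 0, clk=1))  # pre-charge phase
print(circuit(1, 1, 0, clk=0))  # evaluation phase
```

Because both gates are positive, the all-zero input produced by the pre-charge operators propagates through the whole tree without any clock connection to the individual gates. In the evaluation phase exactly one rail of the output is at 1.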
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed first and then integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented in WDDL style first. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented, as well as larger modules such as a full adder and a 32-bit XOR.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following
figure.
Figure 4.1: WDDL OR gate
The NAND and NOT gates act as the pre-charge operator, injecting
the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following
figure.
Figure 4.2: WDDL AND gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as
shown in the following figure.
Figure 4.3: WDDL NAND gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as
shown in the following figure.
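The NAND and NOR gates use the same AND/OR pair as the AND and OR gates. The text does not spell out what distinguishes them; in WDDL as described by Tiri and Verbauwhede, an inversion is obtained by exchanging the true and false rails, so that no inverter is needed. The Python sketch below assumes that this is how the inverting gates are wired. The helper names (`swap`, `wddl_nand`, `wddl_nor`) are hypothetical.

```python
def encode(bit):
    """Differential encoding of a single bit as a (true, false) pair."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def swap(signal):
    """Inversion in differential logic: exchange the true and false rails."""
    return (signal[1], signal[0])


def wddl_nand(a, b):
    """Assumed wiring: WDDL AND gate with its two outputs exchanged."""
    return swap(wddl_and(a, b))


def wddl_nor(a, b):
    """Assumed wiring: WDDL OR gate with its two outputs exchanged."""
    return swap(wddl_or(a, b))


print(wddl_nand(encode(1), encode(1)))
print(wddl_nor(encode(0), encode(0)))
```

Exchanging the rails keeps the gate positive: an all-zero input in the pre-charge phase still gives an all-zero output, so the 0-wave is not interrupted.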
Figure 4.4: WDDL NOR gate
7.5 WDDL XOR gate
The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating WDDL AND gates and a WDDL OR gate, but the number of gates
involved in the latter method is greater than in the former. The first method of
implementation is therefore followed rather than the second.
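One way to realise this equation with positive gates only is sketched below in Python. It is an assumption about the structure, not a transcription of Figure 4.5: the complemented inputs A' and B' are taken from the false rails, the true output is AB' + A'B, and the false output is the dual expression (A' + B)(A + B'). The function names are hypothetical.

```python
def encode(bit):
    """Differential encoding of a single bit as a (true, false) pair."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR built from AND and OR gates only, using the false rails as inverted inputs."""
    a_t, a_f = a
    b_t, b_f = b
    out_true = (a_t & b_f) | (a_f & b_t)    # A.B' + A'.B
    out_false = (a_f | b_t) & (a_t | b_f)   # dual: (A' + B).(A + B') = XNOR
    return (out_true, out_false)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, wddl_xor(encode(a), encode(b)))
```

Both rails are driven by positive logic, so the pre-charge value (0, 0) on the inputs yields (0, 0) on the output.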
Figure 4.5: WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. The implementation of
the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can
be easily implemented by instantiating the corresponding lower-level module 32
times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation result
Figure: Synthesis result
WDDL AND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation result
Figure: Synthesis result
WDDL NAND GATE
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation result
Figure: Synthesis result
WDDL XOR GATE
Figure: Simulation result
Figure: Synthesis result
Synthesis Report
Final Report
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 5
Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)
Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns
Figure: Simulation result
Figure: Synthesis result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
To secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), the design must be implemented in a logic style that
dissipates constant power irrespective of the input combination. WDDL has
advantages over the other styles considered and is therefore of great significance. In this
work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style using a bottom-up approach: the low-level entities
are designed first and are then combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order in the decryption data path. Logic gates are first implemented in
WDDL, and the higher-level modules are then designed by instantiating the WDDL gates,
so that the entire module dissipates constant power irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase all the signals of the design are set to zero, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plain text block of 64 bits. The code for the
implementation has been written in VHDL. Functional verification has been done using
the Cadence (NC Launch) simulation package and synthesis using RTL Compiler. The
back end of the design has been done using SoC Encounter. The desired functionality has been
achieved according to the specifications. At the output, during the evaluation phase, the
number of transitions is always the same, which results in constant power dissipation. During
synthesis it was observed that a simple WDDL gate comprises many conventional
gates. The area of the design has therefore grown nearly three-fold compared with the
same design implemented in conventional CMOS logic; this is the cost of the security
incorporated into the IC against Differential Power Analysis (DPA). Because of the constant
power dissipation at the output, an attacker cannot apply Differential Power Analysis (DPA) to find the
secret key that is being used in the crypto-processor. Security against DPA is thus
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is connecting a wire to a data bus in order to observe the data transfers.
A non-invasive attack exploits only externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.
A further class, called semi-invasive attacks, has also been distinguished. These attacks
require de-packaging of the chip to get access to the chip surface but do not tamper with
the passivation layer (they do not require electrical contact to the metal surface).
2. Active vs. passive
Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.
Passive attacks, by contrast, simply observe the card's behavior during its
processing without disturbing it.
Note that these two axes are orthogonal:
an invasive attack may completely avoid disturbing the card's behavior, and a passive
attack may require a preliminary de-packaging for the required information to be observable.
These attacks are of course not mutually exclusive: an invasive attack may, for example, serve
as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.
Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that
a non-invasive attack is completely undetectable: there is, for example, no way for a smart card
to figure out that its running time is currently being measured. Other countermeasures are
therefore necessary. From an economic point of view, invasive attacks are usually more
expensive to deploy on a large scale, since they require individual processing of each attacked
device. In this sense, non-invasive attacks constitute a bigger menace for the smart
card industry.
Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.
The attacker can gain information by means of:
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks target the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack
is of major concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea behind simple
and differential power analysis, first introduced by Kocher et al. Like the clock signal, the card's
energy is supplied by the terminal and can therefore easily be measured. To
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information about these states is available to the
attacker. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly non-random.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is a technique that involves direct interpretation of power consumption
measurements collected during cryptographic operations, generally by inspecting a visual
representation of the power consumed by a unit while an encryption operation is being performed.
SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure shows an SPA trace from a smart card
performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
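The arithmetic behind a trace can be written down directly. The Python sketch below is only an illustration of the two relations stated in this chapter: the number of points in a trace is the operation time multiplied by the sampling rate, and each current sample is the voltage drop across the series resistor divided by its resistance. The function names and the sample voltage values are hypothetical.

```python
def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points in a trace: operation duration times sampling rate."""
    return round(duration_s * sample_rate_hz)


def shunt_currents(voltage_drops, r_shunt_ohms=50.0):
    """Convert sampled voltage drops across the series resistor to currents (I = V / R)."""
    return [v / r_shunt_ohms for v in voltage_drops]


print(samples_in_trace(1e-3, 5e6))      # 1 ms operation sampled at 5 MHz
print(shunt_currents([0.5, 1.0, 0.75])) # hypothetical voltage samples in volts
```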
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
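The exponentiator case can be illustrated with a short Python sketch. It assumes an idealised attacker who can tell squarings ("S") from multiplications ("M") in the trace; the function names and the operation log are hypothetical and stand in for what would be read off a real power trace.

```python
def modexp_with_ops(base, exponent, modulus):
    """Left-to-right square-and-multiply that also logs the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":                          # extra multiplication for a 1 bit
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def exponent_from_ops(ops):
    """Rebuild the exponent from the S/M sequence: 'SM' encodes a 1, a lone 'S' a 0."""
    bits = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, ops = modexp_with_ops(7, 0b101101, 1000003)
print(ops, exponent_from_ops(ops))
```

The operation sequence alone is enough to rebuild the exponent, which is why implementations whose execution path depends on secret bits are vulnerable to SPA.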
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases, it is still often
possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the
cipher texts $C_{1..m}$. No knowledge of the plain text is required. DPA uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. For this
purpose a selection function $D(C_i, b, K_s)$ predicts, from cipher text $C_i$ and the guess $K_s$,
the value V of a target bit b of an intermediate result. The attacker
computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which V is one and the average of the
traces for which V is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value
represented by the selection function D on the power consumption at point j. In particular,
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$
If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half
of the cipher texts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity.
Thus trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with an
incorrect $K_s$ may have a weak correlation to D with the correct $K_s$). If $K_s$ is correct, however, the
computed value of $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1.
The selection function is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions
where D is correlated to the values being processed. The correct value of $K_s$ can thus be
identified from the spikes in its differential trace. Four values of b correspond to each S-box,
providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round
sub-key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the cipher texts, and attacking the next DES key. DPA can use
known plain text or known cipher text and can find encryption or decryption keys.
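The difference-of-means procedure can be tried out on simulated data. The Python sketch below is a toy model and not an attack on DES: it assumes a 4-bit key, uses the 4-bit S-box of the PRESENT cipher as a stand-in for a DES S-box, and assumes that one sample of every trace rises by one unit when the target bit is 1. All function names, the leak position and the noise level are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a stand-in non-linear function
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def selection(c, key_guess):
    """Selection function D: predicted target bit (MSB of the S-box output)."""
    return (SBOX[c ^ key_guess] >> 3) & 1


def simulate_traces(secret_key, m=1600, k=20, leak_at=7, noise=0.5, seed=1):
    """Toy device: sample `leak_at` rises by 1.0 whenever the target bit is 1."""
    rng = random.Random(seed)
    data, traces = [], []
    for i in range(m):
        c = i % 16                                   # known data value
        trace = [rng.gauss(0.0, noise) for _ in range(k)]
        trace[leak_at] += selection(c, secret_key)   # data-dependent leakage
        data.append(c)
        traces.append(trace)
    return data, traces


def differential_trace(data, traces, key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample."""
    k = len(traces[0])
    sums = {0: [0.0] * k, 1: [0.0] * k}
    counts = {0: 0, 1: 0}
    for c, trace in zip(data, traces):
        d = selection(c, key_guess)
        counts[d] += 1
        for j in range(k):
            sums[d][j] += trace[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(data, traces):
    """Pick the key guess whose differential trace has the largest spike."""
    peaks = {g: max(abs(x) for x in differential_trace(data, traces, g))
             for g in range(16)}
    return max(peaks, key=peaks.get)


data, traces = simulate_traces(secret_key=0xB)
print(recover_key(data, traces))
```

In this model the differential trace of the correct guess shows a spike of about one unit at the leaking sample, while wrong guesses give at most about half that height, which matches the remark above that an incorrect guess may still be weakly correlated.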
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current flows from the power supply and charges the
output capacitance. When the output instead sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short-circuit or leakage currents) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is therefore a logic style with constant power consumption.
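This data dependence can be made concrete with an idealised model. The Python sketch below assumes that only a 0 to 1 output transition draws energy from the supply (the load capacitance C charged to VDD draws C·VDD² from the supply) and that short-circuit and leakage contributions are neglected. The function names and the normalised values of C and VDD are assumptions.

```python
def supply_energy(prev, new, c_load=1.0, vdd=1.0):
    """Idealised supply energy for one output transition of a static CMOS gate."""
    # only a 0 -> 1 transition charges the load capacitance from the supply
    return c_load * vdd ** 2 if (prev, new) == (0, 1) else 0.0


def sequence_energy(bits, start=0):
    """Total supply energy for a sequence of output values, starting from `start`."""
    total, prev = 0.0, start
    for bit in bits:
        total += supply_energy(prev, bit)
        prev = bit
    return total


print(sequence_energy([0, 1, 0, 1]))
print(sequence_energy([1, 1, 1, 1]))
print(sequence_energy([0, 0, 0, 0]))
```

Different data sequences of the same length draw different amounts of energy from the supply, and this difference is what a power analysis attack observes.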
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded battery-operated
devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, draws a current from the
supply only to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit consumes power only when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. It is illustrated
in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption
namely
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
28
Here above all the four transitions of CMOS inverter can be distinguished when
monitoring the power supply
63 Dynamic Differential Logic
Dynamic differential logic sometimes also referred to as dual rail with pre-charge
logic fulfills the first condition A differential logic family uses the true and the false
representation of the input and output signals and a dynamic logic family alternates pre-charge
and evaluation phases As a result since both outputs (true and false) are pre-charged to 1
exactly one of the two output nodes evaluates to 0 to have a differential output signal in the
evaluation phase The discharged output node is charged to 1 in the following pre-charge phase
to pre-charge both outputs to 1 In other words every signal transition including the events in
which the input signals remain constant is represented with an actual switching event in
which the logic gate charges a capacitance All the logic families that have been introduced to
thwart the differential power analysis (DPA) by using dynamic differential logic in the
following techniques
1 Sense Amplifier Based Logic (SABL) and
2 Wave Dynamic Differential Logic (WDDL) gates
631 Sense Amplifier Based logic (SABL)
SABL has its main advantage that it has balanced input and output nodes and that all
internal nodes connect to an output The output capacitances can be balanced Systematic
methods have been developed to make sure that both branches of the differential pull down
network are balanced and that no memory effects are present in the network Sense Amplifier
Based logic is illustrated as
29
Sense Amplifier Based Logic
ANDNAND gate
This circuit style does require however a full custom characterization and layout It also
suffers from a high clock load common to all dynamic logic gates
632 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic Static CMOS
standard cells are combined to form secure compound standard cells
which have a reduced power signature WDDL has many advantages It can
be readily implemented from an existing standard cell library The design
flow is fully supported with accurate EDA library files that come directly
from the vendor WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS
Advantages of WDDL logic style are as follows
30
A major advantage of the proposed logic style is that it can be incorporated by the common
Electronic Design Automation (EDA) tool flow
No special design rules are involved in the interconnection of WDDL gates
The switching factor of WDDL is 100 A WDDL gate consists of a parallel
combination of two positive complementary gates one calculating the
true output using the true inputs the other the false output using the
false inputs A positive gate produces a zero output for an all zero input
The AND gate and the OR gate are examples of positive gates A
complementary gate sometimes also referred to as a dual gate
expresses the false output of the original logic gate using the false
inputs of the original gate The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates Fig shows
the WDDL AND gate and the WDDL OR gate In the evaluation phase
each input signal is differential and the WDDL gate calculates its
differential output In the pre-charge phase the inputs to the WDDL gate
are set at 0 This puts the output of the gate at 0 A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate During the pre-charge phase the input vector of the combinatorial
logic is set at all 0s Each individual gate will eventually have all its
inputs at 0 evaluate its output to 0 and pass this 0 value to the next
gate One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave hence WDDL There are several ways
to launch to pre-charge wave In Fig a pre-charge operator is inserted
at the start of every combinatorial logic tree ie the inputs of the
encryption module and the outputs of the registers They produce an all-
zero output in the pre-charge phase (clk-signal high) but let the
31
differential signal through during the evaluation phase (clk-signal low)
Fig
ure WDDL Pre-charge wave generationCHAPTER 7
WDDL GATESThe methodology used in the project is bottom-up approach Lower
modules are designed and later integrated to form larger modules whose further integration
leads to the final top module As it is a fact that logic gates form lower level modules
initially logic gates required for the design are implemented in WDDL style WDDL
demands a parallel combination of two positive complementary gates one calculating the
true value and the other negative value The logic gates like OR AND XOR have been
implemented Besides there is even implementation of Full Adder 32-bit XOR
etc71WDDL OR gateA WDDL OR gate is constructed by considering conventional
OR gate in parallel to its complementary gate ie AND gate as shown in the following
32
figure Figure
41 WDDL OR GateThe NAND and NOT gates used act as Precharge operator injecting
signal lsquo0rsquo into the gates when lsquoclkrsquo signal is high ie during the Precharge phase72
WDDL AND gateA WDDL AND gate is constructed by considering conventional
AND gate in parallel to its complementary gate ie OR gate as shown in the following
33
figure Figure
42 WDDL AND Gate73 WDDL NAND gateA WDDL NAND gate is constructed by
considering conventional AND gate in parallel to its complementary gate ie OR gate as
shown in the following figure
34
Figure
43 WDDL NAND Gate74 WDDL NOR gateA WDDL NOR gate is constructed by
considering conventional OR gate in parallel to its complementary gate ie AND gate as
shown in the following figure
35
Figure 44 WDDL
NOR Gate 75 WDDL XOR gate XOR function can be implemented by the
Boolean Equation A B = ABrsquo + ArsquoB Therefore XOR Gate is implemented
in terms of AND gate and OR gate as shown in the figure 44 It can also be implemented
by instantiating a WDDL AND gate and WDDL OR gate But the number of gates
involved in the latter one is greater than the former one Therefore the first method of
implementation is followed rather than the second one
Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a Full Adder circuit has been designed by
instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm
requires a 32-bit XOR gate and a 32-bit Adder circuit. They can be easily implemented by
instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END
RESULTS

WDDL OR GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
   LUT3                          : 2
IO Buffers                       : 5
   IBUF                          : 3
   OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
   LUT3                          : 2
IO Buffers                       : 5
   IBUF                          : 3
   OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
   LUT3                          : 2
IO Buffers                       : 5
   IBUF                          : 3
   OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
   LUT3                          : 2
IO Buffers                       : 5
   IBUF                          : 3
   OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis
(DPA), it is necessary to implement the design in a logic style that renders the power
dissipation constant irrespective of the input combination. WDDL has advantages over the
other logic styles considered and is therefore of great significance. In this dissertation
work, an architecture for the Blowfish algorithm is designed and implemented in the WDDL
style. A bottom-up approach is used: the low-level entities are designed first and are later
combined to form the entire module. The key scheduling is online. The sub-keys generated for
a particular key can be used to encrypt all the data to be encrypted with that key. The
sub-keys are supplied in reverse order for the decryption data path. Initially the logic
gates are implemented in WDDL, and the higher modules are then designed by instantiating the
WDDL gates to form the entire module, thus resulting in constant power dissipation
irrespective of the input data combination. The entire design works in two phases, namely
the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the
design are zeroed, and during the Evaluation phase the functionality of the design is
achieved. This kind of design has been found simple and very effective in thwarting the
side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64
bits. The code for the implementation has been written in VHDL. The functional verification
has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL
Compiler. The back end of the design is done using SOC Encounter.

The desired functionality has been achieved according to the specifications. During the
Evaluation phase the output shows the same number of transitions for every input, thus
resulting in constant power dissipation. During synthesis it has been observed that a simple
WDDL gate comprises many conventional gates. The area of the design has therefore grown
nearly three-fold compared to the same design implemented in conventional CMOS logic; this
is the price paid for the security incorporated into the IC against Differential Power
Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot
apply the DPA scheme to find the secret key that is being used in the crypto processor. Thus
security against DPA is incorporated into the IC at the hardware level by implementing the
design in the WDDL style, which is quite simple and effective.
1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks
These attacks exploit the switching behavior of digital complementary
metal–oxide–semiconductor (CMOS) gates. Of all these, the power analysis attack is of major
concern.
5.2 Power analysis attacks
The power consumption of a cryptographic device may reveal much information about the
operations that take place and the parameters involved. This is the idea behind simple and
differential power analysis, first introduced by Kocher et al. Like the clock, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series
with the power or ground input. The voltage difference across the resistor divided by the
resistance yields the current. Well-equipped electronics labs have equipment that can
digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with
excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and
transferring the data to a PC can be bought for less than US$ 400.
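The measurement described above amounts to applying Ohm's law to every voltage sample. A minimal sketch follows, assuming the 50 ohm series resistor mentioned in the text; the function name current_trace and the sample values are hypothetical and chosen only for illustration.

```python
# Convert voltage drops sampled across the series resistor into supply current samples.
def current_trace(voltage_samples, r_shunt_ohms=50.0):
    """Return I = V / R for every sampled voltage difference (volts in, amperes out)."""
    return [v / r_shunt_ohms for v in voltage_samples]


if __name__ == "__main__":
    samples_v = [0.05, 0.10, 0.075]      # hypothetical voltage drops in volts
    print(current_trace(samples_v))      # currents in amperes
```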
Power analysis attacks are of two types:
1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack
SPA attacks on smartcards typically take a few seconds per card, while DPA attacks can take
several hours. In general, from a somewhat academic perspective, we may consider the entire
internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret
internal state. Side channels typically give information about these internal states, or
about the operations used in the transition of this internal state from one round to
another. The type of side channel will of course determine what information is available to
the attacker about these states. The attacks typically work by finding some information
about the internal state of the cipher that can be learned both by guessing part of the key
and checking the value directly, and additionally by some statistical property of the cipher
that makes that checkable value slightly nonrandom.
5.2.1 Simple Power Analysis attack (SPA)
Simple Power Analysis is a technique that involves direct interpretation, usually by visual
inspection, of power consumption measurements collected while a cryptographic operation such
as an encryption is being performed. SPA can yield information about a device's operation as
well as key material.
A trace refers to a set of power consumption measurements taken across a cryptographic
operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing
5000 points. The following figure, for example, shows an SPA trace from a smart card
performing a DES operation.
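The trace length quoted above is simply the operation time multiplied by the sampling rate. A small sketch of that arithmetic, with the hypothetical helper name trace_length:

```python
# Number of samples in a trace = operation duration x sampling rate.
def trace_length(duration_s, sample_rate_hz):
    """Return the number of points captured for one operation."""
    return round(duration_s * sample_rate_hz)


if __name__ == "__main__":
    # 1 millisecond operation sampled at 5 MHz, as in the text.
    print(trace_length(1e-3, 5e6))
```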
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits
can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit
will contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional branch
when a mismatch is found. This conditional branching causes large SPA (and sometimes timing)
characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about
the data they process. The leakage functions depend on the multiplier design but are often
strongly correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication
operation for each exponent bit that is equal to "1". The exponent can be compromised if
squaring and multiplication operations have different power consumption characteristics,
take different amounts of time, or are separated by different code. Modular exponentiation
functions that operate on two or more exponent bits at a time may have more complex leakage
functions.
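The exponentiator case can be illustrated in software. The sketch below is an assumed left-to-right square-and-multiply routine that also records which operation it performs; it stands in for what an attacker would read off an SPA trace if squarings ("S") and multiplications ("M") were distinguishable. The function names and the operation log are hypothetical.

```python
# Square-and-multiply with an operation log, and recovery of the exponent from that log.
def modexp_with_ops(base, exponent, modulus):
    """Left-to-right binary exponentiation; returns (result, list of 'S'/'M' operations)."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:              # scan exponent bits, most significant first
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":                         # extra multiplication only for "1" bits
            result = (result * base) % modulus
            ops.append("M")
    return result, ops


def recover_exponent(ops):
    """Rebuild the exponent: each 'S' starts a new bit, a following 'M' marks it as 1."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")
        else:
            bits[-1] = "1"
    return int("".join(bits), 2)


if __name__ == "__main__":
    value, ops = modexp_with_ops(7, 0b1011, 1000)
    print(value, "".join(ops), recover_exponent(ops))
```

The operation sequence alone is enough to rebuild the exponent, which is why implementations that need SPA resistance avoid exponent-dependent branching.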
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller
and are sometimes overshadowed by measurement errors and other noise. In such cases it is
still often possible to break the system using statistical functions tailored to the target
algorithm.
To implement the DPA attack, an attacker first observes $m$ encryption operations and
captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker
records the cipher texts $C_{1..m}$. No knowledge of the plain text is required. DPA analysis
uses power consumption measurements to determine whether a key block guess $K_s$ is correct.
For this the attacker uses a selection function $D(C_i, b, K_s)$, which predicts the value
$V$ of an intermediate target bit $b$ from the cipher text $C_i$ under the guess $K_s$. The
attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference
between the average of the traces for which $V$ is one and the average of the traces for
which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the
value represented by the selection function $D$ on the power consumption at point $j$. In
particular,
\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)}
            - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]
If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for
about half of the cipher texts $C_i$. The selection function is thus effectively
uncorrelated to what was actually computed by the target device. If a random function is
used to divide a set into two subsets, the difference in the averages of the subsets should
approach zero as the subset sizes approach infinity.
Thus trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with
$K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is
correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of the
target bit $b$ with probability 1. The selection function is thus correlated to the value of
the bit considered. Other data values, measurement errors, etc. that are not correlated to
$D$ approach zero. Because power consumption is correlated to data bit values, the plot of
$\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being
processed. The correct value of $K_s$ can thus be identified from the spikes in its
differential trace. Four values of $b$ correspond to each S box, providing confirmation of
key block guesses. Finding all eight $K_s$ yields the entire 48-bit round sub key. The
remaining 8 key bits can be found easily using exhaustive search or by analyzing one
additional round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the cipher text, and attacking the next DES key. DPA can
use known plaintext or known cipher text and can find encryption or decryption keys.
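The differential trace described in this section is a difference of two averages, which can be sketched in a few lines. The code below is an illustration under simplifying assumptions: the traces are tiny hand-made lists rather than real measurements, and the selection bits are supplied directly instead of being computed from cipher texts by a real selection function. The name differential_trace is hypothetical.

```python
# Difference-of-means differential trace for one key guess.
def differential_trace(traces, selection_bits):
    """Average of traces with selection bit 1 minus average of traces with bit 0, per sample."""
    ones = [t for t, d in zip(traces, selection_bits) if d == 1]
    zeros = [t for t, d in zip(traces, selection_bits) if d == 0]
    if not ones or not zeros:
        raise ValueError("both subsets must be non-empty")
    k = len(traces[0])
    mean_ones = [sum(t[j] for t in ones) / len(ones) for j in range(k)]
    mean_zeros = [sum(t[j] for t in zeros) / len(zeros) for j in range(k)]
    return [a - b for a, b in zip(mean_ones, mean_zeros)]


if __name__ == "__main__":
    # Toy traces: sample 1 depends on the processed bit, samples 0 and 2 do not.
    traces = [[1.0, 5.0, 2.0], [1.0, 3.0, 2.0], [1.0, 5.0, 2.0], [1.0, 3.0, 2.0]]
    print(differential_trace(traces, [1, 0, 1, 0]))  # partition matching the processed bit
    print(differential_trace(traces, [1, 1, 0, 0]))  # partition unrelated to the processed bit
```

In this toy data the partition that matches the processed bit produces a spike at the data-dependent sample, while the unrelated partition gives a flat trace, mirroring the behaviour described for correct and incorrect key guesses.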
CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The power consumption of traditional standard cells and logic is dependent on the signal
activity. When the output of a logic gate makes a 0 to 1 transition, a current is drawn from
the power supply and charges the output capacitance. On the other hand, when the output sees
a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due
to short-circuit or leakage currents) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply and why power attacks
are possible. The basis of a secure digital design flow is therefore a logic style with
constant power consumption.
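The data dependence described above can be made concrete with a first-order model. The sketch below assumes an ideal static CMOS output in which only a 0 to 1 transition draws energy C·Vdd² from the supply, with short-circuit and leakage currents ignored; the load capacitance and supply voltage are hypothetical values chosen only for illustration.

```python
# First-order energy drawn from the supply by a static CMOS output (assumed ideal model).
def supply_energy(prev_out, new_out, c_load=10e-15, vdd=1.8):
    """Energy in joules taken from the supply for one output transition."""
    if prev_out == 0 and new_out == 1:
        return c_load * vdd ** 2      # charging the output capacitance
    return 0.0                        # 1->0, 0->0 and 1->1 draw (almost) nothing


def sequence_energy(outputs):
    """Total supply energy for a sequence of output values."""
    return sum(supply_energy(a, b) for a, b in zip(outputs, outputs[1:]))


if __name__ == "__main__":
    print(sequence_energy([0, 1, 0, 1]))   # two charging events
    print(sequence_energy([0, 0, 0, 1]))   # one charging event
```

Two output sequences of the same length draw different amounts of energy, which is the leak that a constant-power logic style is meant to close.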
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type
of logic continuously draws a current from the supply and measures its state through the
path that the current takes. A gate has constant power consumption if it draws a perfectly
constant current from the power supply, independently of the input and output signals. To
build a current source capable of generating a constant current, special circuit techniques
that minimize channel length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when the logic
gate is not processing any data it still draws the current, which makes this logic style
unacceptable for embedded battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to
change state and measures its state by the amount of charge it stores on a capacitance. A
regular standard CMOS circuit will only consume power when a capacitance gets charged and
later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the
style of choice for every battery-operated or low-power device. This is illustrated in the
figure below for a simple inverter. Thus static CMOS is the preferred logic style because of
its low power consumption and high noise margins.
Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption, namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the inverter shown above, all four transitions of the CMOS inverter can be distinguished
when monitoring the power supply.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic,
fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates
pre-charge and evaluation phases. As a result, both outputs (true and false) are pre-charged
to 1, and exactly one of the two output nodes evaluates to 0 to form a differential output
signal in the evaluation phase. The discharged output node is charged to 1 again in the
following pre-charge phase so that both outputs are pre-charged to 1. In other words, every
signal transition, including the events in which the input signals remain constant, is
represented by an actual switching event in which the logic gate charges a capacitance. The
logic families that have been introduced to thwart differential power analysis (DPA) using
dynamic differential logic include the following techniques:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.
Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require a full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS standard cells are
combined to form secure compound standard cells, which have a reduced power signature. WDDL
has many advantages. It can be readily implemented from an existing standard cell library.
The design flow is fully supported with accurate EDA library files that come directly from
the vendor. WDDL also results in a dynamic differential logic with only a small load
capacitance on the pre-charge control signal, and with the low power consumption and the
high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
1. A major advantage of the proposed logic style is that it can be incorporated in the
common Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
3. The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel combination of two positive complementary gates, one
calculating the true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and
the OR gate are examples of positive gates. A complementary gate, sometimes also referred to
as a dual gate, expresses the false output of the original logic gate using the false inputs
of the original gate. The AND gate fed with true input signals and the OR gate fed with
false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR
gate. In the evaluation phase each input signal is differential and the WDDL gate calculates
its differential output. In the pre-charge phase the inputs to the WDDL gate are set at 0.
This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the
pre-charge signal to each individual gate. During the pre-charge phase the input vector of
the combinatorial logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say
that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL.
There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator
is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an all-zero
output in the pre-charge phase (clk-signal high) but let the differential signal through
during the evaluation phase (clk-signal low).
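The behaviour described in this section can be mimicked with a small behavioural model. The sketch below is an assumption-laden illustration rather than a gate-level design: each signal is a (true rail, false rail) pair, the pre-charge operator is modelled as forcing both rails to 0 while clk is high, and the example circuit (A AND B) OR C is hypothetical. The names encode, wddl_and, wddl_or, precharge_operator and circuit do not come from the report.

```python
# Behavioural model of WDDL AND/OR gates and the pre-charge 0-wave (assumed model).
PRECHARGE = (0, 0)


def encode(bit):
    """Differential encoding used in the evaluation phase: (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def precharge_operator(signal, clk):
    """All-zero output while clk is high (pre-charge); pass-through while clk is low."""
    return PRECHARGE if clk else signal


def circuit(a, b, c, clk):
    """(A AND B) OR C with pre-charge operators only at the primary inputs."""
    a, b, c = (precharge_operator(s, clk) for s in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


if __name__ == "__main__":
    for bits in [(0, 0, 0), (1, 1, 0), (0, 1, 1)]:
        ins = [encode(x) for x in bits]
        print(bits, "evaluate:", circuit(*ins, clk=0), "pre-charge:", circuit(*ins, clk=1))
```

In the evaluation phase exactly one output rail is high for every input combination, and in the pre-charge phase the zeros injected at the inputs reach the output without any clock connection to the internal gates.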
Figure: WDDL Pre-charge wave generation
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom
521 Simple Power Analysis attack (SPA)
Simple Power Analysis (SPA) is a technique that involves direct interpretation of power
consumption measurements collected during cryptographic operations, typically by inspecting
a visual representation of the power consumed by a unit while it performs an encryption
operation. SPA can yield information about a device's operation as well as key material.
A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure below, for example, shows an SPA trace from a smart
card performing a DES operation.
Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.
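The trace-length arithmetic above can be checked with a short Python sketch. The function name `trace_length` is an illustrative assumption, not something taken from the report.

```python
def trace_length(duration_s: float, sample_rate_hz: float) -> int:
    """Number of samples captured while an operation of the given duration runs."""
    return round(duration_s * sample_rate_hz)


# A 1 millisecond operation sampled at 5 MHz, as in the example above.
print(trace_length(1e-3, 5e6))  # prints 5000
```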
Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:
DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.
DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.
Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.
Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.
Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration, with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
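The exponentiator case can be illustrated with a minimal Python sketch. It assumes an idealised attacker who can tell a squaring ("S") from a multiplication ("M") in the trace; the function names and the operation log are hypothetical stand-ins for what would be read off a real power trace.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary modular exponentiation that also logs its operations."""
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # squaring in every iteration
        operations.append("S")
        if bit == "1":
            result = (result * base) % modulus    # extra multiply only for "1" bits
            operations.append("M")
    return result, "".join(operations)


def recover_exponent(operations):
    """Rebuild the exponent from the S/M sequence an SPA attacker would observe."""
    bits = []
    i = 0
    while i < len(operations):
        # Each iteration starts with "S"; a following "M" marks a "1" bit.
        if i + 1 < len(operations) and operations[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, observed = square_and_multiply(7, 0b101101, 1009)
print(observed)                    # prints SMSSMSMSSM
print(recover_exponent(observed))  # prints 45
```

The sketch leaks the whole exponent from a single run, which is why implementations usually try to make the operation sequence independent of the key bits.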
5.2.2 Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and
are sometimes overshadowed by measurement errors and other noise. In such cases, it is still
often possible to break the system using statistical functions tailored to the target algorithm.
To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces T_{1..m}[1..k] containing k samples each. In addition, the attacker records the
ciphertexts C_{1..m}. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess K_s is correct. The attacker
computes a k-sample differential trace Δ_D[1..k] by finding the difference between the
average of the traces for which a certain intermediate value V, the bit computed by the
selection function D(C_i, b, K_s), is one and the average of the traces for which V is zero.
Thus Δ_D[j] is the average over C_{1..m} of the effect due to the value represented by the
selection function D on the power consumption at point j. In particular,
\[ \Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)} \]
If K_s is incorrect, the bit computed using D will differ from the actual target bit for about
half of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was
actually computed by the target device. If a random function is used to divide a set into two
subsets, the difference in the averages of the subsets should approach zero as the subset sizes
approach infinity.
Trace components uncorrelated to D therefore diminish with 1/\sqrt{m}, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with K_s
incorrect may have a weak correlation to D with the correct K_s). If K_s is correct, however, the
computed value for D(C_i, b, K_s) will equal the actual value of the target bit b with
probability 1. The selection function is thus correlated to the value of the bit considered.
Other data values, measurement errors, etc. that are not correlated to D approach zero. Because
power consumption is correlated to data bit values, the plot of Δ_D will be flat with spikes in
regions where D is correlated to the values being processed. The correct value of K_s can thus
be identified from the spikes in its differential trace. Four values of b correspond to each
S-box, providing confirmation of key block guesses. Finding all eight K_s yields the entire
48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by
analyzing one additional round. Triple DES keys can be found by analyzing an outer DES
operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES
key. DPA can use known plaintext or known ciphertext and can find encryption or decryption
keys.
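The difference-of-means procedure described above can be sketched in Python on simulated data. This is a toy model under stated assumptions, not the DES attack itself: the 4-bit substitution table `SBOX`, the 4-bit key, the single leaking sample `LEAK_SAMPLE`, the Gaussian noise level, and all function names are hypothetical choices made for illustration.

```python
import random

# Hypothetical 4-bit substitution table standing in for a DES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SAMPLES = 8        # k samples per trace
LEAK_SAMPLE = 3    # assumed point in time at which the target bit leaks


def selection(data, key_guess):
    """Selection function D: predicted top bit of the S-box output."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def capture_traces(secret_key, m, rng):
    """Simulate m noisy traces; one sample carries the true target bit."""
    inputs, traces = [], []
    for _ in range(m):
        data = rng.randrange(16)
        trace = [rng.gauss(0.0, 0.5) for _ in range(SAMPLES)]
        trace[LEAK_SAMPLE] += selection(data, secret_key)
        inputs.append(data)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, key_guess):
    """Average of traces with D = 1 minus average of traces with D = 0."""
    ones = [t for d, t in zip(inputs, traces) if selection(d, key_guess) == 1]
    zeros = [t for d, t in zip(inputs, traces) if selection(d, key_guess) == 0]
    return [sum(t[j] for t in ones) / len(ones)
            - sum(t[j] for t in zeros) / len(zeros)
            for j in range(SAMPLES)]


def recover_key(inputs, traces):
    """Pick the key guess whose differential trace shows the largest spike."""
    return max(range(16),
               key=lambda guess: max(differential_trace(inputs, traces, guess)))


rng = random.Random(1)
inputs, traces = capture_traces(secret_key=0xA, m=2000, rng=rng)
print(hex(recover_key(inputs, traces)))
```

For the correct guess the differential trace is close to zero everywhere except at the leaking sample, where it approaches the size of the simulated leak; wrong guesses give smaller spikes because their predictions are only partly correlated with the bit actually processed.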
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity. When the output of the logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0 or a 1 to 1 transition, no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
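The data dependence described above can be made concrete with a small Python sketch. It uses an idealised model, assumed here for illustration, in which only a 0 to 1 output transition draws charge from the supply and short-circuit and leakage currents are ignored; the function name is hypothetical.

```python
def supply_charging_events(output_sequence):
    """Count 0 -> 1 output transitions, the only ones that charge the load
    capacitance from the supply in this idealised model."""
    pairs = zip(output_sequence, output_sequence[1:])
    return sum(1 for previous, current in pairs if (previous, current) == (0, 1))


# Two output sequences of equal length draw different amounts of charge,
# so the supply current reveals something about the data.
print(supply_charging_events([0, 1, 0, 1, 0, 1]))  # prints 3
print(supply_charging_events([0, 0, 0, 0, 1, 1]))  # prints 1
```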
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.
The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns the
current, which makes this logic style unacceptable for embedded
battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.
Figure: Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
In the inverter shown above, all four output transitions (0 to 0, 0 to 1, 1 to 0 and 1 to 1)
can be distinguished when monitoring the power supply, so standard static CMOS does not have
constant power consumption.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the
evaluation phase. The discharged output node is charged back to 1 in the following pre-charge
phase, so that both outputs are again pre-charged to 1. In other words, every signal transition,
including the events in which the input signals remain constant, is represented with an actual
switching event in which the logic gate charges a capacitance. The logic families that have been
introduced to thwart differential power analysis (DPA) by using dynamic differential logic
include the following:
1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
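The claim that dynamic differential logic produces exactly one switching event per cycle can be sketched in Python. The model below is a simplification assumed for illustration: a gate output is a pair of rails pre-charged to 1, the evaluation phase discharges exactly one rail, and the following pre-charge phase recharges it. The function name is hypothetical.

```python
def ddl_charging_events(output_values):
    """For each cycle, count the rails that must be recharged in the
    pre-charge phase following the evaluation of one output value."""
    events = []
    for value in output_values:
        # Evaluation phase: exactly one of the two pre-charged rails drops to 0.
        true_rail, false_rail = (1, 0) if value else (0, 1)
        # Pre-charge phase: every discharged rail is charged back to 1.
        events.append((1 - true_rail) + (1 - false_rail))
    return events


# One charging event per cycle, whatever the data, including repeated values.
print(ddl_charging_events([0, 0, 1, 1, 0, 1]))  # prints [1, 1, 1, 1, 1, 1]
```

This covers only the first condition (one switching event per cycle); whether the charged capacitance is the same on both rails depends on the circuit and layout and is not modelled here.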
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the figure below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does require, however, a full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.
Advantages of the WDDL logic style are as follows:
1) It can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2) No special design rules are involved in the interconnection of WDDL gates.
3) The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence WDDL. There are several ways
to launch the pre-charge wave. In the figure, a pre-charge operator is inserted
at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and the outputs of the registers. These operators produce an
all-zero output in the pre-charge phase (clk-signal high) but let the
differential signal through during the evaluation phase (clk-signal low).
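The gate construction and the pre-charge wave described above can be modelled behaviourally in Python. This is a sketch under assumptions: a signal is a (true, false) pair, and the helper names, the pre-charge operator model and the small example circuit are hypothetical illustrations rather than the report's own netlist.

```python
from itertools import product

PRECHARGE = (0, 0)   # both rails at 0 during the pre-charge phase


def encode(bit):
    """Differential encoding of a logic value as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR, on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND, on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def precharge_operator(signal, clk):
    """Force an all-zero output while clk is high, pass the signal while it is low."""
    return PRECHARGE if clk else signal


def example_circuit(a, b, c, clk):
    """(a AND b) OR c, with pre-charge operators only at the module inputs."""
    a, b, c = (precharge_operator(s, clk) for s in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


for a, b, c in product((0, 1), repeat=3):
    inputs = (encode(a), encode(b), encode(c))
    # Pre-charge phase: the 0-wave reaches the output without a per-gate clock.
    assert example_circuit(*inputs, clk=1) == PRECHARGE
    # Evaluation phase: the output is a valid differential value.
    assert example_circuit(*inputs, clk=0) == encode((a & b) | c)
print("pre-charge wave and evaluation behave as described")
```

Because AND and OR are positive gates, an all-zero input always gives an all-zero output, which is what lets the pre-charge value ripple through the logic from the inputs alone.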
Figure: WDDL Pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level
modules are designed first and later integrated to form larger modules, whose further
integration leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented first in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented. In addition, a Full Adder, a 32-bit XOR, etc. have been implemented.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into
the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore
the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can
also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of
gates involved in the latter method is greater than in the former. Therefore the first method of
implementation is followed rather than the second one.
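The AND/OR form of the XOR equation above can be sketched on differential signals in Python. The model assumes that the complements A' and B' are taken from the false rails, so both output rails are built from AND and OR gates only; this is one plausible reading of the first method, and the function names are hypothetical.

```python
from itertools import product


def encode(bit):
    """Differential encoding of a logic value as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """True rail: AB' + A'B. False rail: AB + A'B' (the complementary function)."""
    a_true, a_false = a
    b_true, b_false = b
    true_out = (a_true & b_false) | (a_false & b_true)
    false_out = (a_true & b_true) | (a_false & b_false)
    return (true_out, false_out)


# Evaluation phase: matches the ordinary XOR for every input combination.
for a, b in product((0, 1), repeat=2):
    assert wddl_xor(encode(a), encode(b)) == encode(a ^ b)

# Pre-charge phase: all-zero inputs give an all-zero output.
print(wddl_xor((0, 0), (0, 0)))  # prints (0, 0)
```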
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can
be easily implemented by instantiating the corresponding lower-level module 32 times.
CHAPTER 8 FRONT END
RESULTS
WDDL OR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%
Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style
that gives constant power dissipation irrespective of the input combination. WDDL has
been shown to be advantageous over other such logic styles and is therefore of great
significance. In this dissertation work, an architecture for the Blowfish algorithm is designed
and implemented in WDDL style. A bottom-up approach is used in this implementation: the
low-level entities are designed first and later combined to form the entire module. The key
scheduling is online. The sub-keys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The sub-keys are applied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher-level modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the Precharge phase
and the Evaluation phase. In the Precharge phase, all the signals of the design are zeroed, and
during the Evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. The functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
backend of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. At the output, the same number of transitions
occurs in every Evaluation phase, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
same design implemented in conventional CMOS logic; this is the cost of the security
incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power
dissipation at the output, an attacker cannot apply the Differential Power Analysis (DPA)
scheme to find the secret key that is being used in the crypto processor. Thus security against
DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.
CHAPTER 10
REFERENCES
10.1 Referred Technical papers
[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".
Reference books
[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.
Referred Web Pages
[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS
be wrapped around The resulting power consumption traces for a ldquo1 bit and a ldquo0 bit will
contain different SPA features if the execution paths take different branches for each
DES permutations DES implementations perform a variety of bit permutations Conditional
branching in software or microcode can cause significant power consumption differences for
ldquo0 and ldquo1 bits
Comparisons String or memory comparison operations typically perform a conditional
branch when a mismatch is found This conditional branching causes large SPA (and
sometimes timing) characteristics
Multipliers Modular multiplication circuits tend to leak a great deal of information about the
data they process The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights
Exponentiators A simple modular exponentiation function scans across the exponent
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to ldquo1 The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics take different
amounts of time or are separated by different code Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions
522Differential Power Analysis attack (DPA)
In addition to large-scale power variations due to the instruction sequence there are
effects correlated to data values being manipulated These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm
To implement the DPA attack an attacker first observes m encryption operations and captures
power traces T1 m [1 k] containing k samples each In addition the attacker records the
cipher text C1 m No knowledge of the plain text is required DPA analysis uses power
consumption measurements to determine whether a key block guess Ks is correct The attacker
computes a k-sample differential trace centD [1 k] by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero Thus cent D[j) is the average over C1m of the effect due to the value
represented by the selection function D on the power consumption at point j In particular25
If Ks is incorrect the bit computed using D will differ from the actual target bit for about half
of the ciphertext Ci The selection function is thus effectively uncorrelated to what was actually
computed by the target device If a random function is used to divide a set into two subsets the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity
Thus because trace components uncorrelated to D will diminish with 1 pm causing the
differential trace to become at (the actual trace may not be completely at as D with Ks
incorrect may have a weak correlation to D with the correct Ks) If Ks is correct however the
computed value for D (Ci bKs) will equal the actual value of target bit b with probability 1
The selection function is thus correlated to the value of the bit considered Other data values
measurement errors etc that are not correlated to D approach zero Because power
consumption is correlated to data bit values the plot of centD will be degat with spikes in regions
where D is correlated to the values being processed The correct value of Ks can thus be
identified from the spikes in its differential trace Four values of b correspond to each S box
providing confirmation of key block guesses Finding all eight Ks yields the entire 48-bit round
sub key The remaining 8 key bits can be found easily using exhaustive search or by analyzing
one additional round Triple DES keys can be found by analyzing an outer DES operation first
using the resulting key to decrypt the cipher text and attacking the next DES key DPA can use
known plaintext or known cipher text and can find encryption or decryption keys
26
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity When the output of the logic gate makes
a 0 to 1 transition a current comes from the power supply and charges the
output capacitance On the other hand when the output sees a 1 to 0 a 0
to 0 or a 1 to 1 transition no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible The basis of a secure digital design
flow is a logic style with constant power consumption
61 Current Mode Logic
Current mode logic (CML) eg current steering logic seems the
ideal solution This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes A
gate has constant power consumption if it draws a perfectly constant
current from the power supply independently of the input and output
signals To build a current source capable of generating a constant current
special circuit techniques that minimize channel length modulation have to
be used
The decisive drawback of CML however is its static power
consumption When the logic gate is not processing any data it burns the
27
current which makes this logic style unacceptable for embedded battery-
operated devices
62 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML) eg static CMOS logic only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged ie when a gate switches state It is the main reason that
CMOS is the style of choice for every battery operated or low power device This is illustrated
in the figure below for simple inverter Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins
Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption
namely
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
28
Here above all the four transitions of CMOS inverter can be distinguished when
monitoring the power supply
63 Dynamic Differential Logic
Dynamic differential logic sometimes also referred to as dual rail with pre-charge
logic fulfills the first condition A differential logic family uses the true and the false
representation of the input and output signals and a dynamic logic family alternates pre-charge
and evaluation phases As a result since both outputs (true and false) are pre-charged to 1
exactly one of the two output nodes evaluates to 0 to have a differential output signal in the
evaluation phase The discharged output node is charged to 1 in the following pre-charge phase
to pre-charge both outputs to 1 In other words every signal transition including the events in
which the input signals remain constant is represented with an actual switching event in
which the logic gate charges a capacitance All the logic families that have been introduced to
thwart the differential power analysis (DPA) by using dynamic differential logic in the
following techniques
1 Sense Amplifier Based Logic (SABL) and
2 Wave Dynamic Differential Logic (WDDL) gates
631 Sense Amplifier Based logic (SABL)
SABL has its main advantage that it has balanced input and output nodes and that all
internal nodes connect to an output The output capacitances can be balanced Systematic
methods have been developed to make sure that both branches of the differential pull down
network are balanced and that no memory effects are present in the network Sense Amplifier
Based logic is illustrated as
29
Sense Amplifier Based Logic
ANDNAND gate
This circuit style does require however a full custom characterization and layout It also
suffers from a high clock load common to all dynamic logic gates
632 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic Static CMOS
standard cells are combined to form secure compound standard cells
which have a reduced power signature WDDL has many advantages It can
be readily implemented from an existing standard cell library The design
flow is fully supported with accurate EDA library files that come directly
from the vendor WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS
Advantages of WDDL logic style are as follows
30
A major advantage of the proposed logic style is that it can be incorporated by the common
Electronic Design Automation (EDA) tool flow
No special design rules are involved in the interconnection of WDDL gates
The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the true output
using the true inputs, the other the false output using the false inputs. A
positive gate produces a zero output for an all-zero input. The AND gate and the
OR gate are examples of positive gates. A complementary gate, sometimes also
referred to as a dual gate, expresses the false output of the original logic gate
using the false inputs of the original gate. The AND gate fed with true input
signals and the OR gate fed with false input signals are two dual gates. The
figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate
calculates its differential output. In the pre-charge phase, the inputs to the
WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual gate.
During the pre-charge phase, the input vector of the combinatorial logic is set at
all 0s. Each individual gate will eventually have all its inputs at 0, evaluate
its output to 0, and pass this 0 value to the next gate. One could say that the
pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge
operator is inserted at the start of every combinatorial logic tree, i.e. at the
inputs of the encryption module and at the outputs of the registers. These
operators produce an all-zero output in the pre-charge phase (clk signal high),
but let the differential signal through during the evaluation phase (clk signal
low).
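The gate construction and the 0-wave described above can be illustrated with a small behavioral model. The sketch below is written in Python rather than in the VHDL used for the project, and every name in it (`encode`, `precharge`, `wddl_and`, `wddl_or`, `network`) is hypothetical, chosen only for this illustration. It assumes that a differential signal is modeled as a (true, false) pair and that the pre-charge operator behaves like an AND with the inverted clock, which is one possible way to obtain the behavior described in the text.

```python
def encode(bit):
    """Differential encoding of one bit as (true rail, false rail)."""
    return (bit, 1 - bit)

def precharge(signal, clk):
    """Assumed pre-charge operator: all-zero output while clk is high
    (pre-charge phase), transparent while clk is low (evaluation phase)."""
    true_rail, false_rail = signal
    return (true_rail & (1 - clk), false_rail & (1 - clk))

def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])

def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])

def network(a, b, c, clk):
    """Example logic tree (a AND b) OR c with pre-charge operators at its inputs."""
    a, b, c = (precharge(s, clk) for s in (a, b, c))
    return wddl_or(wddl_and(a, b), c)

for bits in [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 0)]:
    inputs = [encode(x) for x in bits]
    print(bits,
          "pre-charge:", network(*inputs, clk=1),
          "evaluation:", network(*inputs, clk=0))
```

In this model only the inputs of the tree see the clock. The internal gates reach the all-zero state because both gates are positive gates, and in the evaluation phase exactly one of the two output rails is 1 for every input combination.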
Figure: WDDL pre-charge wave generation
CHAPTER 7 WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level modules
are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules, the
logic gates required for the design are implemented in WDDL style first. WDDL
demands a parallel combination of two positive complementary gates, one calculating
the true value and the other the false value. The OR, AND and XOR logic gates have
been implemented, along with a Full Adder, a 32-bit XOR, etc.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with
its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates act as the Precharge operator, injecting the signal '0' into
the gates when the 'clk' signal is high, i.e. during the Precharge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with
its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with
its complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with
its complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B.
Therefore the XOR gate is implemented in terms of AND and OR gates, as shown in
Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL
OR gate, but the number of gates involved in the latter method is greater than in
the former. Therefore the first method of implementation is followed rather than
the second.
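As an illustration of the first method, the following Python sketch builds the XOR from AND and OR operations on differential signals. The function names `encode` and `wddl_xor` and the (true, false) pair encoding are assumptions made for this example and are not taken from the project code. The sketch also assumes that the complements A' and B' are obtained by taking the false rail of the differential input instead of using an inverter, so that only positive gates are involved; the false output is computed by the dual network.

```python
from itertools import product

def encode(bit):
    """Differential encoding of one bit as (true rail, false rail)."""
    return (bit, 1 - bit)

def wddl_xor(a, b):
    """XOR on differential inputs using only AND and OR operations."""
    a_true, a_false = a
    b_true, b_false = b
    out_true = (a_true & b_false) | (a_false & b_true)    # AB' + A'B
    out_false = (a_false | b_true) & (a_true | b_false)   # dual network
    return (out_true, out_false)

# Evaluation phase: the output is the differential encoding of A xor B.
for x, y in product((0, 1), repeat=2):
    print(x, y, wddl_xor(encode(x), encode(y)))

# Pre-charge phase: all-zero inputs give an all-zero output.
print("pre-charge:", wddl_xor((0, 0), (0, 0)))
```

Because no inverter appears in either network, an all-zero input still produces an all-zero output, so the 0-wave passes through the XOR in the same way as through the AND and OR gates.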
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a Full Adder circuit has been designed by
instantiating the WDDL gates designed above. During the implementation of the
Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They
can easily be implemented by instantiating the corresponding lower module 32 times.
CHAPTER 8 FRONT END RESULTS
WDDL OR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL AND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL NAND GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
WDDL XOR GATE
Simulation Result
Synthesis Result
WDDL XOR GATE
Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
IOs                            : 5
Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns
Simulation Result
Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
In order to secure ICs against side-channel attacks, especially Differential Power
Analysis (DPA), it is necessary to implement the design in a logic style that
renders constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over other logic styles and is therefore of great significance.
In this dissertation work, an architecture for the Blowfish algorithm is designed
and implemented in WDDL style. A bottom-up approach is used in this implementation:
the low-level entities are designed first and are later combined to form the entire
module. The key scheduling is online. The sub-keys generated for a particular key
can be used for the encryption of all the data to be encrypted with that key. The
sub-keys are supplied in reverse order for the decryption data path. Initially,
logic gates are implemented in WDDL, and then higher modules are designed by
instantiating the WDDL gates to form the entire module, thus resulting in constant
power dissipation irrespective of the input data combination. The entire design
works in two phases, namely the Precharge phase and the Evaluation phase. In the
Precharge phase all the signals of the design are zeroed, and during the Evaluation
phase the functionality of the design is achieved. This sort of design has been
found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plain text
of 64 bits. The code for the implementation has been written in VHDL. Functional
verification has been done using the Cadence (NC Launch) simulation package, and
synthesis using RTL Compiler. The back end of the design is done using SOC
Encounter. The desired functionality has been achieved according to the
specifications. In the output, during the Evaluation phase, there has been the same
number of transitions, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security
incorporated into the IC against Differential Power Analysis (DPA). Due to the
constant power dissipation at the output, a hacker cannot apply the Differential
Power Analysis (DPA) scheme to find the secret key that is being used in the
crypto-processor. Thus security against DPA is incorporated into the IC at the
hardware level by implementing the design in WDDL style,
which is quite simple and effective.
If Ks is incorrect, the bit computed using D will differ from the actual target bit
for about half of the ciphertexts Ci. The selection function is thus effectively
uncorrelated to what was actually computed by the target device. If a random
function is used to divide a set into two subsets, the difference in the averages
of the subsets should approach zero as the subset sizes approach infinity.

Thus, if Ks is incorrect, trace components uncorrelated to D will diminish with
1/√m, causing the differential trace to become flat (the actual trace may not be
completely flat, as D with Ks incorrect may have a weak correlation to D with the
correct Ks). If Ks is correct, however, the computed value for D(Ci, b, Ks) will
equal the actual value of target bit b with probability 1. The selection function
is thus correlated to the value of the bit considered. Other data values,
measurement errors, etc. that are not correlated to D approach zero. Because power
consumption is correlated to data bit values, the plot of ΔD will be flat with
spikes in regions where D is correlated to the values being processed. The correct
value of Ks can thus be identified from the spikes in its differential trace. Four
values of b correspond to each S box, providing confirmation of key block guesses.
Finding all eight Ks yields the entire 48-bit round subkey. The remaining 8 key
bits can be found easily using exhaustive search or by analyzing one additional
round. Triple DES keys can be found by analyzing an outer DES operation first,
using the resulting key to decrypt the ciphertexts, and attacking the next DES key.
DPA can use known plaintext or known ciphertext, and can find encryption or
decryption keys.
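The difference-of-means test described above can be sketched on a toy model. The following Python example is not the DES attack from the text: it assumes a hypothetical 8-bit S-box generated by a seeded shuffle, a single key byte, known plaintexts instead of ciphertexts, and a simulated one-sample "trace" equal to the Hamming weight of the S-box output plus Gaussian noise. All names (`SBOX`, `SECRET`, `difference_of_means`, `recover_key`) are invented for the illustration.

```python
import random

def hamming_weight(value):
    """Number of 1 bits, used here as a simple power model (assumption)."""
    return bin(value).count("1")

def difference_of_means(traces, selections):
    """Average of the traces with D = 1 minus average of the traces with D = 0."""
    ones = [t for t, s in zip(traces, selections) if s == 1]
    zeros = [t for t, s in zip(traces, selections) if s == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def recover_key(plaintexts, traces, sbox, bit=0):
    """Evaluate the selection function D for every key guess Ks and
    return the guess whose differential value is largest."""
    scores = {}
    for guess in range(256):
        selections = [(sbox[p ^ guess] >> bit) & 1 for p in plaintexts]
        scores[guess] = difference_of_means(traces, selections)
    return max(scores, key=scores.get), scores

rng = random.Random(2010)
SBOX = list(range(256))          # hypothetical S-box: a seeded random permutation
rng.shuffle(SBOX)
SECRET = 0x5A                    # key byte the simulated device uses

plaintexts = [rng.randrange(256) for _ in range(4000)]
traces = [hamming_weight(SBOX[p ^ SECRET]) + rng.gauss(0.0, 1.0)
          for p in plaintexts]

best, scores = recover_key(plaintexts, traces, SBOX)
print("recovered key byte:", hex(best))
print("differential value for the correct guess:", round(scores[SECRET], 2))
```

For a wrong guess the two subsets are split almost at random, so their averages nearly cancel; for the correct guess the subsets differ systematically in the target bit, which is the spike the text refers to.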
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is dependent on the
signal activity. When the output of a logic gate makes a 0 to 1 transition, a
current comes from the power supply and charges the output capacitance. On the
other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no
energy or only a limited amount of energy (due to short circuit or leakage) is
consumed from the power supply. This is the fundamental reason why information is
leaked through the power supply and why power attacks are possible. The basis of a
secure digital design flow is a logic style with constant power consumption.
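The data dependence described above can be made concrete with a first-order model. The Python sketch below assumes that a 0 to 1 output transition draws an energy of C·Vdd² from the supply and that every other transition draws nothing; short-circuit and leakage currents are ignored. The function names and the normalized values of the load capacitance and supply voltage are assumptions for illustration only.

```python
def supply_energy(previous, current, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply for one output transition.
    First-order model: only a 0 -> 1 transition charges the output capacitance."""
    if previous == 0 and current == 1:
        return c_load * vdd ** 2
    return 0.0

def energy_trace(outputs, initial=0):
    """Supply energy per clock cycle for a sequence of gate output values."""
    trace = []
    previous = initial
    for current in outputs:
        trace.append(supply_energy(previous, current))
        previous = current
    return trace

# Two different data sequences give two different supply profiles,
# which is the information a power attack exploits.
print(energy_trace([0, 1, 1, 0, 1]))
print(energy_trace([1, 0, 0, 0, 0]))
```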
6.1 Current Mode Logic
Current mode logic (CML), e.g. current steering logic, seems the ideal solution.
This type of logic continuously draws a current from the supply and measures its
state through the path that the current takes. A gate has constant power
consumption if it draws a perfectly constant current from the power supply,
independently of the input and output signals. To build a current source capable of
generating a constant current, special circuit techniques that minimize channel
length modulation have to be used.
The decisive drawback of CML, however, is its static power consumption. Even when
the logic gate is not processing any data, it burns current, which makes this logic
style unacceptable for embedded battery-operated devices.
6.2 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on
a capacitance. A regular standard CMOS circuit will only consume power when a
capacitance gets charged and later discharged, i.e. when a gate switches state.
This is the main reason that CMOS is the style of choice for every battery-operated
or low-power device. This is illustrated in the figure below for a simple inverter.
Thus static CMOS is the preferred logic style because of its low power consumption
and high noise margins.
Figure: Standard CMOS inverter
In the figure above, all four transitions of the CMOS inverter can be distinguished
when monitoring the power supply.
Yet two conditions must be satisfied for VML to have constant power consumption,
namely:
1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.
6.3 Dynamic Differential Logic
Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and
the false representation of the input and output signals, and a dynamic logic
family alternates pre-charge and evaluation phases. As a result, since both outputs
(true and false) are pre-charged to 1, exactly one of the two output nodes
evaluates to 0 to produce a differential output signal in the evaluation phase. The
discharged output node is charged to 1 in the following pre-charge phase, so that
both outputs are again pre-charged to 1. In other words, every signal transition,
including the events in which the input signals remain constant, is represented
with an actual switching event in which the logic gate charges a capacitance. The
logic families introduced to thwart differential power analysis (DPA) by using
dynamic differential logic include the following:
1) Sense Amplifier Based Logic (SABL), and
2) Wave Dynamic Differential Logic (WDDL) gates
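The first condition can be checked with a short counting experiment. The Python sketch below compares the number of charging events (0 to 1 transitions on an output node) for a single-rail static CMOS output and for a dual-rail output that is pre-charged to 1 before every evaluation, as described in this section. The function names are hypothetical, and the model only counts events; it says nothing about the second condition, the size of the capacitance being charged.

```python
def count_charging_events(node_history):
    """Number of 0 -> 1 transitions seen on one node."""
    return sum(1 for prev, cur in zip(node_history, node_history[1:])
               if (prev, cur) == (0, 1))

def static_cmos_events(bits, initial=0):
    """Single-rail output: it charges only when the data goes from 0 to 1."""
    return count_charging_events([initial] + list(bits))

def dual_rail_events(bits):
    """Dual-rail output, both rails pre-charged to 1 before every evaluation."""
    true_rail, false_rail = [1], [1]
    for bit in bits:
        true_rail += [bit, 1]          # evaluation value, then pre-charge to 1
        false_rail += [1 - bit, 1]
    return count_charging_events(true_rail) + count_charging_events(false_rail)

for data in ([0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]):
    print(data,
          "static CMOS:", static_cmos_events(data),
          "dual rail with pre-charge:", dual_rail_events(data))
```

In the dual-rail case the count equals the number of cycles for every data sequence, because exactly one rail is discharged in each evaluation and recharged in the following pre-charge.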
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that
all internal nodes connect to an output. The output capacitances can be balanced.
Systematic methods have been developed to make sure that both branches of the
differential pull-down network are balanced and that no memory effects are present
in the network. Sense Amplifier Based Logic is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require a full custom characterization and
layout. It also suffers from a high clock load, common to all dynamic logic gates.
CHAPTER 6 CONSTANT POWER CONSUMING
LOGIC STYLES
The power consumption of traditional standard cells and logic is
dependent on the signal activity When the output of the logic gate makes
a 0 to 1 transition a current comes from the power supply and charges the
output capacitance On the other hand when the output sees a 1 to 0 a 0
to 0 or a 1 to 1 transition no or only a limited amount of energy (due to
short circuit or leakage) is consumed from the power supply This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible The basis of a secure digital design
flow is a logic style with constant power consumption
61 Current Mode Logic
Current mode logic (CML) eg current steering logic seems the
ideal solution This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes A
gate has constant power consumption if it draws a perfectly constant
current from the power supply independently of the input and output
signals To build a current source capable of generating a constant current
special circuit techniques that minimize channel length modulation have to
be used
The decisive drawback of CML however is its static power
consumption When the logic gate is not processing any data it burns the
27
current which makes this logic style unacceptable for embedded battery-
operated devices
62 Voltage Mode Logic (CMOS circuit styles)
Voltage mode logic (VML) eg static CMOS logic only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged ie when a gate switches state It is the main reason that
CMOS is the style of choice for every battery operated or low power device This is illustrated
in the figure below for simple inverter Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins
Standard CMOS inverter
Yet two conditions must be satisfied for VML to have constant power consumption
namely
1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event
28
Here above all the four transitions of CMOS inverter can be distinguished when
monitoring the power supply
63 Dynamic Differential Logic
Dynamic differential logic sometimes also referred to as dual rail with pre-charge
logic fulfills the first condition A differential logic family uses the true and the false
representation of the input and output signals and a dynamic logic family alternates pre-charge
and evaluation phases As a result since both outputs (true and false) are pre-charged to 1
exactly one of the two output nodes evaluates to 0 to have a differential output signal in the
evaluation phase The discharged output node is charged to 1 in the following pre-charge phase
to pre-charge both outputs to 1 In other words every signal transition including the events in
which the input signals remain constant is represented with an actual switching event in
which the logic gate charges a capacitance All the logic families that have been introduced to
thwart the differential power analysis (DPA) by using dynamic differential logic in the
following techniques
1 Sense Amplifier Based Logic (SABL) and
2 Wave Dynamic Differential Logic (WDDL) gates
6.3.1 Sense Amplifier Based Logic (SABL)
The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. A Sense Amplifier
Based Logic gate is illustrated below.
Figure: Sense Amplifier Based Logic AND/NAND gate
This circuit style does, however, require full-custom characterization and layout. It also
suffers from a high clock load, which is common to all dynamic logic gates.
6.3.2 Wave Dynamic Differential Logic (WDDL) Gates
WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to
form secure compound standard cells, which have a reduced power signature. WDDL has many
advantages. It can be readily implemented from an existing standard cell library. The design
flow is fully supported by accurate EDA library files that come directly from the vendor. WDDL
also results in a dynamic differential logic with only a small load capacitance on the
pre-charge control signal, and with the low power consumption and the high noise margins of
static CMOS.
The advantages of the WDDL logic style are as follows:
1) It can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2) No special design rules are involved in the interconnection of WDDL gates.
3) The switching factor of WDDL is 100%.
A WDDL gate consists of a parallel combination of two positive complementary gates, one
calculating the true output using the true inputs, the other the false output using the false
inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR
gate are examples of positive gates. A complementary gate, sometimes also referred to as a
dual gate, expresses the false output of the original logic gate using the false inputs of the
original gate. The AND gate fed with true input signals and the OR gate fed with false input
signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.
In the evaluation phase, each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which
puts the outputs of the gate at 0. A module in WDDL pre-charges without distributing the
pre-charge signal to each individual gate. During the pre-charge phase, the input vector of
the combinatorial logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say
that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name
WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge
operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the
encryption module and at the outputs of the registers. These operators produce an all-zero
output in the pre-charge phase (clk signal high) but let the differential signal through
during the evaluation phase (clk signal low).
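The compound gates and the 0-wave described above can be sketched as a behavioural model. In this sketch each signal is a pair (true rail, false rail); the helper names and the small two-level logic tree are hypothetical and serve only as an illustration.

```python
from itertools import product

PRECHARGE = (0, 0)   # both rails at 0 during the pre-charge phase


def encode(bit):
    """Differential encoding of a bit as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


def tree(a, b, c):
    """Hypothetical two-level logic tree computing (a AND b) OR c."""
    return wddl_or(wddl_and(a, b), c)


if __name__ == "__main__":
    # Pre-charge phase: all-zero inputs ripple through as a 0-wave.
    print("pre-charge:", tree(PRECHARGE, PRECHARGE, PRECHARGE))
    # Evaluation phase: exactly one output rail rises for every input.
    for a, b, c in product((0, 1), repeat=3):
        print((a, b, c), tree(encode(a), encode(b), encode(c)))
```

Because AND and OR are positive gates, no gate in the tree needs its own pre-charge signal: zeroing the inputs is enough to zero every internal node.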
Figure: WDDL pre-charge wave generation
CHAPTER 7
WDDL GATES
The methodology used in the project is a bottom-up approach. Lower-level modules are designed
first and later integrated to form larger modules, whose further integration leads to the
final top module. Since logic gates form the lowest-level modules, the logic gates required
for the design are implemented in WDDL style first. WDDL demands a parallel combination of two
positive complementary gates, one calculating the true value and the other the false value.
The OR, AND and XOR logic gates have been implemented, along with larger blocks such as a full
adder and a 32-bit XOR.
7.1 WDDL OR gate
A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.
Figure 4.1 WDDL OR Gate
The NAND and NOT gates act as the pre-charge operator, injecting the signal '0' into the gates
when the 'clk' signal is high, i.e. during the pre-charge phase.
7.2 WDDL AND gate
A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.
Figure 4.2 WDDL AND Gate
7.3 WDDL NAND gate
A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate. In WDDL the inversion is obtained by interchanging the
true and false outputs, so the gate pair is the same as for the AND gate, as shown in the
following figure.
Figure 4.3 WDDL NAND Gate
7.4 WDDL NOR gate
A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, again with the true and false outputs interchanged, as
shown in the following figure.
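The four gates above can be sketched as a behavioural model that includes the pre-charge operator. The sketch assumes that each gate takes the single-ended inputs a and b plus clk and produces a (true, false) output pair, and that the NAND/NOT pre-charge operator behaves like an AND with the inverted clock. These interface details and all function names are assumptions made for illustration, not taken from the project's VHDL.

```python
def precharge(bit, clk):
    """Pre-charge operator.

    Evaluation phase (clk = 0): passes the differential pair (bit, not bit).
    Pre-charge phase (clk = 1): forces both rails to 0.
    """
    enable = 1 - clk
    return (bit & enable, (1 - bit) & enable)


def wddl_or(a, b, clk):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    (a_t, a_f), (b_t, b_f) = precharge(a, clk), precharge(b, clk)
    return (a_t | b_t, a_f & b_f)


def wddl_and(a, b, clk):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    (a_t, a_f), (b_t, b_f) = precharge(a, clk), precharge(b, clk)
    return (a_t & b_t, a_f | b_f)


def wddl_nand(a, b, clk):
    """Same gate pair as the AND gate; inversion is a swap of the two rails."""
    y_t, y_f = wddl_and(a, b, clk)
    return (y_f, y_t)


def wddl_nor(a, b, clk):
    """Same gate pair as the OR gate; inversion is a swap of the two rails."""
    y_t, y_f = wddl_or(a, b, clk)
    return (y_f, y_t)


if __name__ == "__main__":
    for clk in (1, 0):
        for a in (0, 1):
            for b in (0, 1):
                print(clk, a, b, wddl_nand(a, b, clk), wddl_nor(a, b, clk))
```

Swapping the rails keeps both gates positive, so the all-zero pre-charge value still propagates through inverting functions.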
Figure 4.4 WDDL NOR Gate
7.5 WDDL XOR gate
The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is
therefore implemented directly in terms of AND and OR gates, as shown in Figure 4.5. It can
also be implemented by instantiating WDDL AND and WDDL OR gates, but the latter method
involves more gates than the former. The first method of implementation is therefore followed
rather than the second one.
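Both ways of building the XOR can be compared on a behavioural model. The sketch below works on (true, false) rail pairs; the false-rail equation of the direct method (AB + A'B') and all function names are assumptions, since the report gives only the equation for the true output.

```python
from itertools import product


def encode(bit):
    """Differential encoding of a bit as (true rail, false rail)."""
    return (bit, 1 - bit)


def invert(a):
    """Inversion in dual-rail logic: swap the two rails."""
    return (a[1], a[0])


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor_direct(a, b):
    """Method 1: AND and OR gates wired directly onto the rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    y_t = (a_t & b_f) | (a_f & b_t)      # AB' + A'B
    y_f = (a_t & b_t) | (a_f & b_f)      # AB + A'B' (assumed false rail)
    return (y_t, y_f)


def wddl_xor_from_gates(a, b):
    """Method 2: two WDDL AND gates feeding one WDDL OR gate."""
    return wddl_or(wddl_and(a, invert(b)), wddl_and(invert(a), b))


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, wddl_xor_direct(encode(a), encode(b)),
              wddl_xor_from_gates(encode(a), encode(b)))
```

Both versions use only AND and OR on the rails, so all-zero inputs still produce an all-zero output in the pre-charge phase.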
Figure 4.5 WDDL XOR gate
With the help of the above basic gates, a full adder circuit has been designed by
instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm
requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by
instantiating the corresponding lower-level module 32 times.
CHAPTER 8
FRONT END RESULTS
WDDL OR GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL AND GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108    4%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL NAND GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Figure: Simulation Result
Figure: Synthesis Result
WDDL XOR GATE
Synthesis Report
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 5
Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%
Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns
Figure: Simulation Result
Figure: Synthesis Result
CHAPTER 9
SUMMARY AND CONCLUSION
9.1 Summary
To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the
design must be implemented in a logic style whose power dissipation is constant irrespective
of the input combination. WDDL has advantages over the alternatives and is therefore of great
significance. In this dissertation work, an architecture for the Blowfish algorithm is
designed and implemented in WDDL style using a bottom-up approach: the low-level entities are
designed first and later combined to form the entire module. The key scheduling is online. The
sub-keys generated for a particular key can be used to encrypt all the data to be encrypted
with that key. The sub-keys are applied in reverse order for the decryption data path. Logic
gates are first implemented in WDDL, and the higher-level modules are then designed by
instantiating the WDDL gates, resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This kind of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).
9.2 Conclusion
The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits.
The code for the implementation has been written in VHDL. Functional verification has been
done using the Cadence (NC Launch) simulation package and synthesis using RTL Compiler. The
back end of the design is done using SOC Encounter. The desired functionality has been
achieved according to the specifications. In the output, the number of transitions during the
evaluation phase is the same for every input, resulting in constant power dissipation. During
synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area
of the design has therefore grown nearly three-fold compared with the design implemented in
conventional CMOS logic, which is the cost of the security incorporated into the IC against
Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an
attacker cannot apply the DPA scheme to find the secret key used in the crypto-processor.
Security against DPA is thus incorporated into the IC at hardware level by implementing the
design in WDDL style, which is quite simple and effective.
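The observation that the number of transitions in the evaluation phase is the same for every input can be checked on a small behavioural model. The sketch below builds a full adder and a 32-bit XOR by instantiating 1-bit WDDL gate models and counts how many compound-gate output rails rise when the circuit leaves the all-zero pre-charge state. It is an illustration under stated assumptions: the gate decomposition of the full adder and all function names are hypothetical, and the count covers compound-gate output rails only, not real capacitances, glitches or routing imbalance.

```python
from itertools import product


def encode(bit):
    """Differential encoding of a bit as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor(a, b):
    (a_t, a_f), (b_t, b_f) = a, b
    return ((a_t & b_f) | (a_f & b_t), (a_t & b_t) | (a_f & b_f))


def full_adder(a, b, cin):
    """Full adder from WDDL gates; also returns every gate output pair."""
    p = wddl_xor(a, b)
    s = wddl_xor(p, cin)
    g = wddl_and(a, b)
    t = wddl_and(p, cin)
    cout = wddl_or(g, t)
    return s, cout, [p, s, g, t, cout]


def rising_rails(rails):
    """Rails that switch 0 -> 1 when leaving the all-zero pre-charge state."""
    return sum(t + f for t, f in rails)


def xor32(x, y):
    """32-bit XOR built by instantiating the 1-bit WDDL XOR 32 times."""
    bits = [wddl_xor(encode((x >> i) & 1), encode((y >> i) & 1))
            for i in range(32)]
    return sum(t << i for i, (t, _) in enumerate(bits))


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        s, cout, rails = full_adder(encode(a), encode(b), encode(c))
        print((a, b, c), s[0], cout[0], rising_rails(rails))
```

In this model every input combination switches the same number of rails (one per compound gate), which is the behaviour the conclusion relies on.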
Here above all the four transitions of CMOS inverter can be distinguished when
monitoring the power supply
63 Dynamic Differential Logic
Dynamic differential logic sometimes also referred to as dual rail with pre-charge
logic fulfills the first condition A differential logic family uses the true and the false
representation of the input and output signals and a dynamic logic family alternates pre-charge
and evaluation phases As a result since both outputs (true and false) are pre-charged to 1
exactly one of the two output nodes evaluates to 0 to have a differential output signal in the
evaluation phase The discharged output node is charged to 1 in the following pre-charge phase
to pre-charge both outputs to 1 In other words every signal transition including the events in
which the input signals remain constant is represented with an actual switching event in
which the logic gate charges a capacitance All the logic families that have been introduced to
thwart the differential power analysis (DPA) by using dynamic differential logic in the
following techniques
1 Sense Amplifier Based Logic (SABL) and
2 Wave Dynamic Differential Logic (WDDL) gates
631 Sense Amplifier Based logic (SABL)
SABL has its main advantage that it has balanced input and output nodes and that all
internal nodes connect to an output The output capacitances can be balanced Systematic
methods have been developed to make sure that both branches of the differential pull down
network are balanced and that no memory effects are present in the network Sense Amplifier
Based logic is illustrated as
29
Sense Amplifier Based Logic
ANDNAND gate
This circuit style does require however a full custom characterization and layout It also
suffers from a high clock load common to all dynamic logic gates
632 Wave Dynamic Differential Logic Gates (WDDL)
WDDL logic can be implemented with static CMOS logic Static CMOS
standard cells are combined to form secure compound standard cells
which have a reduced power signature WDDL has many advantages It can
be readily implemented from an existing standard cell library The design
flow is fully supported with accurate EDA library files that come directly
from the vendor WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS
The advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
No special design rules are involved in the interconnection of WDDL gates.
The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name Wave Dynamic Differential Logic (WDDL).

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
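The gate structure and the 0-wave described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the VHDL used in the project: the names `encode`, `precharge_operator`, `wddl_and`, `wddl_or` and the two-level example network `tree` are hypothetical, and each differential signal is assumed to be a pair (true rail, false rail).

```python
from typing import Tuple

Rail = Tuple[int, int]        # assumed representation: (true rail, false rail)
PRECHARGE: Rail = (0, 0)      # both rails at 0 during the pre-charge phase


def encode(bit: int) -> Rail:
    """Differential encoding used in the evaluation phase."""
    return (bit, 1 - bit)


def precharge_operator(signal: Rail, clk: int) -> Rail:
    """All-zero output while clk is high, pass-through while clk is low."""
    return PRECHARGE if clk else signal


def wddl_and(a: Rail, b: Rail) -> Rail:
    # True output: AND of the true inputs.
    # False output: the dual gate (OR) applied to the false inputs.
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a: Rail, b: Rail) -> Rail:
    # True output: OR of the true inputs.
    # False output: the dual gate (AND) applied to the false inputs.
    return (a[0] | b[0], a[1] & b[1])


def tree(a: int, b: int, c: int, clk: int) -> Rail:
    """Hypothetical logic tree (a AND b) OR c.

    Pre-charge operators sit only at the tree inputs; the inner gates
    receive no pre-charge signal of their own.
    """
    ra, rb, rc = (precharge_operator(encode(x), clk) for x in (a, b, c))
    return wddl_or(wddl_and(ra, rb), rc)
```

With `clk=1` every call to `tree` returns `(0, 0)`, because the zeros injected at the inputs ripple through the positive gates; with `clk=0` the result is the differential encoding of `(a AND b) OR c`.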
Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented first in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The OR, AND and XOR logic gates have been implemented, along with a full adder, a 32-bit XOR and other modules.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. the AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. the OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. the OR gate, as shown in the following figure.
Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. the AND gate, as shown in the following figure.
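The NAND and NOR gates above use the same AND/OR pairs as the non-inverting gates. One way to obtain the inversion, assumed here from the general WDDL literature rather than taken from the project's figures, is to exchange the true and false output rails instead of adding an inverter, since an inverter is not a positive gate and would turn the 0-wave into 1s. The sketch below models that assumption; all function names are hypothetical.

```python
from typing import Tuple

Rail = Tuple[int, int]   # assumed representation: (true rail, false rail)


def encode(bit: int) -> Rail:
    """Differential encoding used in the evaluation phase."""
    return (bit, 1 - bit)


def wddl_and(a: Rail, b: Rail) -> Rail:
    # AND on the true rails, dual OR on the false rails.
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a: Rail, b: Rail) -> Rail:
    # OR on the true rails, dual AND on the false rails.
    return (a[0] | b[0], a[1] & b[1])


def swap_rails(x: Rail) -> Rail:
    """Inversion by exchanging the two output wires (assumption)."""
    return (x[1], x[0])


def wddl_nand(a: Rail, b: Rail) -> Rail:
    # Same AND/OR pair as the WDDL AND gate, outputs exchanged.
    return swap_rails(wddl_and(a, b))


def wddl_nor(a: Rail, b: Rail) -> Rail:
    # Same OR/AND pair as the WDDL OR gate, outputs exchanged.
    return swap_rails(wddl_or(a, b))
```

Because the swap adds no logic, an all-zero input still produces an all-zero output, so the pre-charge wave passes through the inverting gates unchanged.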
Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second.
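The equation A ⊕ B = AB' + A'B maps directly onto the two rails of a differential signal, because A' and B' are already available as the false rails. The following Python sketch shows one possible gate-level reading of this; the exact wiring of the project's figure is not reproduced here, and the false output is assumed to be the dual (OR-then-AND) network fed with the opposite rails.

```python
from typing import Tuple

Rail = Tuple[int, int]   # assumed representation: (true rail, false rail)


def encode(bit: int) -> Rail:
    """Differential encoding used in the evaluation phase."""
    return (bit, 1 - bit)


def wddl_xor(a: Rail, b: Rail) -> Rail:
    a_t, a_f = a
    b_t, b_f = b
    # True output: A.B' + A'.B, built from AND and OR gates only.
    true_out = (a_t & b_f) | (a_f & b_t)
    # False output: dual network (A' + B).(A + B'), which equals XNOR
    # whenever the inputs are properly differential.
    false_out = (a_f | b_t) & (a_t | b_f)
    return (true_out, false_out)
```

Both networks contain only positive gates, so the all-zero pre-charge input `(0, 0), (0, 0)` yields `(0, 0)` at the output, as required for the 0-wave.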
Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. These can be easily implemented by instantiating the corresponding lower-level module 32 times.
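The instantiation scheme described above can be sketched as follows. The full adder decomposition (two XORs, two ANDs, one OR), the ripple-carry arrangement and every function name are assumptions made for illustration; the report does not state which adder structure was used. The carry out of bit 31 is dropped, which matches the modulo 2^32 addition Blowfish uses.

```python
from typing import List, Tuple

Rail = Tuple[int, int]   # assumed representation: (true rail, false rail)


def encode(bit: int) -> Rail:
    return (bit, 1 - bit)


def wddl_and(a: Rail, b: Rail) -> Rail:
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a: Rail, b: Rail) -> Rail:
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor(a: Rail, b: Rail) -> Rail:
    return ((a[0] & b[1]) | (a[1] & b[0]), (a[1] | b[0]) & (a[0] | b[1]))


def wddl_full_adder(a: Rail, b: Rail, cin: Rail) -> Tuple[Rail, Rail]:
    """Assumed structure: sum = a^b^cin, carry = a.b + (a^b).cin."""
    p = wddl_xor(a, b)
    return wddl_xor(p, cin), wddl_or(wddl_and(a, b), wddl_and(p, cin))


def to_rails(value: int, width: int = 32) -> List[Rail]:
    """Least significant bit first."""
    return [encode((value >> i) & 1) for i in range(width)]


def from_rails(rails: List[Rail]) -> int:
    return sum(t << i for i, (t, _f) in enumerate(rails))


def wddl_xor32(x: List[Rail], y: List[Rail]) -> List[Rail]:
    # 32 instances of the one-bit WDDL XOR, one per bit position.
    return [wddl_xor(a, b) for a, b in zip(x, y)]


def wddl_add32(x: List[Rail], y: List[Rail], cin: Rail = (0, 1)) -> List[Rail]:
    # 32 instances of the full adder chained through the carry.
    out, carry = [], cin
    for a, b in zip(x, y):
        s, carry = wddl_full_adder(a, b, carry)
        out.append(s)
    return out   # final carry discarded: addition modulo 2**32
```

For example, `from_rails(wddl_add32(to_rails(0xFFFFFFFF), to_rails(1)))` wraps around to 0, and feeding all-zero rails (including the carry input) returns all-zero rails, so the wider modules pre-charge in the same way as the single gates.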
CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 2%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
WDDL AND GATE

Synthesis Report

Final Report
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 4%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
WDDL NAND GATE

Synthesis Report

Final Report
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 0%
Number of 4 input LUTs : 2 out of 9312 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 2%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 2%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over other logic styles in this respect and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all of the data that is to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Logic gates are implemented in WDDL first, and the higher modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, the number of transitions during the Evaluation phase is always the same, which results in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
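The observation that every cycle shows the same number of transitions can be checked on a single gate. The sketch below counts logical rail toggles only, under the assumptions of an ideal gate, balanced rail capacitances and no glitches; it is a behavioural illustration in Python, not a power measurement of the implemented design, and the helper names are hypothetical.

```python
from itertools import product
from typing import Tuple

Rail = Tuple[int, int]       # assumed representation: (true rail, false rail)
PRECHARGE: Rail = (0, 0)


def encode(bit: int) -> Rail:
    return (bit, 1 - bit)


def wddl_and(a: Rail, b: Rail) -> Rail:
    return (a[0] & b[0], a[1] | b[1])


def toggles(before: Rail, after: Rail) -> int:
    """Number of rails that change value between two states."""
    return sum(x != y for x, y in zip(before, after))


def wddl_cycle_toggles(a: int, b: int) -> int:
    """Output toggles over one pre-charge -> evaluate -> pre-charge cycle."""
    out = wddl_and(encode(a), encode(b))
    return toggles(PRECHARGE, out) + toggles(out, PRECHARGE)


def cmos_toggles(prev: Tuple[int, int], cur: Tuple[int, int]) -> int:
    """Output toggles of a single-ended AND gate between two input vectors."""
    return int((prev[0] & prev[1]) != (cur[0] & cur[1]))


inputs = list(product((0, 1), repeat=2))
wddl_counts = {wddl_cycle_toggles(a, b) for a, b in inputs}
cmos_counts = {cmos_toggles(p, c) for p in inputs for c in inputs}
```

In this model `wddl_counts` contains a single value, because exactly one of the two output rails rises in each evaluation and falls again in the following pre-charge, whereas `cmos_counts` contains both 0 and 1, i.e. the switching activity of the single-ended gate depends on the data.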
A major advantage of the proposed logic style is that it can be incorporated by the common
Electronic Design Automation (EDA) tool flow
No special design rules are involved in the interconnection of WDDL gates
The switching factor of WDDL is 100 A WDDL gate consists of a parallel
combination of two positive complementary gates one calculating the
true output using the true inputs the other the false output using the
false inputs A positive gate produces a zero output for an all zero input
The AND gate and the OR gate are examples of positive gates A
complementary gate sometimes also referred to as a dual gate
expresses the false output of the original logic gate using the false
inputs of the original gate The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates Fig shows
the WDDL AND gate and the WDDL OR gate In the evaluation phase
each input signal is differential and the WDDL gate calculates its
differential output In the pre-charge phase the inputs to the WDDL gate
are set at 0 This puts the output of the gate at 0 A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate During the pre-charge phase the input vector of the combinatorial
logic is set at all 0s Each individual gate will eventually have all its
inputs at 0 evaluate its output to 0 and pass this 0 value to the next
gate One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave hence WDDL There are several ways
to launch to pre-charge wave In Fig a pre-charge operator is inserted
at the start of every combinatorial logic tree ie the inputs of the
encryption module and the outputs of the registers They produce an all-
zero output in the pre-charge phase (clk-signal high) but let the
31
differential signal through during the evaluation phase (clk-signal low)
Fig
ure WDDL Pre-charge wave generationCHAPTER 7
WDDL GATESThe methodology used in the project is bottom-up approach Lower
modules are designed and later integrated to form larger modules whose further integration
leads to the final top module As it is a fact that logic gates form lower level modules
initially logic gates required for the design are implemented in WDDL style WDDL
demands a parallel combination of two positive complementary gates one calculating the
true value and the other negative value The logic gates like OR AND XOR have been
implemented Besides there is even implementation of Full Adder 32-bit XOR
etc71WDDL OR gateA WDDL OR gate is constructed by considering conventional
OR gate in parallel to its complementary gate ie AND gate as shown in the following
32
figure Figure
41 WDDL OR GateThe NAND and NOT gates used act as Precharge operator injecting
signal lsquo0rsquo into the gates when lsquoclkrsquo signal is high ie during the Precharge phase72
WDDL AND gateA WDDL AND gate is constructed by considering conventional
AND gate in parallel to its complementary gate ie OR gate as shown in the following
33
figure Figure
42 WDDL AND Gate73 WDDL NAND gateA WDDL NAND gate is constructed by
considering conventional AND gate in parallel to its complementary gate ie OR gate as
shown in the following figure
34
Figure
43 WDDL NAND Gate74 WDDL NOR gateA WDDL NOR gate is constructed by
considering conventional OR gate in parallel to its complementary gate ie AND gate as
shown in the following figure
35
Figure 44 WDDL
NOR Gate 75 WDDL XOR gate XOR function can be implemented by the
Boolean Equation A B = ABrsquo + ArsquoB Therefore XOR Gate is implemented
in terms of AND gate and OR gate as shown in the figure 44 It can also be implemented
by instantiating a WDDL AND gate and WDDL OR gate But the number of gates
involved in the latter one is greater than the former one Therefore the first method of
implementation is followed rather than the second one
36
Figure 45
WDDL XOR gateWith the help of the above basic gates Full adder circuit has been
designed by instantiating the above designed WDDL gates During the implementation of
the Blowfish algorithm a 32-bit XOR gate and 32-bit Adder circuit are required They can
be easily implemented by instantiating the corresponding lower module 32 number of
timesCHAPTER 8 FRONT END
RESULTSWDDL OR GATESynthesis
Report==========================================================
= Final Report
===========================================================Final
ResultsRTL Top Level Output File Name wddlorngrTop Level Output File Name
wddlorOutput Format NGCOptimization Goal SpeedKeep
Hierarchy NODesign Statistics IOs 5Cell Usage
BELS 2 LUT3 2 IO Buffers 5
37
IBUF 3 OBUF
2============================================================Devi
ce utilization summary---------------------------Selected Device 3s250eft256-4 Number
of Slices 1 out of 2448 0 Number of 4 input LUTs 2
out of 4896 0 Number of IOs 5 Number of bonded IOBs
5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum
combinational path delay 6236nsSimulation Result
S
ynthesis Result
38
WDD
L AND GATESynthesis
Report==========================================================
== Final Report
============================================================Fina
l ResultsRTL Top Level Output File Name wddlgatesngrTop Level Output File
Name wddlgatesOutput Format NGCOptimization Goal
SpeedKeep Hierarchy NODesign Statistics IOs
5Cell Usage BELS 2 LUT3 2 IO Buffers
5 IBUF 3 OBUF
2===========================================================Devic
e utilization summary---------------------------Selected Device 3s250etq144-4 Number
of Slices 1 out of 2448 0 Number of 4 input LUTs 2
out of 4896 0 Number of IOs 5 Number of bonded IOBs
5 out of 108 4 Timing Summary---------------Speed Grade -4Maximum
combinational path delay 6236nsSimulation Result
39
Sy
nthesis Result
WDDL NAND GATESynthesis
Report==========================================================
== Final Report
============================================================Fina
l ResultsRTL Top Level Output File Name wddlnand1ngrTop Level Output File
Name wddlnand1Output Format NGCOptimization Goal
SpeedKeep Hierarchy NODesign Statistics IOs
5Cell Usage BELS 2 LUT3 2 IO Buffers
40
5 IBUF 3 OBUF
2============================================================Devi
ce utilization summarySelected Device 3s500efg320-4 Number of Slices
1 out of 4656 0 Number of 4 input LUTs 2 out of 9312 0
Number of IOs 5 Number of bonded IOBs 5 out of 232
2 Timing SummarySpeed Grade -4Maximum combinational path delay
6236nsSimulation Result
Synthesis Result
WD
41
DL XOR GATESimulation Result
Synthesis Result
WDDL XOR GATESynthesis
Report==========================================================
== Final Report
===========================================================Final
42
ResultsRTL Top Level Output File Name wddlxorgatengrTop Level Output File
Name wddlxorgateOutput Format NGCOptimization Goal
SpeedKeep Hierarchy NODesign Statistics IOs
5Cell Usage BELS 2 LUT3 2 IO Buffers
5 IBUF 3 OBUF
2============================================================Devi
ce utilization summary---------------------------Selected Device 3s250eft256-4 Number
of Slices 1 out of 2448 0 Number of 4 input LUTs 2
out of 4896 0 Number of IOs 5 Number of bonded IOBs
5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum
combinational path delay 6236nsSimulation Result
Synthesis Result
43
CHAPTER 9 SUMMARY AND CONCLUSION 91
SummaryIn order to provide security to ICs against side-channel attacks especially
Differential Power Analysis (DPA) it is necessary to implement the design in a logic that
can render constant power dissipation irrespective of the input combination WDDL is
proved to be advantageous to others and therefore is of great significance In this
dissertation work architecture for Blowfish Algorithm is designed and implemented in
WDDL style In this implementation bottom-up approach is used The low level entities
are designed and later they are all combined to form the entire module The key
scheduling is online The sub-keys generated for a particular key can be used for the
encryption of the entire data to be encrypted with that key The sub keys are given in
reverse direction for the decryption data path Initially logic gates are implemented in
WDDL and then higher modules have been designed by instantiating the WDDL gates to
form the entire module thus resulting in constant power dissipation irrespective of any
input data combination The entire design works in two phases namely Precharge phase and
Evaluation phase In the Precharge phase all the signals of the design are zeroed and
during the Evaluation phase the functionality of the design is achieved This sort of design
has been found simple and very effective in thwarting the side-channel attack namely
Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64
bits. The code for the implementation has been written in VHDL. Functional verification has
been done using the Cadence (NC Launch) simulation package, and synthesis using RTL
Compiler. The back end of the design has been done using SOC Encounter. The desired
functionality has been achieved according to the specifications. At the output, the same
number of transitions occurs during every Evaluation phase, which results in constant power
dissipation. During synthesis it was observed that a simple WDDL gate comprises many
conventional gates. The area of the design has therefore grown nearly three-fold compared
with the same design implemented in conventional CMOS logic; this is the cost of the
security incorporated into the IC against Differential Power Analysis (DPA). Because of the
constant power dissipation at the output, an attacker cannot apply the DPA scheme to find
the secret key used in the crypto-processor. Thus security against DPA is incorporated into
the IC at the hardware level by implementing the design in WDDL style, which is quite simple
and effective.
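The summary states that the sub-keys are supplied in reverse order for the decryption data path. The sketch below illustrates why that works for a Blowfish-style network of 16 rounds and 18 sub-keys. It is a toy model under stated assumptions: `toy_f` and `toy_subkeys` are hypothetical stand-ins, not the real Blowfish F function, S-boxes or key schedule, and the code is not the report's VHDL.

```python
MASK32 = 0xFFFFFFFF

def toy_f(x):
    """Hypothetical stand-in for the Blowfish F function (the real one uses four S-boxes)."""
    return ((x * 0x9E3779B1) ^ (x >> 15)) & MASK32

def feistel(block64, subkeys):
    """Run a 16-round Blowfish-style network over a 64-bit block using 18 sub-keys."""
    left, right = (block64 >> 32) & MASK32, block64 & MASK32
    for i in range(16):
        left ^= subkeys[i]
        right ^= toy_f(left)
        left, right = right, left      # swap halves after every round
    left, right = right, left          # undo the swap of the last round
    right ^= subkeys[16]
    left ^= subkeys[17]
    return (left << 32) | right

def encrypt(block64, subkeys):
    return feistel(block64, subkeys)

def decrypt(block64, subkeys):
    # Same data path, sub-keys applied in reverse order.
    return feistel(block64, subkeys[::-1])

# Hypothetical sub-key values, for illustration only.
toy_subkeys = [(0x243F6A88 * (i + 1)) & MASK32 for i in range(18)]

plain = 0x0123456789ABCDEF
cipher = encrypt(plain, toy_subkeys)
print(hex(cipher), hex(decrypt(cipher, toy_subkeys)))
```

Because only the order of the sub-keys changes, one data path can serve both directions, which is consistent with the remark that sub-keys generated once for a key can be reused for all data under that key.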
differential signal through during the evaluation phase (clk signal low).

Figure: WDDL Pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are
designed first and later integrated to form larger modules, whose further integration leads
to the final top module. Since logic gates form the lowest-level modules, the logic gates
required for the design are implemented in WDDL style first. WDDL demands a parallel
combination of two positive complementary gates, one calculating the true value and the
other the false (complementary) value. The OR, AND and XOR logic gates have been
implemented, as well as a full adder, a 32-bit XOR and other modules.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in Figure 4.1.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates act as the Precharge operator, injecting the signal '0' into the
gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in Figure 4.2.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in Figure 4.3.
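The gate constructions described so far can be sketched as a small behavioural model. This is an illustration only, not the report's VHDL: the function names are hypothetical, each signal is modelled as a (true, false) pair of rails, and the NAND is assumed to be obtained by exchanging the two output rails of the AND/OR pair, which may differ from the circuit in the report's figure.

```python
from itertools import product

def precharge(bit, clk):
    """Precharge operator: pass the bit when clk is low, force 0 when clk is high."""
    return 0 if clk else bit

def encode(value, clk):
    """Return the (true, false) rail pair of a logic value after the precharge operator."""
    return precharge(value, clk), precharge(1 - value, clk)

def wddl_and(a, b):
    """AND gate on the true rails in parallel with an OR gate on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return a_t & b_t, a_f | b_f

def wddl_or(a, b):
    """OR gate on the true rails in parallel with an AND gate on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return a_t | b_t, a_f & b_f

def wddl_nand(a, b):
    """Assumption: inversion by exchanging the output rails of the AND/OR pair."""
    out_t, out_f = wddl_and(a, b)
    return out_f, out_t

# Print the rails for every input pair in the Precharge (clk=1) and Evaluation (clk=0) phases.
for a, b in product((0, 1), repeat=2):
    for clk in (1, 0):
        ra, rb = encode(a, clk), encode(b, clk)
        print(clk, a, b, wddl_and(ra, rb), wddl_or(ra, rb), wddl_nand(ra, rb))
```

In this model every output pair is (0, 0) while clk is high, and exactly one rail of each pair is 1 while clk is low, which is the property the WDDL style relies on.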
Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in Figure 4.4.
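One way to see why the paired gates matter is to count rising output transitions per clock cycle. The sketch below compares a behavioural WDDL NOR model with a single-ended NOR gate. It is an assumption-laden illustration: the function names are hypothetical, the NOR is modelled by reading the OR/AND pair with its output rails exchanged, and counting rising edges at the gate outputs is only a rough proxy for switching activity, not a power measurement.

```python
from itertools import product

def wddl_nor(a, b, clk):
    """Return the (true, false) output rails of a behavioural WDDL NOR model."""
    if clk:                                  # Precharge phase: every input rail is 0
        a_t = a_f = b_t = b_f = 0
    else:                                    # Evaluation phase: complementary rails
        a_t, a_f = a, 1 - a
        b_t, b_f = b, 1 - b
    or_out = a_t | b_t                       # conventional OR gate
    and_out = a_f & b_f                      # complementary AND gate
    return and_out, or_out                   # NOR on the true rail, OR on the false rail

def rising_edges(before, after):
    """Count the wires that go from 0 to 1."""
    return sum(1 for x, y in zip(before, after) if x == 0 and y == 1)

def wddl_edges_per_cycle(vectors):
    """Rising edges from the Precharge phase to the Evaluation phase, per input vector."""
    return [rising_edges(wddl_nor(a, b, 1), wddl_nor(a, b, 0)) for a, b in vectors]

def single_ended_edges(vectors):
    """Rising edges of an ordinary NOR output as the inputs change from vector to vector."""
    counts, previous = [], 0
    for a, b in vectors:
        out = 1 - (a | b)
        counts.append(rising_edges((previous,), (out,)))
        previous = out
    return counts

vectors = list(product((0, 1), repeat=2))
print("WDDL        :", wddl_edges_per_cycle(vectors))
print("single-ended:", single_ended_edges(vectors))
```

In this model the WDDL gate shows one rising edge in every cycle whatever the inputs are, whereas the count for the single-ended gate depends on the data.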
Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate
is therefore implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be
implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter approach is greater than in the former. Therefore the first method of
implementation is followed rather than the second.
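The sum-of-products form of the XOR can be sketched on dual rails as follows. This is a behavioural illustration with hypothetical names, not the report's VHDL. The true rail follows AB' + A'B directly; the false-rail expression is an assumption obtained here by De Morgan duality and is not taken from the report's figure. A 32-bit version is built by repeating the one-bit model 32 times.

```python
def encode(value, clk):
    """(true, false) rails of a logic value; both rails are 0 while clk is high."""
    return (0, 0) if clk else (value, 1 - value)

def wddl_xor(a, b):
    """Dual-rail XOR using only AND and OR gates on the available rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    out_t = (a_t & b_f) | (a_f & b_t)        # AB' + A'B
    out_f = (a_f | b_t) & (a_t | b_f)        # dual network, equals XNOR in evaluation
    return out_t, out_f

def wddl_xor32(x, y, clk):
    """32-bit XOR formed from 32 instances of the one-bit model."""
    return [wddl_xor(encode((x >> i) & 1, clk), encode((y >> i) & 1, clk))
            for i in range(32)]

def decode(rails):
    """Rebuild an integer from the true rails (meaningful only in the Evaluation phase)."""
    return sum(t << i for i, (t, _f) in enumerate(rails))

result = decode(wddl_xor32(0xDEADBEEF, 0x12345678, clk=0))
print(hex(result))
```

Both rails stay at 0 during the Precharge phase because the model uses only AND and OR gates, so an all-zero input produces an all-zero output.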
Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by
instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm
requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by
instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END
RESULTS

WDDL OR GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4

Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result
WDDL AND GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4

Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108     4%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result
WDDL NAND GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4

Number of Slices                   : 1 out of 4656    0%
Number of 4 input LUTs             : 2 out of 9312    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result
WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4

Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result
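The trailing figures in the device utilization summaries above (0, 0 and 2 or 4) appear to be percentages of the available resources. The short check below recomputes them from the reported counts. It assumes that the tool reports an integer percentage truncated toward zero, which is an inference from these numbers and not something the report states; the dictionary layout and function name are hypothetical.

```python
def utilisation_percent(used, available):
    """Integer percentage, truncated toward zero (assumed rounding rule)."""
    return (100 * used) // available

# (used, available) pairs copied from the synthesis reports in this chapter.
reports = {
    "3s250eft256-4": {"slices": (1, 2448), "luts": (2, 4896), "iobs": (5, 172)},
    "3s250etq144-4": {"slices": (1, 2448), "luts": (2, 4896), "iobs": (5, 108)},
    "3s500efg320-4": {"slices": (1, 4656), "luts": (2, 9312), "iobs": (5, 232)},
}

computed = {
    device: {name: utilisation_percent(*pair) for name, pair in resources.items()}
    for device, resources in reports.items()
}

for device, values in computed.items():
    print(device, values)
```

Under that assumption the recomputed values agree with the reported ones: a single gate uses well under 1% of the slices and LUTs, and only the I/O pins register as a non-zero percentage.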
Figure
43 WDDL NAND Gate74 WDDL NOR gateA WDDL NOR gate is constructed by
considering conventional OR gate in parallel to its complementary gate ie AND gate as
shown in the following figure
35
Figure 44 WDDL
NOR Gate 75 WDDL XOR gate XOR function can be implemented by the
Boolean Equation A B = ABrsquo + ArsquoB Therefore XOR Gate is implemented
in terms of AND gate and OR gate as shown in the figure 44 It can also be implemented
by instantiating a WDDL AND gate and WDDL OR gate But the number of gates
involved in the latter one is greater than the former one Therefore the first method of
implementation is followed rather than the second one
36
Figure 45
WDDL XOR gateWith the help of the above basic gates Full adder circuit has been
designed by instantiating the above designed WDDL gates During the implementation of
the Blowfish algorithm a 32-bit XOR gate and 32-bit Adder circuit are required They can
be easily implemented by instantiating the corresponding lower module 32 number of
timesCHAPTER 8 FRONT END
RESULTSWDDL OR GATESynthesis
Report==========================================================
= Final Report
===========================================================Final
ResultsRTL Top Level Output File Name wddlorngrTop Level Output File Name
wddlorOutput Format NGCOptimization Goal SpeedKeep
Hierarchy NODesign Statistics IOs 5Cell Usage
BELS 2 LUT3 2 IO Buffers 5
37
IBUF 3 OBUF
2============================================================Devi
ce utilization summary---------------------------Selected Device 3s250eft256-4 Number
of Slices 1 out of 2448 0 Number of 4 input LUTs 2
out of 4896 0 Number of IOs 5 Number of bonded IOBs
5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum
combinational path delay 6236nsSimulation Result
S
ynthesis Result
38
WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2

Device utilization summary
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108 (4%)

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result
WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2

Device utilization summary
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656 (0%)
Number of 4 input LUTs         : 2 out of 9312 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232 (2%)

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result
WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2

Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172 (2%)

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result
CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over other such logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. For the decryption data path, the sub-keys are applied in reverse order. The logic gates are first implemented in WDDL, and the higher-level modules are then designed by instantiating these WDDL gates, so that the complete module has constant power dissipation irrespective of the input data combination. The entire design works in two phases, the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are driven to zero, and during the Evaluation phase the design performs its function. This style of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design has been done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, the same number of transitions occurs in every Evaluation phase, resulting in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply the DPA scheme to find the secret key used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
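The statement that every Evaluation phase produces the same number of transitions can be checked on a small behavioural model. The Python sketch below is only an illustration under stated assumptions, not the project's VHDL or its measured results: the 1-bit full adder structure, the function names, and the use of rising-transition counts as a stand-in for power are all assumptions. The single-rail comparison also assumes every node starts at 0, which ordinary CMOS logic does not guarantee.

```python
# Sketch: count rails that rise when a circuit leaves the precharge state.
# Illustrative model only; transition counts are used as a rough proxy for power.
from itertools import product


def encode(bit):
    """Evaluation-phase WDDL pair (true rail, false rail) for an ordinary bit."""
    return (1, 0) if bit else (0, 1)


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor(a, b):
    return ((a[0] & b[1]) | (a[1] & b[0]), (a[0] & b[0]) | (a[1] & b[1]))


def wddl_full_adder(a, b, cin):
    """Return all gate output pairs [p, sum, g, t, cout] of a WDDL full adder."""
    p = wddl_xor(a, b)          # propagate
    s = wddl_xor(p, cin)        # sum
    g = wddl_and(a, b)          # generate
    t = wddl_and(p, cin)        # propagated carry
    cout = wddl_or(g, t)        # carry out
    return [p, s, g, t, cout]


def rising_wddl(bits):
    """Rails going 0 -> 1 after precharge, where every rail starts at 0."""
    nodes = wddl_full_adder(*(encode(x) for x in bits))
    return sum(t + f for t, f in nodes)


def rising_single_rail(bits):
    """Same adder with one wire per signal, assuming all nodes start at 0."""
    a, b, cin = bits
    p = a ^ b
    return sum([p, p ^ cin, a & b, p & cin, (a & b) | (p & cin)])


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, rising_wddl(bits), rising_single_rail(bits))
```

In this model each WDDL gate output pair has exactly one rail that rises per evaluation, so the count does not depend on the data, whereas the single-rail count does.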
WDD
L AND GATESynthesis
Report==========================================================
== Final Report
============================================================Fina
l ResultsRTL Top Level Output File Name wddlgatesngrTop Level Output File
Name wddlgatesOutput Format NGCOptimization Goal
SpeedKeep Hierarchy NODesign Statistics IOs
5Cell Usage BELS 2 LUT3 2 IO Buffers
5 IBUF 3 OBUF
2===========================================================Devic
e utilization summary---------------------------Selected Device 3s250etq144-4 Number
of Slices 1 out of 2448 0 Number of 4 input LUTs 2
out of 4896 0 Number of IOs 5 Number of bonded IOBs
5 out of 108 4 Timing Summary---------------Speed Grade -4Maximum
combinational path delay 6236nsSimulation Result
39
Sy
nthesis Result
WDDL NAND GATESynthesis
Report==========================================================
== Final Report
============================================================Fina
l ResultsRTL Top Level Output File Name wddlnand1ngrTop Level Output File
Name wddlnand1Output Format NGCOptimization Goal
SpeedKeep Hierarchy NODesign Statistics IOs
5Cell Usage BELS 2 LUT3 2 IO Buffers
40
5 IBUF 3 OBUF
2============================================================Devi
ce utilization summarySelected Device 3s500efg320-4 Number of Slices
1 out of 4656 0 Number of 4 input LUTs 2 out of 9312 0
Number of IOs 5 Number of bonded IOBs 5 out of 232
2 Timing SummarySpeed Grade -4Maximum combinational path delay
6236nsSimulation Result
Synthesis Result
WD
41
DL XOR GATESimulation Result
Synthesis Result
WDDL XOR GATESynthesis
Report==========================================================
== Final Report
===========================================================Final
42
ResultsRTL Top Level Output File Name wddlxorgatengrTop Level Output File
Name wddlxorgateOutput Format NGCOptimization Goal
SpeedKeep Hierarchy NODesign Statistics IOs
5Cell Usage BELS 2 LUT3 2 IO Buffers
5 IBUF 3 OBUF
2============================================================Devi
ce utilization summary---------------------------Selected Device 3s250eft256-4 Number
of Slices 1 out of 2448 0 Number of 4 input LUTs 2
out of 4896 0 Number of IOs 5 Number of bonded IOBs
5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum
combinational path delay 6236nsSimulation Result
Synthesis Result
43
CHAPTER 9 SUMMARY AND CONCLUSION
9.1 Summary
To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over other such logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. For the decryption data path, the sub-keys are applied in reverse order. Logic gates are first implemented in WDDL, and the higher-level modules are then built by instantiating these WDDL gates, so that the power dissipation of the entire module is constant irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are driven to zero, and during the Evaluation phase the functionality of the design is realized. This style of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).
9.2 Conclusion
The crypto-processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design has been done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, every Evaluation phase shows the same number of transitions, which results in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against DPA. Because the power dissipation at the output is constant, an attacker cannot apply the DPA scheme to find the secret key used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
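The Summary's remark that the sub-keys are applied in reverse order for decryption follows from the Feistel structure of Blowfish. The sketch below is a toy 64-bit Feistel network in Python, not Blowfish itself: the round function toy_f, the four example sub-keys and all helper names are assumptions made for illustration, whereas the real cipher uses 18 P-array sub-keys, four S-boxes and 16 rounds.

```python
MASK32 = 0xFFFFFFFF


def toy_f(half, subkey):
    """Placeholder round function (NOT the Blowfish F function)."""
    x = (half ^ subkey) & MASK32
    return ((x * 0x9E3779B1) ^ (x >> 15)) & MASK32


def feistel(block, subkeys):
    """Run a 64-bit block through one Feistel round per sub-key."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ toy_f(right, k)
    # Swap the halves at the end so that the same routine also decrypts
    # when it is given the sub-keys in reverse order.
    return (right << 32) | left


def encrypt(block, subkeys):
    return feistel(block, subkeys)


def decrypt(block, subkeys):
    # Same data path as encryption, sub-keys supplied in reverse order.
    return feistel(block, list(reversed(subkeys)))


if __name__ == "__main__":
    keys = [0x11111111, 0x22222222, 0x33333333, 0x44444444]  # arbitrary
    plaintext = 0x0123456789ABCDEF
    ciphertext = encrypt(plaintext, keys)
    print(hex(ciphertext), hex(decrypt(ciphertext, keys)))
```

Because each round only XORs one half with a function of the other, running the rounds again with the sub-key order reversed undoes them one by one, which is why a single data path can serve both directions.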