FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
RTL Guidelines for Static Power Reduction
Ciro de Moura Monteiro
Mestrado Integrado em Engenharia Eletrotécnica e de Computadores
Synopsys Supervisor: Hélder Silva
FEUP Supervisor: José Carlos Alves
July 27, 2016
© Ciro de Moura Monteiro, 2015
Summary
Nowadays, with the growth of portable devices powered by limited-capacity batteries, good energy management is important to guarantee the longest possible operating time. Dynamic power consumption was once one of the main considerations in low-power circuit design, but today some dynamic power reduction techniques are applied automatically by the tools.
Integrated circuits can be produced with ever more transistors, and with ever smaller transistors. However, this also brings the problem of increased leakage currents. One possible approach to reducing the static power consumed by the leakage currents of these circuits is called power gating.
Power gating consists in using transistors as switches to turn the power supply of parts of an integrated circuit on and off. Header transistors or footer transistors can be used for this purpose, each with its advantages and disadvantages.
Abstract
In today's world, we are witnessing a growth in battery-operated portable devices, which require smart power choices due to their limited battery life. Dynamic power consumption has been a major consideration when designing power-aware devices, but some dynamic power savings are already introduced automatically by the design tools.
Current technology has evolved towards smaller transistors, which enables building chips with higher transistor density. With this technology new problems arise, such as higher leakage currents. These may or may not be resolved by power reduction techniques, as not all of them are leakage oriented. One possible solution to this problem is the power gating technique.
Power gating consists in using switching transistors to control the power supply of certain areas of the circuit. This power reduction technique allows the use of header or footer transistors, each with its benefits and disadvantages.
Acknowledgements
I would like to express my gratitude...
In particular to Hélder Silva and Athul Stripad, for following my work every week and helping with some important decisions.
To Professor José Carlos Alves, for supervising this work and helping me make decisions.
To Nelson Eira, for his patience in explaining how the scripts used by the implementation environment work.
To Synopsys, for the opportunity to carry out this dissertation project in a company environment.
To my family, for helping me grow.
To Susana Carvalho, for helping me choose my specialisation, which I came to enjoy.
To Inês Teixeira and Gabriel Ribeiro, for helping me with my English.
To all my other friends, for their patience in putting up with me.
And finally to FEUP and its professors, for helping with my education.
Ciro de Moura Monteiro
“Laugh and the world laughs with you. Snore and you sleep alone”
Anthony Burgess
Contents
1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Objectives
  1.4 Power Gating
  1.5 EDA Team Organisation
  1.6 Structure
2 Related Work
  2.1 Power Consumption
    2.1.1 Power and Energy
    2.1.2 Dynamic Power
    2.1.3 Static Power
  2.2 Power Gating
    2.2.1 Consideration
    2.2.2 Header vs Footer Switching
    2.2.3 Fine Grain vs Coarse Grain
    2.2.4 Power Intent Languages
    2.2.5 UPF
3 Design Flow
  3.1 Flow Without Power
    3.1.1 Synthesis
  3.2 UPF flow
    3.2.1 Voltage Aware Synthesis
  3.3 Power Analysis
4 Implementation
  4.1 Steps taken
  4.2 Obstacles
  4.3 Power Management Unit
  4.4 Final Result
    4.4.1 Power Architecture
  4.5 Verification
5 Results
  5.1 Power Reduction Outcome
  5.2 Guidelines
  5.3 Alternatives
6 Conclusion
  6.1 Future Work
References
List of Figures
2.1 High power consumption
2.2 Low power consumption
2.3 Power consumption on a CMOS inverter. Source: [1]
2.4 Inverter
2.5 Clock Gating Cell
2.6 Summary of leakage currents of deep-submicrometer transistors. Source: [2]
2.7 Sub-threshold leakage path in a CMOS inverter
2.8 Header switching
2.9 Footer switching
2.10 Fine grain and cell
2.11 Fine grain and cell with isolation clamp transistor
2.12 Fine grain header switching and cell with isolation clamp transistor
2.13 Companies involved in IEEE P1801 working group. Source: [3]
2.14 Example of power domains
2.15 Isolation cell between power domains
2.16 State Retention Isolation. Source: [4]
2.17 Power domain with heterogeneous fan-out
2.18 Retention register. Source: [4]
2.19 Enable Level Shifter Example
2.20 Header switch cell
2.21 Isolation on input of heterogeneous fan-out
2.22 Redundant isolation
2.23 Output isolation
3.1 UPF tool flow. Source: [5]
3.2 Design flow for multi-voltage, power gated designs. Source: [4]
4.1 Basic representation of the module used
4.2 Function State Machine
4.3 Block representation of the power domains
List of Tables
2.1 Main parameter for the seven-metal-layer 90-nm CMOS technology node. Source: [6]
2.2 Example of PST
4.1 EDMA power state table
4.2 Edma power state table
5.1 Power during activity
5.2 Power consumption related to current implementation, during activity
5.3 Power consumption during full simulation
5.4 Power consumption for a full simulation, write traffic
5.5 Power consumption for a full simulation, read traffic
5.6 Relative power consumption for a full write simulation
Symbols and Abbreviations
CAD           Computer-Aided Design
CPF           Common Power Format
CTS           Clock Tree Synthesis
DC            Design Compiler
DFT Compiler  Design-for-test Compiler
DMA           Direct Memory Access
DRC           Design Rule Check
DVE           Debugging and Visualisation Environment
DVFS          Dynamic Voltage and Frequency Scaling
DVS           Dynamic Voltage Scaling
EDA           Electronic Design Automation
eDMA          Embedded DMA
FET           Field Effect Transistor
GIDL          Gate Induced Drain Leakage
HDL           Hardware Description Language
IC            Integrated Circuit
IEEE          Institute of Electrical and Electronics Engineers
IoE           Internet of Everything
IoT           Internet of Things
IP            Intellectual Property
IR drop       Voltage drop due to energy losses in a resistive path
MOS           Metal–Oxide–Semiconductor
MTCMOS        Multi-Threshold CMOS
MVSIM         Multi-Voltage Simulation
NLP           Native Low Power
NMOS          N channel MOSFET
PG            Power and Ground
PMU           Power Management Unit
PST           Power State Table
PVT           Process, Voltage, Temperature
QoR           Quality of Results
RCE           Regression Control Environment
RTL           Register Transfer Level
SAIF          Switching Activity Interchange Format
SDC           Synopsys Design Constraints
SI            International System of Units (Système international d’unités)
SI2           Silicon Integration Initiative
SoC           System on a Chip
SPEF          Standard Parasitic Exchange Format
TCL           Tool Command Language
UPF           Unified Power Format
UVM           Universal Verification Methodology
VC LP         Verification Compiler Low Power
VCD           IEEE Standard 1364-1995, Value Change Dump
VCS           Verilog Compiled code Simulator
VHDL          VHSIC Hardware Description Language
VHSIC         Very High Speed Integrated Circuit
VIP           Verification IP/Verification Link Partner
VLSI          Very-Large-Scale Integration
VPD           VCD Plus
VTB           Verilog Test Bench
WWW           World Wide Web

Concepts
Clock Gating      Technique to reduce clock activity
Power Density     Power consumed per unit area
Power Gating      Cutting power using a switch
Power Island      Island that is kept ON inside a power domain that is OFF
Shadow Registers  Registers used to store data in sleep mode
VDD               Positive voltage rail
VSS               Negative voltage rail/Reference voltage
Chapter 1
Introduction
Energy efficiency is a very important aspect of electronic circuit design nowadays. Just a few
decades ago, designers used to focus only on having a working chip, and power consumption was
not a primary design concern. Requirements for portability, mobility or battery operation
were very infrequent, and the early CMOS technologies for digital electronics were sufficiently
constrained in terms of power consumption. Then, as technology evolved, transistor size
decreased, making it possible to fit more transistors on one die, and so power consumption gained
importance. Nonetheless, only dynamic power seemed to matter, because for CMOS
technologies above 130nm leakage is negligible [4]. Nowadays, power is more important than ever,
especially for battery-operated and mobile devices.
Power-hungry chips tend to have high power density, which generates a lot of heat over a small
dissipation area. This raises heat dissipation issues, requires expensive cooling systems and may
reduce chip lifetime.
Due to environmental concerns, there is an interest in reducing the power consumption of
devices, in an effort to reduce pollution and the power wasted while idle. Systems may have
most of their modules idling, consuming power without executing any operation. Chips should
idle efficiently, to reduce the energy consumed during their operation.
In recent years, portability and mobility have gained growing importance, and in that
field power consumption is paramount. For example, it is a significant inconvenience
for users if portable equipment has to be charged constantly. Not only do batteries have little
autonomy without power management, they also have a short lifetime and have to be replaced.
Efficient power management makes better use of the batteries' energy, making them last longer
and avoiding the hassle of replacing or charging them constantly.
To reduce power consumption in digital integrated VLSI systems, various considerations must
be kept in mind. Dynamic power used to be the major concern in power-aware designs, but as
technology nodes shrink and transistor density increases, static power consumption has gained
increasing importance due to transistor leakage. Besides, most circuits today already implement
techniques to reduce dynamic power consumption, such as clock gating. Clock gating effectively
reduces the activity in the system, consequently reducing dynamic power consumption (as
can be deduced from equation 2.4).
Today's techniques to reduce power consumption include using higher-threshold transistors in
non-critical paths of the design. That includes using high-K dielectric gate oxide, well biasing
and bigger transistors. These techniques are applied at a very low level, do not save much
power and become expensive because they require extra masks for the different transistors.
Sometimes it may be essential to use low-Vt or ultra-low-Vt transistors to achieve the very high
speeds that industry demands nowadays. These techniques are also very time consuming, since
they may require the designer's attention to single gates.
Before the 90nm technology node became available, designers used to simply migrate chips
to smaller geometries to reduce power. This took advantage of lower supply voltages as well
as lower capacitance. Migrating a 180nm chip to 130nm would cut power almost in
half. Although voltage decreases, current increases. At 90nm, the increase in current is more
significant than the decrease in voltage, and this results in a higher power consumption than
expected [7].
The importance of static power has increased significantly as technology size decreases. Once
negligible, at the deep sub-micron level static consumption can reach almost 50% of the total
power consumption [8]. A very useful and effective technique to reduce static power consumption
is power gating. Power gating consists in shutting down inactive parts of a circuit. Implementing
this technique is very time consuming, because it requires identifying the power needs of each
part of the design and testing the result.
Power gating was created to reduce static power at the block level. By cutting off the supply to
the module when power gating is applied, power consumption theoretically drops to zero. Major
dynamic and static power reductions can be achieved by addressing power at the RTL (Register
Transfer Level) and system level [9] [10]. This provides a higher abstraction level to the designer,
and allows the RTL designer to produce power-aware circuits, which would otherwise only be
implemented at the back-end.
1.1 Context
This dissertation was developed in the scope of the Master in Electrical and Computers
Engineering of the Faculty of Engineering of the University of Porto. The work was proposed
by Synopsys® following a previous contact during a summer job between July and September 2015.
1.2 Motivation
Every day, a great number of portable devices are in development. Smartphones and smart
watches are two good examples of technological evolution. They are evidence of the growth of the
IoT (Internet of Things), also named IoE (Internet of Everything) by some, given its extent. These
devices are small and most of them are battery operated, which requires smart power choices.
Transistor size has been decreasing over the past few years, allowing smaller devices with
higher transistor density. This also lowers the operating supply voltage, which translates into a
quadratic reduction in power consumption ($P(t) = V(t)^2/R$). On the other hand, leakage
power increases with smaller transistors, due to the lower threshold voltage and the increasing
number of transistors per die. Besides, smaller transistors also allow faster transitions, which
increases dynamic power consumption.
When designing digital circuits, reducing power consumption has been a major concern in the
last few years. Designers often implement several methodologies to reduce dynamic power
consumption, like clock gating, but as technology improves and gets smaller, static energy
consumption has been gaining importance and becoming a bigger slice of the total power consumption.
Even when a circuit is not active, it will consume energy. This is because transistors are not
perfect and permit small currents to flow, despite the logical off state. With the increase in the
number of transistors in a single chip, the sum of these small currents becomes significant. This
problem is known as leakage, because unwanted current is leaking through the transistors.
For this reason, leakage current is an important matter as technology advances: the
decreasing size of transistors leads to a smaller threshold voltage, which increases
leakage. At the same time, switching currents decrease due to smaller capacitance, and therefore
static power consumption increases relative to dynamic power consumption. Switching off the
inactive parts of a circuit can be a solution to the leakage problem. This method is called power
gating.
Power gating is already implemented in several designs and has proved effective, but it is
not yet automatic and there are no guidelines defined for high-level static power savings.
1.3 Objectives
The goal of this dissertation is to evaluate strategies for implementing power gating
mechanisms, applied to a high-speed intellectual property (IP) module, and to characterise the
power savings and the impact on the circuit design, resulting in a set of design guidelines for
introducing power gating in future designs. Different approaches are studied and their impact on
the design is evaluated.
Since implementing power gating on an IP block designed without taking power into account
can be very troublesome and slow, the need emerged for guidelines that could be followed
to reduce the implementation time. These guidelines could also be useful in the future for
automating the process.
Another objective is to evaluate the best way to use the tools to implement power gating in the
RTL design and to check the functionality before and after power gating implementation.
1.4 Power Gating
Power gating consists in using transistors as switches to control the supply of power to
selected parts of a circuit, according to their activity. The purpose of power gating is to reduce
static power consumption. As almost every improvement in digital circuit design is a trade-off,
power gating decreases the power consumption in exchange for a small area increase as well as
increased design complexity.
A central concept in power gating is the power domain. Power domains are composed of circuit
blocks that share similar activity requirements. Each power domain can be controlled by a
different signal, allowing some power domains to be active while others are sleeping, providing
functionality while saving power. When a power domain is in sleep mode, its registers lose their
values, which becomes a problem if state must be kept during sleep mode. To solve this issue,
retention registers can be used. Retention registers save the value of important registers and
are discussed in later chapters.
Connecting circuits that are not active to active circuits may cause incorrect values to be read.
To avoid this, isolation cells are placed between inactive and active blocks.
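The behaviour described above (state loss in a sleeping domain, retention of selected registers, and isolation of outputs) can be illustrated with a small behavioural model. The sketch below is only a toy written in Python, not a hardware description or a power intent file; the class name `PowerDomain`, its fields and the choice of clamping isolated outputs to 0 are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PowerDomain:
    """Toy behavioural model of a switchable power domain (illustrative only)."""
    registers: dict = field(default_factory=dict)  # volatile state of the domain
    retention_list: tuple = ()                     # names of registers to retain (assumption)
    isolation_clamp: int = 0                       # value seen by neighbours while OFF (assumption)
    powered: bool = True
    retained: dict = field(default_factory=dict)   # shadow copies kept during sleep

    def power_off(self):
        # Save only the registers marked for retention, then lose all volatile state.
        self.retained = {n: self.registers[n]
                         for n in self.retention_list if n in self.registers}
        self.registers = {n: None for n in self.registers}  # None models an unknown value
        self.powered = False

    def power_on(self):
        self.powered = True
        # Retained registers are restored; the others stay unknown until rewritten.
        self.registers.update(self.retained)

    def output(self, name):
        # The isolation cell presents a known value while the domain is OFF.
        return self.registers[name] if self.powered else self.isolation_clamp


domain = PowerDomain(registers={"cfg": 5, "tmp": 9}, retention_list=("cfg",))
domain.power_off()
print(domain.output("cfg"))   # clamped value while the domain sleeps
domain.power_on()
print(domain.registers)       # "cfg" restored, "tmp" unknown
```

In this model, only the register listed for retention survives a sleep cycle, which mirrors the trade-off in the text: retention costs extra storage, so it is reserved for the state that actually needs to be kept.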
Static power consumption increases as transistors get smaller, and has been gaining importance
in power-aware designs. A good solution to this issue is power gating, which effectively cuts
the leakage currents. The challenge when adopting a power gating strategy is to decide
which modules should be power controlled and when to power them on and off, according to the
required functional specifications and the design speed/area trade-offs. Even though there are
other techniques, the focus of this work is mainly on power gating.
There are description languages that allow power gating to be integrated in hardware design with
the help of EDA (Electronic Design Automation) and verification tools. These languages are
known as power intent description languages, because they are used to describe the mechanisms
that control power to the modules.
1.5 EDA Team Organisation
The design of an integrated electronic circuit is a very complex task that requires close
cooperation among different teams with competences in diverse areas. It is common for EDA
teams to be organised in small sub-teams, and the burden of a project is often divided between
them. Usually there is a front-end team, responsible for RTL design. There is also a team
for RTL verification, responsible for writing test-benches and verifying that the RTL
implementation matches the logical intent. The back-end team is responsible for place and route
as well as layout.
Power gating is usually implemented at the front-end, but it has a large impact on the back-end
process. Although there are other techniques for static power reduction, implemented at the
back-end stage of the process, they are not as effective as cutting the supply voltage to the
design. Power gating increases the design effort for both the front-end and back-end teams, and
some decisions have to be taken together.
Power intent specification is made at the RTL level, allowing verification of the logical
operation with RTL simulations. Simulations at the netlist or a lower level would greatly
increase implementation and verification times. These simulations should also be performed, but
only after obtaining good results from the RTL simulation.
This work involves RTL design and verification of the design, and also synthesis, which is
necessary for power analysis and is usually not done by the front-end team.
1.6 Structure
This document is divided into six chapters: Introduction (1), Related Work (2), Design Flow (3),
Implementation (4), Results (5) and Conclusion (6).
The Introduction chapter (1), as its name states, is an introduction to the work developed
throughout this master’s dissertation.
In the Related Work (2) chapter, a bibliographic review and study of the current state of the art
is made, introducing the major concepts used in power gating.
The Design Flow (3) chapter is dedicated to explaining the differences between a traditional
digital CMOS circuit design and a power oriented one.
Implementation (4) is a chapter dedicated to explaining the implementation decisions taken
during the development as well as the final result.
The Results chapter (5) contains the results obtained from the implementation.
Conclusion (6) is the final chapter, where the final conclusions about this work are drawn and
possible future improvements are described.
Chapter 2
Related Work
In today's world, with the growth of the VLSI industry and the portability of devices, the
number of gates inside a single IC has been increasing. This allows more logic components on
the same die, or even smaller dies, but increases power consumption and, consequently,
can raise thermal and energy problems.
One concern that has been gaining importance is static power consumption. Static power is
consumed even when the circuit is in an idle state, that is, when it has no activity.
Power dissipation causes heat, which can be harmful to chips, reducing their lifespan and
affecting performance, but reducing the heat may require expensive cooling systems that raise
the product's market cost.
As most devices these days are battery operated, improving battery life is a major concern.
Batteries hold limited charge and, taking into account the power consumption of today's
electronic devices, they usually do not last very long. As such, new batteries have to be bought
constantly. Non-rechargeable batteries have a big environmental impact, and rechargeable ones last
a limited number of charging cycles. To reduce the impact of batteries, better energy efficiency is
needed.
This chapter presents a study of power-related problems in CMOS digital circuits and some
techniques to avoid them, focusing mainly on power gating, a technique for static power reduction.
2.1 Power Consumption
With the growth of mobile devices and applications, as well as all the environmental concerns,
power consumption is becoming an important criterion in electronic system design. Synopsys'
EDA tools provide various solutions for power-aware design, some of them automated. On the
other hand, new techniques emerge with better trade-offs and better power savings, and these
are available in the market. Implementing them therefore becomes a market advantage for both
Synopsys and its customers.
Energy loss is converted into heat, which can be harmful to electronic components, so
these devices must be kept at a low temperature. One way of guaranteeing this is to
use cooling systems. Reducing power consumption reduces heat dissipation, creating a better
system in many aspects and possibly making it possible to drop the cooling system, which
raises the price of the product.
Current technology already handles clock gating automatically, as well as other power saving
techniques, but leakage power is gaining importance and an efficient way to reduce it
is needed.
CMOS digital circuits used to have negligible static power losses, as noted by [11] in 2003:
"Historically, complementary metal-oxide semi-conductor technology has dissipated
much less power than earlier technologies such as transistor-transistor and emitter-
coupled logic. In fact, when not switching, CMOS transistors lost negligible power.
However, the power they consume has increased dramatically with increases in device
speed and chip density."
The power consumption of digital CMOS circuits is given by equation 2.1. This equation
can be divided into dynamic and static power consumption, as seen in the subsections below (2.1.2
and 2.1.3). The first term is the dynamic power consumption and the second the static power
consumption. $P$ represents the total power consumption, $A$ is the fraction of gates switching, $C$
the total capacitive load of all gates and $f$ the clock frequency. $I_{leak}$ is the leakage current
and $V$ the supply voltage.
$$P = A C V^{2} f + V I_{leak} \tag{2.1}$$
Source: [11]
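As a rough illustration of equation 2.1, the following sketch evaluates the two terms separately so that their relative weight can be compared. The function name and all numeric values (activity, capacitance, voltage, frequency, leakage current) are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any design discussed in this work.

```python
def total_power(activity, capacitance, voltage, frequency, leakage_current):
    """Return (dynamic, static, total) power in watts following equation 2.1."""
    dynamic = activity * capacitance * voltage ** 2 * frequency  # A * C * V^2 * f
    static = voltage * leakage_current                           # V * I_leak
    return dynamic, static, dynamic + static


# Hypothetical operating point: 10% activity, 1 nF switched load, 1 V, 100 MHz, 5 mA leakage.
dyn, stat, total = total_power(
    activity=0.1, capacitance=1e-9, voltage=1.0, frequency=100e6, leakage_current=5e-3
)
print(f"dynamic = {dyn:.4f} W, static = {stat:.4f} W, total = {total:.4f} W")
print(f"static share of total = {stat / total:.0%}")
```

With these assumed numbers the static term is already a sizeable fraction of the total, which is the situation that motivates leakage-oriented techniques such as power gating.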
2.1.1 Power and Energy
Energy and power are two important but different concepts, especially for portable
devices. For this type of device, battery life is a big concern, as is heat dissipation, because
there is usually not enough space to implement an efficient cooling system, if any.
The SI unit of power is the watt (W), and it represents the amount of energy transferred per
unit of time. Instantaneous power is given by:
$$P(t) = V(t) \times I(t) \tag{2.2}$$
Energy is what a system converts into work or heat. Batteries provide the energy for a circuit to
execute a given function. The SI unit of energy is the joule (J), and energy is usually measured
over a time interval. Energy can be calculated as the integral of power over a given time interval:
$$E = \int_{0}^{T} P(t)\,dt \tag{2.3}$$
From that it is possible to deduce that energy is the power used over a given time interval.
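When power is only known as a series of samples, for example from a simulation trace, the integral in equation 2.3 can be approximated numerically. The sketch below uses the trapezoidal rule; the function name and the sample values are hypothetical.

```python
def energy_joules(times, powers):
    """Approximate E = integral of P(t) dt using the trapezoidal rule.

    times  -- sample instants in seconds, in increasing order
    powers -- power samples in watts, one per instant
    """
    if len(times) != len(powers):
        raise ValueError("times and powers must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (powers[i] + powers[i - 1]) * dt  # area of one trapezoid
    return total


# Hypothetical trace: a constant 2 W drawn for 10 s.
print(energy_joules([0.0, 5.0, 10.0], [2.0, 2.0, 2.0]))  # 20.0 (joules)
```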
Over the same period of time, a circuit with a higher power demand consumes more energy than a
low-power one. Especially for battery-operated devices, power consumption is very important: a
lower power design results in longer operating times. As can be seen in figures 2.1 and 2.2, both
graphs represent the same energy, since the areas under the two curves are the same. If two
battery-operated circuits had power consumptions similar to the ones in the figures, the first one
would have a shorter operating time due to its higher power consumption.
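The comparison between the two figures can be put into numbers with a simple estimate: for a fixed amount of stored energy, the operating time is the energy divided by the average power. The battery energy and the two power levels below are hypothetical values used only to show the relationship.

```python
def runtime_seconds(battery_energy_j, average_power_w):
    """Operating time for a given stored energy and average power draw."""
    if average_power_w <= 0:
        raise ValueError("average power must be positive")
    return battery_energy_j / average_power_w


battery = 10.0  # joules available (hypothetical)
print(runtime_seconds(battery, 2.0))  # high-power circuit: 5.0 s
print(runtime_seconds(battery, 0.5))  # low-power circuit: 20.0 s
```

Both circuits consume the same 10 J, but the one drawing four times less power runs four times longer.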
Figure 2.1: High power consumption (plot of power versus time)
Figure 2.2: Low power consumption (plot of power versus time)
Static CMOS logic cells are made of NMOS and PMOS transistor networks, based on their ability
to work like digital switches. Transistors are, however, not ideal: their gates are capacitive
inputs, which makes the logic gate inputs capacitive. Transistors are also non-ideal switches,
since they have a non-zero ON resistance and a finite OFF resistance. These effects are called
parasitic impedances. Parasitic impedances in CMOS gates consume dynamic power when switching
state, through the charging and discharging of the parasitic capacitors, and static power when not
switching, due to the finite OFF resistance and non-zero ON resistance.
In figure 2.3, it is possible to observe the paths of static and dynamic power consumption in
a CMOS inverter. Logic gates have input and output capacitance due to the parasitic effects of the
transistors. $P_{static}$ is the static power consumption of the inverter, $V$ the supply voltage and
$I_{leak}$ the leakage current. $I_{sc}$ is the short-circuit current during a state transition, and $C$
the capacitive load formed by the output of the gate and the input of the next gate. $I_{switch}$ is
the switching current, and $f_{switch}$ represents the effective switching frequency, calculated from
the clock frequency ($f_{clk}$) and the activity ($A$).
Figure 2.3: Power consumption on a CMOS inverter. Source: [1]
2.1.2 Dynamic Power
Dynamic power consumption arises from the constant charging and discharging of parasitic
capacitances at the outputs of the millions of gates inside an integrated circuit. Transistors are
not ideal and have parasitic capacitance. These capacitances are charged and discharged according
to the output of the logic gates: if the logical state is one, the capacitance is charged; if the
logical state is zero, the capacitance is discharged. The transitions between zero and one are what
consume dynamic power, which was once the most important factor in power consumption.
Figure 2.4: Inverter (schematic showing the parasitic input capacitance Cin and output capacitance Cout)
Figure 2.4 shows a representation of an inverter with its parasitic input and
output capacitors. These parasitic capacitances are the cause of dynamic power consumption and
of gate delays. Another parasitic effect comes from the wire interconnections: the bigger the wire,
the bigger the capacitance.
In digital CMOS circuits, fan-out is the ability of a logic gate to drive the inputs of other logic
gates. Clock distribution generates large networks, because all synchronous logic requires a
reference clock signal. The clock signal requires complex routing and complex buffering due to its
wide reach. It is implemented using a complex tree of buffers in order to
drive all the gate inputs and keep the required timing with reduced skew across the system.
Huge clock trees normally are the ones that consume the most dynamic power. That is why the
main focus in reducing dynamic power consumption is the reduction of clock frequency as well
2.1 Power Consumption 11
as the reduction of active cycles to inactive parts of the system through clock gating. Most of the
systems nowadays implement clock gating.
Figure 2.5: Clock Gating Cell (diagram labels: Latch, Enable, Clock)
Dynamic power consumption can be calculated from expression 2.4. The power loss is caused by
charging and discharging the capacitive loads of the gates. A is the fraction of gates actively
switching, C the total capacitive load of the module, f the frequency and V the voltage [11]. As
can be observed, dynamic power losses depend on the number of active gates, capacitance, frequency
and voltage.
The number of active gates depends on the needs of the system, with the clock tree being one of
the biggest contributors. Activity can be reduced by removing the clock signal from logic that is
not currently needed; this technique is known as clock gating.
Voltage is the most important factor because it enters quadratically: halving the voltage reduces
dynamic power to a quarter. However, the frequency the system is able to achieve also depends on
voltage (2.5), so reducing voltage can be detrimental to high-speed interfaces. A technique named
Multi-Voltage can be used to keep different areas of a chip operating at different voltages,
according to their needs.
$$P_{dynamic} = A\,C\,V^{2} f \tag{2.4}$$
$$f \propto \frac{(V - V_{th})^{\alpha}}{V} \tag{2.5}$$
Source: [11]
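As a rough numerical illustration of expressions 2.4 and 2.5, the sketch below evaluates the dynamic power of a hypothetical module at nominal and at half supply voltage. The activity, capacitance, frequency, threshold voltage and alpha values are assumptions chosen only for illustration; they are not taken from [11].

```python
def dynamic_power(activity, capacitance, voltage, frequency):
    """Expression 2.4: P = A * C * V^2 * f, in watts."""
    return activity * capacitance * voltage ** 2 * frequency


def relative_max_frequency(voltage, v_threshold, alpha):
    """Expression 2.5 up to a proportionality constant: (V - Vth)^alpha / V."""
    return (voltage - v_threshold) ** alpha / voltage


# Hypothetical module: 10% activity, 1 nF total load, 1.2 V supply, 100 MHz.
p_nominal = dynamic_power(0.1, 1e-9, 1.2, 100e6)
p_half_v = dynamic_power(0.1, 1e-9, 0.6, 100e6)
power_ratio = p_half_v / p_nominal  # quadratic dependence on voltage

# Assumed Vth = 0.4 V and alpha = 1.3 (illustrative values only).
freq_ratio = (relative_max_frequency(0.6, 0.4, 1.3)
              / relative_max_frequency(1.2, 0.4, 1.3))

print(f"power at half voltage / nominal power: {power_ratio:.2f}")
print(f"achievable frequency at half voltage / nominal: {freq_ratio:.2f}")
```

Under these assumptions the achievable frequency falls faster than the voltage itself, which is the reason the text gives for keeping high-speed blocks at a higher voltage.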
2.1.3 Static Power
Transistors are not ideal digital switches and conduct small currents even when the gate voltage
is below the threshold voltage. These small currents cause power consumption, more precisely
static power. Static power is the power consumed when the gates are not switching. Once
negligible, static power has gained importance as it has increased over the last few years.
Static power consumption can sometimes be up to 50% of the total power consumption of a
chip. This comes from the ever increasing number of transistors in each die, as well as from the
use of lower threshold voltage transistors. In an effort to reduce power consumption, new
techniques have been developed, which can be used together for better efficiency.
On a CMOS gate, there are four main leakage sources: sub-threshold leakage, gate leakage,
gate induced drain leakage (GIDL) and reverse bias junction leakage. Sub-threshold leakage
(ISUB) is the current that flows from drain to source when the transistor is operating in the weak
inversion region. Gate induced drain leakage (IGIDL) is the current induced by a high field effect
in the drain caused by a high VDG. Reverse bias junction leakage (IREV) is caused by minority
carrier drift and generation of electron/hole pairs in the depletion regions. Gate leakage (IGATE) is
the current that flows through the gate oxide to the substrate layer due to gate oxide tunnelling and
hot carrier injection [4]. Gate leakage can be improved by using materials with higher dielectric
constant for the gate oxide.
Figure 2.6: Summary of leakage currents of deep-submicrometer transistors. Source: [2]
Figure 2.6 shows the sneak paths through which static current leaks in a MOS transistor. I1 is
the reverse bias pn junction leakage, I2 the subthreshold leakage, I3 the oxide tunnelling
current, I4 the gate current due to hot-carrier injection, I5 the GIDL and I6 the channel
punchthrough current.
According to [11], the two major components of static power consumption are gate leakage
and sub-threshold leakage. Sub-threshold leakage is a weak inversion current across the device.
Some devices can be designed to work in sub-threshold mode, but that is outside the scope of this
thesis.
Figure 2.7: Sub-threshold leakage path in a CMOS inverter
Static power consumption depends only on voltage and current (2.6). Reducing either voltage
or current is effective in reducing static power consumption. Since reducing voltage can introduce
frequency problems, as seen in section 2.1.2, reducing current is the preferred approach. To
reduce the current, one can implement power gating, in which transistors with a higher threshold
voltage, and therefore lower leakage current, switch the module power rails on and off.
$$P_{static} = V\,I_{leak} \tag{2.6}$$
Leakage current can be approximated as a combination of sub-threshold and gate-oxide leakage:
$$I_{leak} = I_{sub} + I_{ox} \tag{2.7}$$
Gate leakage is the current that flows through the gate oxide, due to the quantum-mechanical
tunneling of electrons, as described by [6]:
"For oxide thicknesses below 4 nm, high current leakages through the oxide can occur
due to the quantum-mechanical tunneling of electrons. The gate leakage current can
not only negatively affect the device performance but also significantly increase the
standby power consumption of a chip."
When a transistor works as a digital switch, it either operates in an active mode or cuts off
the signal. More specifically, a MOSFET enters the cut-off state when its gate-source voltage
is below the threshold voltage of the transistor. Nonetheless, since transistors are not ideal
switches, they have a finite resistance in the off state, which means that a small amount
of power is still consumed. This effect is named sub-threshold leakage because, as the name
suggests, it occurs while the gate voltage is below the threshold.
The drain-bulk and source-bulk junctions contribute their reverse currents. Two important
mechanisms contribute to the bulk current: gate induced drain leakage (GIDL) and impact
ionisation. For advanced technologies, impact ionisation is no longer important because the
supply voltage is of the same order as, or lower than, the band-gap of silicon, so the carriers
are no longer able to create electron-hole pairs [6].
From table 2.1 one can observe that p-MOS transistors leak less than n-MOS transistors of the
same size, but are not capable of carrying the same amount of current in saturation mode.
Table 2.1: Main parameters for the seven-metal-layer 90-nm CMOS technology node. Source: [6]
Parameter             Logic (low power)
                      n-MOS      p-MOS
Supply voltage (V)         1.2
Drawn gate (nm)            90
tox (nm)                   1.5
VT (mV)               420        -400
IDsat (mA/µm)         1.0        0.5
Ioff (nA/µm)          15         6
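To make the comparison concrete, the sketch below combines the Table 2.1 values with expression 2.6. It assumes, as a simplification, that the full supply voltage is applied across a single OFF transistor; the 1 µm width is a hypothetical value.

```python
# Values taken from Table 2.1 (90 nm low-power logic).
SUPPLY_V = 1.2
I_OFF_NA_PER_UM = {"n-MOS": 15.0, "p-MOS": 6.0}
I_DSAT_MA_PER_UM = {"n-MOS": 1.0, "p-MOS": 0.5}


def off_state_power_nw(device, width_um):
    """Static power P = V * Ioff of one OFF transistor, in nanowatts."""
    return SUPPLY_V * I_OFF_NA_PER_UM[device] * width_um


def on_off_ratio(device):
    """Saturation current divided by off current, both in amperes per um."""
    return (I_DSAT_MA_PER_UM[device] * 1e-3) / (I_OFF_NA_PER_UM[device] * 1e-9)


WIDTH_UM = 1.0  # hypothetical transistor width
for device in ("n-MOS", "p-MOS"):
    print(device,
          f"{off_state_power_nw(device, WIDTH_UM):.1f} nW leakage,",
          f"ION/IOFF = {on_off_ratio(device):.0f}")
```

With these table values, the p-MOS device leaks less and has the higher on/off current ratio, at the cost of half the saturation current per micrometre.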
According to [12], the sub-threshold conduction current, for short channel MOSFETs can be
calculated with the following equation:
$$I_D = I_S\,e^{\frac{V_{GS}}{n V_T}} \tag{2.8}$$
Source: [12]
In 2.8, IS is a constant, VT is the thermal voltage at room temperature and n is a constant whose
value depends on the material and structure of the device. Although the current of a single
transistor is small, the resulting power dissipation becomes a problem on chips with billions of
transistors. Other authors consider a more complex, parameter-dependent equation (2.9), where W
and L are the gate width and length respectively and the other parameters are technology
parameters.
$$I_{SUB} = \frac{W}{L}\,\mu\,V_{th}^{2}\,C_{sth}\,e^{\frac{V_{GS}-V_T+\eta V_{DS}}{n v_{th}}}\left(1-e^{-\frac{V_{DS}}{v_{th}}}\right) \tag{2.9}$$
Source: [13] [4]
As seen in equation 2.9, sub-threshold leakage depends exponentially on VGS and
VT. As technology scales down VDD and VT to lower dynamic power consumption, static power
consumption increases.
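The exponential dependence can be illustrated with the simpler equation 2.8. The sketch below is only a numerical illustration: the thermal voltage of about 26 mV and the value n = 1.5 are assumptions, and IS is normalised to 1.

```python
import math

THERMAL_VOLTAGE = 0.026  # volts, approximate value at room temperature (assumed)
N_FACTOR = 1.5           # assumed value for the constant n


def subthreshold_current(v_gs, i_s=1.0):
    """Equation 2.8: I_D = I_S * exp(V_GS / (n * V_T)), with I_S normalised."""
    return i_s * math.exp(v_gs / (N_FACTOR * THERMAL_VOLTAGE))


# Gate voltage change that alters the current by a factor of ten.
decade_step = N_FACTOR * THERMAL_VOLTAGE * math.log(10)
current_ratio = subthreshold_current(0.0) / subthreshold_current(-decade_step)

print(f"gate voltage step per decade of current: {decade_step * 1e3:.0f} mV")
print(f"current ratio across that step: {current_ratio:.1f}")
```

Under these assumptions a change of a few tens of millivolts in the exponent changes the leakage by an order of magnitude, which is why small reductions in threshold voltage have such a large effect on static power.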
2.2 Power Gating
This section presents power gating: its definition, important concepts, how it appeared, the
languages used to implement it, who made it happen and how the two current standards were
created.
Power gating consists of placing a switch between the supply rails and the supply ports of the
cells. When the module is not in use, the switch is turned off, cutting power to the module and
avoiding its static power consumption. With ideal digital switches, power gating would completely
cut off the leakage current consumed by the module. Since the switches are usually implemented in
CMOS technology, it instead reduces the leakage current of the whole module to the leakage current
of the switching transistors.
Power gating is implemented with transistors connected between supply and the module,
known as header switching, as seen in figure 2.8, between the module and ground, known as
footer switching, seen in figure 2.9 or both. Each approach has its advantages and disadvantages.
Implementing power gating is a trade-off: it increases area, as well as dynamic power due
to switching between the powered on and powered off states. To implement power gating, one must
be aware of this trade-off and make sure the implementation is beneficial. If a module is
constantly switching between the on and off states, the increase in dynamic power becomes bigger
than the decrease in static power, making the technique detrimental rather than beneficial.
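This trade-off can be expressed as a break-even off time. The sketch below is a simplified energy balance, not a model from the cited references; all power and energy figures are hypothetical.

```python
def gating_energy_balance(leakage_power_w, off_time_s, switch_energy_j,
                          residual_leakage_w=0.0):
    """Net energy saved (J) by gating a module for one off period.

    switch_energy_j is the dynamic energy spent on one power-down plus
    power-up cycle; residual_leakage_w is what still leaks through the
    switches while the module is off.
    """
    saved = (leakage_power_w - residual_leakage_w) * off_time_s
    return saved - switch_energy_j


def break_even_time(leakage_power_w, switch_energy_j, residual_leakage_w=0.0):
    """Minimum off time (s) for which gating starts to pay off."""
    return switch_energy_j / (leakage_power_w - residual_leakage_w)


# Hypothetical module: 2 mW leakage, 0.1 mW residual, 50 nJ per off/on cycle.
t_min = break_even_time(2e-3, 50e-9, 1e-4)
short_nap = gating_energy_balance(2e-3, 10e-6, 50e-9, 1e-4)  # off for 10 us
long_nap = gating_energy_balance(2e-3, 1e-3, 50e-9, 1e-4)    # off for 1 ms

print(f"break-even off time: {t_min * 1e6:.1f} us")
print(f"net saving, 10 us off: {short_nap * 1e9:.1f} nJ")
print(f"net saving, 1 ms off: {long_nap * 1e9:.1f} nJ")
```

A negative result means that the module was powered off for too short a time and that gating cost more energy than it saved.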
Shutting down inactive parts of a system may result in a loss of state. To avoid this problem,
lower-leakage registers are used to retain state. These registers can introduce significant area
overhead because they are implemented using bigger transistors.
When applying power gating to a design, special care must be taken to avoid changes to critical
paths as well as to avoid creating new ones. Extra logic such as isolation cells and level
shifters causes delays in the data-path. Paths that barely meet their timing constraints may
violate them once the extra logic is added. Powered off logic also takes time to power back on,
which may cause performance issues.
Power gating can be implemented at the RTL level of abstraction using power intent languages
such as Common Power Format (CPF) or Unified Power Format (UPF). This is explained further in
another section.
Power switching can be driven either by hardware or software. Hardware implementation re-
quires more area for the control circuitry. Software implementation requires software development
as well as an interface.
2.2.1 Consideration
Some considerations must be kept in mind when implementing power gating. These include the
dynamic power consumed by the extra circuitry and the static power consumed by always on logic
and by the power switching transistors. The area cost should also be kept in mind, because higher
Vth transistors occupy a bigger area. The retention strategy is also important because retention
cells can cause a big area overhead.
A module that is switched off cannot be directly connected to an always on module because of
floating voltages; isolation cells are required.
A single transistor may not be able to supply a full module, as its width may not be enough to
carry all the current the module needs. Waking up the circuit too fast may cause a large inrush
current, which could damage some tracks. The voltage drop across the switching fabric must be
carefully analysed to ensure proper operation of the module.
When a block is power gated, its registers lose their value. When there is a need to retain
state, always on retention registers are the solution. These registers are usually implemented
with lower-leakage, lower-voltage retention cells. However, this comes with a time penalty when
restoring the values to the main registers at wake up, raises dynamic power consumption and
increases area.
2.2.2 Header vs Footer Switching
The switching transistors used for power gating can be placed between the power supply and
the power domain supply pins, or between ground and the power domain ground pins. These are
known as header and footer switching respectively. Each of these implementations has its
advantages and disadvantages.
A single transistor is not able to carry enough current to power a large power domain. For
that reason, several transistors are used in parallel, which are usually staged in time to avoid large
inrush currents [7].
The sleep transistor efficiency is the ratio between the current in the ON state and the current
in the OFF state (ION/IOFF). The total leakage of the switching fabric is highly dependent on
this efficiency, because enough transistors are needed to deliver the required ON state current [4].
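The link between switch efficiency and fabric leakage can be sketched as follows. This is a first-order illustration only: it ignores voltage drop targets and staged wake-up, and the domain current and per-switch currents are hypothetical.

```python
import math


def switches_needed(domain_current_ma, on_current_per_switch_ma):
    """Parallel switch transistors needed to carry the ON state current."""
    return math.ceil(domain_current_ma / on_current_per_switch_ma)


def fabric_leakage_na(domain_current_ma, on_current_per_switch_ma,
                      off_current_per_switch_na):
    """Total OFF state leakage of the switching fabric, in nanoamperes."""
    count = switches_needed(domain_current_ma, on_current_per_switch_ma)
    return count * off_current_per_switch_na


# Hypothetical domain drawing 52.5 mA, gated by two candidate switch cells
# that deliver the same ON current (1 mA) but differ in OFF current.
count = switches_needed(52.5, 1.0)
leaky = fabric_leakage_na(52.5, 1.0, 20.0)     # ION/IOFF = 5e4
efficient = fabric_leakage_na(52.5, 1.0, 5.0)  # ION/IOFF = 2e5

print(f"{count} switches, fabric leakage {leaky:.0f} nA vs {efficient:.0f} nA")
```

Because the number of switches is fixed by the ON current, the fabric leakage in this sketch scales inversely with the ION/IOFF ratio of the chosen switch cell.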
Header switching is typically implemented using PMOS transistors to switch VDD. PMOS
transistors leak less than their NMOS counterparts of the same size, but they provide
lower drive current when active. Header switches turn off the supply voltage, allowing isolation
cells to clamp to "0" simply, using a single transistor. However, this type of isolation should
only be used to close timing constraints, because if the clamps fail they cause hard to detect
stuck-at faults. Since at system level signals are usually referenced to ground ("0"), switching
VDD becomes more convenient.
Figure 2.8: Header switching
Footer switching is typically implemented with NMOS transistors, which can drive a larger
current than a PMOS transistor of the same size and therefore have a smaller area cost in the
design. NMOS transistors typically have higher switching performance than PMOS [4], allowing
greater energy savings and lower area impact in the same design. However, footer transistors
switch VSS, which makes the system more sensitive to reference noise, and this may become a
problem.
Figure 2.9: Footer switching
Some authors of academic papers use both header and footer switching. However, the two series
switches cause a more significant voltage drop, which in turn increases the gate delay [4]. This
also creates a bigger area overhead, since two series switches now perform work for which one
would be enough.
2.2.3 Fine Grain vs Coarse Grain
The power switch implementation can be either fine grain or coarse grain. Fine grain power
switches are part of the standard cell. It is required that the library contains standard cells with the
switches attached to them. Coarse grain power switching on the other hand can be implemented
with the addition of some special cells for power gating.
The decision of which implementation to use should be discussed with the back-end team,
since the biggest impact will be after synthesis. Usually, the chosen implementation will be
coarse grain power switching, since it creates less area overhead, making it the better option
even with the increased design effort.
2.2.3.1 Fine Grain Switching
Since the switch has to be able to provide the worst case current necessary for the cell to
operate without performance loss, the area overhead can be considerable [4]. Fine grain cells may
also include a pull-up or pull-down transistor for isolation. Since the power switch is already
inside the standard cell, it is possible to use the traditional design flow.
Cells used for fine grain switching, with the embedded switching transistor, are called
Multi-Threshold CMOS (MTCMOS) cells. MTCMOS cells contain the usual supply connections, inputs
and outputs, and additionally have an input for the sleep signal. MTCMOS cells are usually
implemented using footer switching, due to its higher switching performance. Such cells create
less area overhead than cells with header switching. Even with footer switching, the area
overhead can get close to four times the size of the original cell [4].
Figure 2.10: Fine grain and cell
Figure 2.11: Fine grain and cell with isolation clamp transistor
2.2.3.2 Fine Grain Advantages
According to [4] fine grain switching has some advantages:
• Not sensitive to ground noise injection because of short virtual power nets;
• Small wake-up latency and in-rush current due to the small capacitance of the virtual power nets;
• Built-in clamp transistors keep outputs in known states and eliminate wake-up crowbar currents;
• The timing impact of the voltage drop across the switch and the clamp behaviour are easy to
characterise since they are inside the cell;
• Can be easily analysed and synthesised by conventional ASIC tools and flows, since MTCMOS
cells are basically normal standard cells;
2.2.3.3 Fine Grain Disadvantages
The authors also name a couple of disadvantages from fine grain switching:
• Considerable area overhead, with increases up to three times the size of the original cell;
• Requires special library with MTCMOS cells;
• Significant buffering and routing resources for sleep control distribution;
2.2.3.4 Coarse Grain Switching
In coarse grain power switching, a collection of switches is used to gate a collection of blocks
of cells. Sizing the switch network is harder than in fine grain switching since the activity
cannot be estimated. Coarse grain power switching, however, introduces significantly smaller area
overhead [4].
Because the area penalty of fine grain power switching is not worth the saving in design effort,
coarse grain switching became the preferred method in industry [4].
Figure 2.12: Fine grain header switching and cell with isolation clamp transistor
2.2.3.5 Coarse Grain Advantages
Coarse grain power switching is more widely accepted in the EDA industry for power gating
implementation, mainly because of area constraints. The main advantages of coarse grain
switching, as explained in [4], are:
• Since sleep transistors can share charge, they are less sensitive to PVT (process, voltage,
temperature) variations and introduce less voltage drop variation;
• Significantly smaller areas than fine grain switching;
• Sleep transistors number can be optimised for voltage drop and speed targets;
• Existing standard cell libraries can be used with a few extra special cells;
2.2.3.6 Coarse Grain Disadvantages
Coarse grain switching also brings some disadvantages, as stated in [4]:
• Requires complex power network;
• Power network is hard to synthesise and requires static and dynamic voltage drop analysis.
• Requires wake-up in-rush current control;
• Bigger wake-up latency;
• Power analysis is more complex;
• Has a more complex flow, due to increased complexity;
2.2.4 Power Intent Languages
Power information is not supported by normal Hardware Description Languages (HDL), so it has to
be described somewhere else. This is where power intent files come in. Written
in a different language, they specify how the module should behave in terms of power. These
files are written independently of the HDL and later formally verified by the tools. Power intent
languages are used to describe power specifications at the RTL level; this high level of
abstraction allows better power savings.
Two standards are currently defined for power intent, developed by different companies: they
are Unified Power Format (UPF) and Common Power Format (CPF).
CPF is a Silicon Integration Initiative (Si2) standard for low power and has some interoperability
with the IEEE 1801 low power standard [14]. UPF, as it is commonly known, is the IEEE 1801
Standard for Design and Verification of Low-Power Integrated Circuits. UPF is based on Tool
Command Language (TCL) [5]. Some other languages, that are not standards, may also be used
by some companies. These languages however, do not have a high usage, since developers prefer
to use standards in an effort to unify the development for faster and easier integrations.
CPF 2.0 is a widely adopted low-power intent format, approved as an Si2 standard by the Low
Power Coalition. It allows some interoperability with IEEE1801-2009 (UPF). CPF supports hier-
archical low-power flow, output and bidirectional virtual ports, isolation strategies, level-shifting,
retention strategies and more.
This work uses UPF because Synopsys tools offer compatibility with it and some power intent
is already specified in UPF. Some further UPF explanation can be found in section 2.2.5.
2.2.5 UPF
UPF is the IEEE (Institute of Electrical and Electronics Engineers) standard for design and
verification of low power in integrated circuits, under the standard number 1801. It was
originally created in an effort to provide an open, portable power specification standard and was
approved in 2007 as an Accellera standard. In the same year, Accellera donated it to the IEEE. The
first version of IEEE Std 1801, which is the second version of UPF, was released in 2009 [5].
Since IEEE Std 1801 is an open standard, it gives EDA tool providers the ability to imple-
ment its latest features. The standard is already supported by a large number of EDA companies.
Synopsys tools already support a large subset of the commands in UPF, as well as some UPF-like
power intent commands that are not part of the standard [15].
UPF focuses on controlling the voltage and current applied to the transistors. The technology
used for the switches is normally assumed to be CMOS, but other technologies can also be used.
Due to its abstraction level, UPF can be applied with any of the three HDLs: VHDL, Verilog or
SystemVerilog [5].
UPF supports a design hierarchy, which is advisable for reusing power intent across
configurations. The UPF hierarchy depends on the hierarchy of the RTL modules, which can be a
downside when the RTL was not designed with power gating in mind.
Figure 2.13: Companies involved in IEEE P1801 working group. Source [3].
The current active version of the standard is IEEE Std 1801-2015, approved on 8 December
2015 by the IEEE-SA Standards Board [16].
2.2.5.1 Concepts
When defining power intent with UPF, a few concepts must be learnt for better understanding
of its structure. This section explains the major concepts used in UPF for power gating.
Modules are put together in power domains according to their power specifications, if we have
two modules that turn off at the same time and use the same voltage, they can be put together in
the same power domain.
Ports are connection points between adjacent levels of hierarchy, connected together using
nets. UPF assumes a more abstract model of the design hierarchy, using its commands to change
the scope within the hierarchy levels. Ports have a HighConn side, visible to the parent
instance, and a LowConn side, visible to the instance itself.
Power domains are collections of instances that are powered in the same way; child instances
are included in the same power domain as their parents. A power domain does not need to be
contiguous, which means that instances in the same power domain can be placed in different
locations. In the example in figure 2.14, modules A and B have the same power requirements,
so they have been put together in power domain PD_A. Module C has different power requirements
from A and B, so it belongs to a different power domain.
Figure 2.14: Example of power domains.
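The grouping rule described in this section (same voltage and same shutdown behaviour means same power domain) can be sketched as a small model. This is only an illustration of the reasoning, not UPF; the voltages and the shutdown condition names are hypothetical.

```python
from collections import defaultdict


def group_into_domains(modules):
    """Group modules that share a voltage and a shutdown condition.

    modules maps a module name to a (voltage, shutdown_condition) tuple.
    Returns a dict keyed by that tuple, listing the modules in each group.
    """
    domains = defaultdict(list)
    for name, power_spec in modules.items():
        domains[power_spec].append(name)
    return dict(domains)


# Mirrors figure 2.14: A and B share their power requirements, C does not.
modules = {
    "A": (1.2, "sleep_ab"),   # hypothetical voltage and shutdown condition
    "B": (1.2, "sleep_ab"),
    "C": (1.2, "always_on"),
}
domains = group_into_domains(modules)
for spec, members in domains.items():
    print(spec, members)
```

Each resulting group is a candidate power domain; contiguity plays no part in the grouping, in line with the text above.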
Supply ports are connections for supply nets on hierarchical boundaries. Supply sets represent
a collection of supply nets. Supply switches control supply connections between supply ports.
2.2.5.2 Scope
The scope is the level of the design hierarchy where the UPF commands are executed. Defining a
scope is particularly useful for reusable power intent. The set_scope command changes
the current scope, and signals are then referenced from the current scope. It is possible to
write UPF in which the current scope is always the root scope, but then small changes in the
hierarchy imply changing all of the UPF, whereas in a reusable UPF only the scope would need to
be changed.
2.2.5.3 Power domains
A concept introduced with power intent is power domain. When a design is power aware,
modules belong to power domains. A power domain defines a set of rules for the modules that
belong to it. A design can have several power domains, each of which has its own independent set
of rules. A power domain can be switched off, or have a defined voltage. Power domains from the
same design can be in different states independently from each other.
This is the power domain definition present in the standard:
"power domain: A collection of instances that are treated as a group for power-
management purposes. The instances of a power domain typically, but do not always,
share a primary supply set. A power domain can also have additional supplies, in-
cluding retention and isolation supplies." [16]
The other components defined in the IEEE 1801 standard are usually associated with a power
domain. That applies to the retention strategies, isolation strategies, power switches and level
shifters.
Power domains are characterised by their power availability. A power domain that is not
switchable, and remains always powered, is said to be an always on power domain. Power domains
may also be characterised in relation to other power domains: if power domain PD_A is
on when power domain PD_B is off, PD_A is said to be relatively always on in relation to PD_B.
Three supply set handles are usually created with the power domain, primary, default_retention
and default_isolation. Extra supply sets handles can also be created with the -supply argument.
The power domain’s supply set handles default_retention and default_isolation are usually
associated with an always on supply set from the top power domain.
2.2.5.4 Isolation strategies
The outputs of powered off logic cannot be directly connected to the inputs of active logic:
since their values are unpredictable, they can cause incorrect readings and lead to unwanted
behaviour. Isolation cells exist to address this issue. They are placed on the border of the
power domain and are responsible for clamping the output to a known value. They can also be used
together with level shifters in a multi-voltage design.
Figure 2.15: Isolation cell between power domains (diagram labels: OFF domain, ON domain, Isolation cell, Power Management Unit)
There are three types of isolation cells, according to their functionality: they can clamp to
"0", to "1" or to the last value. A simple AND gate can be used to clamp the signal to "0", and
an OR gate can be used to clamp it to "1". To clamp the signal to its last value before power
down, a more complex cell is used, consisting of a latch to keep state and a multiplexer, as can
be seen in figure 2.16.
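The three clamp types can be modelled behaviourally as below. This is a logic-level sketch only, not a description of any library cell; the active-high polarity of the isolation enable is an assumption.

```python
def clamp_to_zero(signal, isolate):
    """AND-type isolation: the output is forced to 0 while isolate is 1."""
    return signal & (1 - isolate)


def clamp_to_one(signal, isolate):
    """OR-type isolation: the output is forced to 1 while isolate is 1."""
    return signal | isolate


class ClampToLastValue:
    """Latch plus multiplexer: holds the value seen before isolation."""

    def __init__(self):
        self.latched = 0

    def output(self, signal, isolate):
        if not isolate:
            self.latched = signal  # latch is transparent while not isolated
            return signal
        return self.latched        # multiplexer selects the latched value


cell = ClampToLastValue()
before = cell.output(1, 0)   # domain still on, signal passes through
during = cell.output(0, 1)   # domain off, input value is unreliable
print(clamp_to_zero(1, 1), clamp_to_one(0, 1), before, during)
```

In the last-value case the input seen during isolation is ignored, so the receiving domain keeps reading the value that was present when isolation was enabled.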
Figure 2.16: State Retention Isolation. Source: [4]
Isolation cells are placed using the UPF command set_isolation. Depending on the
version of the standard used by the tools, it may be necessary to define an isolation control
separately. This is the case for IEEE Std 1801-2009. In newer versions of the standard, the
set_isolation_control command has been superseded and all isolation information can be
defined in a single set_isolation command.
Isolation cells can either be inserted at the output of the gated power domain, or at the input
of the power domain that is connected to it. In case this second power domain is less on than the
first one, there may be no need for the insertion of the isolation cells. Both types of isolation can
coexist in the same design.
Isolation is only needed at either the input of the on power domain or the output of the power
gated one. Having isolation on both will create redundant isolation inserting more cells than the
ones necessary for the operation.
Isolating the inputs of a power domain from the outputs of a less active one is a way of ensuring
that all the signals are isolated. Leaving nets from a power gated module without isolation may
cause incorrect behaviour of the system as well as sneak paths for current to leak. When isolating
inputs, it is necessary to make sure that the synthesis tools do not insert always on cells
before the isolation cells.
Another option is to isolate the outputs from the power domain that is to be powered off.
Nevertheless, this would insert isolation cells in all the output ports of that power domain, some
of which may be connected to itself or a less on power domain.
Outputs from modules that connect to the same power domain do not need to be isolated.
Although isolation cells are typically small, they introduce delays in the data-path. Manually
selecting each port that should or should not be isolated is possible, but impractical for large
designs. UPF already accounts for this: using the -diff_supply_only switch when creating the
isolation rule prevents tools from inserting isolation cells on nets whose driver and receiver
are connected to the same supply set. However, this also prevents the insertion of isolation
cells on output ports with heterogeneous fan-out, that is, ports that connect both to another
power domain and to their own.
As with the -diff_supply_only option, it is also possible to specify a source and/or
sink filter. With this filter, the isolation rule is applied only to nets that come from one of
the specified source supply sets and enter one of the specified sink supply sets. This is very
useful when isolating designs that have several power domains.
Figure 2.17: Power domain with heterogeneous fan-out
Using -diff_supply_only will, however, fail to create isolation cells on a domain port with
heterogeneous fan-out, like the one in figure 2.17, resulting in a warning message. This is a
good example of a case where isolation could be placed on the input of the ON power domain. It
could also be placed on the output of the OFF power domain, but the input of the second OFF power
domain does not need to be isolated, as it has the same power needs as the first one.
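The three situations discussed in this section can be summarised with a toy classifier. It models only the behaviour described in the text, not the algorithm of any tool or the normative semantics of the standard; the supply set names are hypothetical.

```python
def isolation_decision(source_supply, sink_supplies):
    """Classify a domain output port by the supplies of its receivers.

    Returns "skip" when every receiver shares the driver's supply,
    "isolate" when every receiver uses a different supply, and
    "heterogeneous" for a mixed fan-out like the one in figure 2.17.
    """
    same = [sink == source_supply for sink in sink_supplies]
    if all(same):
        return "skip"
    if not any(same):
        return "isolate"
    return "heterogeneous"


# Hypothetical supply set names for a gated domain and an always on domain.
internal_only = isolation_decision("SS_OFF", ["SS_OFF"])
crossing = isolation_decision("SS_OFF", ["SS_ON"])
mixed = isolation_decision("SS_OFF", ["SS_ON", "SS_OFF"])
print(internal_only, crossing, mixed)
```

Ports classified as heterogeneous are the ones for which the text suggests placing the isolation on the input of the ON power domain instead.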
2.2.5.5 Supply sets
Supply sets are an aggregation of supply functions that together provide a complete power
source [16]. Supply sets provide a higher level of abstraction to the designer, replacing the need
of creating individual supply nets and supply ports. Supply sets have their implicit supply nets,
such as power, ground and well biasing. Supply sets provide the needed supply nets for modules
to operate. Explicitly created supply nets can be associated with an existing supply set via the
-function argument of create_supply_set command.
A power domain can have several supply set handles, which are then associated to supply sets.
Supply sets are usually associated with power domain’s supply set handles.
2.2.5.6 Retention strategies
When powering off some designs, there may be a need to keep some state. To keep state, some
registers need their value to be preserved when the module is turned off. There are several
possible approaches to achieve this, using retention registers, power islands or external memory.
Retention registers are made of two registers: the main register, for normal operation, and the
shadow register. Shadow registers leak less but produce a big area overhead.
Figure 2.18: Retention register. Source: [4]
Another way to retain state is to keep the modules that contain the registers needed to keep
state in a different, always on power domain. This technique is named power islands, because
those modules will be in a separate always on domain inside a powered off domain. This
adds some complexity to the design, since the back-end designers will need to pull always on
supply rails to a module inside a power gated domain. It does not cause a considerable area
increase, if any, but it is not advisable for large areas with low activity, since it would
waste an opportunity to reduce leakage.
Retention may be one of the power gating components with the largest impact. Retention registers
can create a huge area overhead if not planned carefully. The need for full state retention or
only partial retention should be considered for area optimisation and restore time reduction.
If the system is able to recover from a power down with only partial state retention, this
becomes an attractive solution given the time overhead and size of the registers.
Low standby voltage is also a possibility, but this solution increases testing complexity since
it will require a multi-voltage design, as well as a library with cells able to operate on the specified
voltage range, from standby voltage to normal operation voltage.
2.2.5.7 Level Shifters
In a multi-voltage design, communication between modules that operate at different voltages
may cause reading errors or even damage the circuitry. To ensure correct operation,
level shifters must be inserted between those modules. Level shifters are gates responsible for
shifting logical signals between two power domains that use different voltages. Level shifters
have a low voltage side and a high voltage side.
Level shifters can be of two types, low to high or high to low. As the names suggest, high to
low level shifters shift from the high voltage to the low voltage, and low to high ones shift from
the low voltage to the high voltage.
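The choice between the two level shifter types can be sketched as a comparison of the driving and receiving domain voltages. This is a minimal illustration only; the function name and the string labels are assumptions made for this example and are not UPF keywords or tool output.

```python
from typing import Optional


def level_shifter_type(driver_voltage: float, receiver_voltage: float) -> Optional[str]:
    """Return the kind of level shifter needed on a signal crossing two domains.

    The labels returned here are hypothetical names used only in this sketch.
    """
    if driver_voltage < receiver_voltage:
        return "low_to_high"
    if driver_voltage > receiver_voltage:
        return "high_to_low"
    return None  # same voltage on both sides: no level shifter is needed


if __name__ == "__main__":
    # Example voltages chosen for illustration.
    print(level_shifter_type(0.8, 1.0))
    print(level_shifter_type(1.0, 0.8))
    print(level_shifter_type(1.0, 1.0))
```

In a real flow the voltages would come from the power intent rather than from literals.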
2.2.5.8 Enable Level Shifters
In multi-voltage designs, ports on the boundary of power domains may need both level shifting
and isolation. To lower the area overhead, a single cell called an enable level shifter
can be used instead of a separate isolation cell and level shifter.
Figure 2.19: Enable Level Shifter Example
2.2.5.9 Power Switch
The power switch is usually implemented in CMOS technology and consists of a transistor
between the power supply and the power input pins of the standard cells. The switch can be either
NMOS (footer switch) or PMOS (header switch).
Liberty libraries may have several different switch cells. Switch cells in the library may contain
several switches and are usually defined by their type. Switch types can be coarse grain or fine
grain. DC will select a switch able to carry the current needed in the on state. To force DC to
select a specific switch cell, the designer can mark all other switches as dont_use or dont_touch
and recompile the library.
Most switch related decisions are made by the back-end designer, so tampering with the library
may not be a good option. A single switch will usually not be enough to supply an entire power
domain, leaving to the back-end team the decision of selecting coarse grain switching or fine grain
switching and grid or array topology.
Switch cells have an output acknowledge port. The acknowledge port is usually connected to
the PMU to indicate that the power is now stable or has been removed. This signal is
very important to avoid incorrect behaviour. If the PMU changed state based on a timer instead,
small manufacturing process variations, which affect wake up and shutdown times, could make it
transition into an operative state before the power domain was actually operational, or spend more
time than necessary waiting for power up.
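The difference between a timer based and an acknowledge based PMU can be illustrated with a toy model. This is a sketch under stated assumptions: the function names, the cycle counts and the dictionary keys are all hypothetical, and the power-up time is reduced to a single number of cycles per die.

```python
def wake_with_timer(actual_ramp_cycles: int, timer_cycles: int) -> dict:
    """PMU leaves the wake-up state after a fixed number of cycles."""
    return {
        "released_at": timer_cycles,
        # The domain is used before its supply is stable.
        "released_too_early": timer_cycles < actual_ramp_cycles,
        # Cycles spent waiting although the supply was already stable.
        "wasted_cycles": max(0, timer_cycles - actual_ramp_cycles),
    }


def wake_with_ack(actual_ramp_cycles: int) -> dict:
    """PMU leaves the wake-up state when the switch acknowledge is asserted."""
    return {
        "released_at": actual_ramp_cycles,
        "released_too_early": False,
        "wasted_cycles": 0,
    }


if __name__ == "__main__":
    TIMER = 10  # hypothetical fixed timer value
    for name, ramp in (("slow die", 12), ("fast die", 8)):
        print(name, "timer:", wake_with_timer(ramp, TIMER))
        print(name, "ack:  ", wake_with_ack(ramp))
```

With the assumed numbers, the slow die is released too early by the timer and the fast die wastes cycles, while the acknowledge based version tracks the actual power-up time in both cases.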
The switch topology that goes into the design is also decided at the back-end phase. Most
designs use coarse grain power switching, because the reduced implementation complexity of fine
grain switching does not compensate for its increase in area.
It is up to the back-end engineer to introduce delays between the switches in order to avoid
large inrush currents, since this kind of analysis cannot be performed at the synthesis level.
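The effect of delaying the switch enables can be illustrated with a deliberately simplified model, in which every switch draws a rectangular current pulse of fixed length once enabled. The pulse shape, the time units and the function name are assumptions made for this sketch; a real inrush analysis is done by the back-end tools.

```python
def peak_inrush(num_switches: int, pulse_len: int, stagger: int,
                pulse_current: float = 1.0) -> float:
    """Peak total current when switch k is enabled at time k * stagger.

    Each switch is assumed to draw `pulse_current` for `pulse_len` time units
    after being enabled (a toy rectangular pulse, not a device model).
    """
    starts = [k * stagger for k in range(num_switches)]
    peak = 0
    # With rectangular pulses the peak always occurs at an enable instant.
    for t in starts:
        active = sum(1 for s in starts if s <= t < s + pulse_len)
        peak = max(peak, active)
    return peak * pulse_current


if __name__ == "__main__":
    for stagger in (0, 1, 3):
        print("stagger", stagger, "peak", peak_inrush(8, pulse_len=3, stagger=stagger))
```

In this model, enabling all switches at once sums every pulse, while a stagger at least as long as the pulse keeps the peak at the current of a single switch.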
Figure 2.20: Header switch cell (ports shown: VDD, VVDD, Sleep, ACK)
Figure 2.20 represents a PMOS header switch cell. VDD is the input voltage
from the power rail. VVDD is the virtual supply that feeds the power domain to
be gated. The sleep signal controls the virtual supply rail. The acknowledge
port reports the power state back to the power management unit, with the help of a buffer.
2.2.5.10 Cell Location
UPF provides the option of defining the physical location for cell insertion. This is an
important decision, since it affects layout complexity. The decision is taken at the RTL level,
but it is important that the power architect is aware of the back-end flow so as not to complicate
the implementation. The cell location is defined by the -location argument present in the UPF
cell insertion commands.
Cells can be inserted in the power domain they belong to, in the parent domain or even in both.
When working on IP to be integrated in other designs, it is useful to place the cells in the power
domain they belong to, since putting them outside creates an area overhead in the parent design
relative to the predicted area of the IP. If the cells are inside the IP, the area estimation already
accounts for them. Cells located inside the IP also provide a more abstract model to the designer who
is going to integrate the IP: there is no need to worry about power intent, since everything
is already implemented inside the IP, which reduces verification and implementation times.
UPF related cells inside the power domain may, however, lead to a more complex back-end
implementation. Isolation cells inside a gated power domain require an extra PG pin connection to
an always on net to power the cell, since the primary power net will be shut off. This means an
extra power rail has to be pulled inside the power domain at the layout stage of the design.
Inserting cells in the parent power domain may be a good option for internal power domains.
In that case no extra supply rail needs to be pulled inside the power domain, since the parent
domain stays on at least as long as the domain the isolated signals come from.
2.2.5.11 Input vs Output Strategies
When creating isolation, level shifter or enable level shifter strategies, it is possible to choose
whether the strategy applies to the power domain inputs, outputs or both. This is quite an important
decision, since it may avoid unisolated paths or redundant strategies.
As described in the isolation section (2.2.5.4), using the -diff_supply_only true switch
when defining an isolation strategy will not insert cells if the output has heterogeneous fan-out.
Instead, if that happens to be the case, it is better to define the strategy for the input port of the
active power domain, given it is the only power domain needing isolation or level shifting for that
signal.
Figure 2.21: Isolation on input of heterogeneous fan-out (domains labelled OFF, ON and OFF; control signal iso_enable)
Figure 2.21 is a good example of a case where the strategy should be applied to the input. In
figure 2.22 the opposite holds: both receiving power domains require isolation, because they are
active while the output of the first power domain is corrupt, so it would be better to isolate only
the output of the first power domain. Isolating each input instead is an example of redundant
isolation and creates unnecessary cells.
Figure 2.23 displays a situation in which output isolation is the best option. The
output port connects to two power domains, and using input isolation would create an unnecessary
extra cell.
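The placement rule discussed for figures 2.21 to 2.23 can be summarised in a short sketch. It assumes a single output of a gated domain fanning out to several receiving domains, and it only encodes the reasoning of this section; the function name, the domain names and the returned labels are hypothetical.

```python
from typing import Dict, List


def isolation_placement(sink_on_while_source_off: Dict[str, bool]) -> List[str]:
    """Decide where to isolate a signal driven by a power gated domain.

    `sink_on_while_source_off` maps each receiving domain to True when it can
    be active while the driving domain is off, and so needs an isolated value.
    """
    needing = [name for name, on in sink_on_while_source_off.items() if on]
    if not needing:
        return []  # no receiver is ever active while the driver is off
    if len(needing) == len(sink_on_while_source_off):
        # Every receiver needs isolation: one cell on the output is enough.
        return ["source.output"]
    # Heterogeneous fan-out: isolate only the inputs that need it.
    return [f"{name}.input" for name in needing]


if __name__ == "__main__":
    # Heterogeneous fan-out: only one receiver is on while the source is off.
    print(isolation_placement({"PD_B": True, "PD_C": False}))
    # Both receivers are on: input isolation would duplicate the cell.
    print(isolation_placement({"PD_B": True, "PD_C": True}))
```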
2.2.5.12 Power State Table
The power state table is a very important component to help verification. The power state table
has no physical implementation: it is only a table that defines all possible voltages that
can be applied to the power domains. If, during simulation, a power domain enters a state
that is not defined in the power state table, it is said to be in an illegal state and will trigger an
error, causing the simulation to fail.
Figure 2.22: Redundant isolation (domains labelled OFF, ON and ON; two iso_enable controls)
Figure 2.23: Output isolation (domains labelled OFF, ON and ON; one iso_enable control)
The power state table (PST) can contain several possible states and several supply sets. The
power architect should write all possible states for the power domains in the power state table,
although it is also possible to have several power state tables in the same design. Having several
power state tables allows unrelated power domains to operate independently. All related power
domains should be included in the same table to catch bugs in the power intent.
Values in the power state table are real numbers and define the voltage applied to the supply net.
A zero in the PST does not mean the net is off; it means the defined voltage is zero. A ground net
defined as 0 is therefore ON. Gated nets are defined in the power state table as "OFF".
The example in table 2.2 defines the possible states of two power domains, PDA and PDB.
This table has three possible states: PS_ALL_ON, PS_ALL_OFF and PS_LP_1.
Table 2.2: Example of PST
              PDA.primary        PDB.primary
State         power   ground     power   ground
PS_ALL_ON     1.0     0.0        0.8     0.0
PS_ALL_OFF    OFF     0.0        OFF     0.0
PS_LP_1       1.0     0.0        OFF     0.0
In the PS_ALL_ON state, both power domains are on, PDA at 1.0V and PDB at 0.8V. When
analysing the PST, tools will notice this and check whether level shifters have been inserted on
connections between the two power domains.
The PS_ALL_OFF state is usually present in every PST. Designs without it risk missing
states in the power up or power down sequence [7]. It is possible to observe that, for this particular
design, header switching was chosen, since the supply net that is gated is the power one.
The last state, PS_LP_1, has one power domain active, PDA, and the other one power gated.
This means isolation cells are needed on signals from PDB to PDA. As the two power domains
operate at different voltages, enable level shifters should be used instead of both an isolation cell
and a level shifter.
From this power state table, it is possible to see that PDA cannot be turned OFF while PDB is
ON. Such a situation would violate the power state table and cause the simulation to fail
with an illegal state.
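The two uses of the PST described above, detecting illegal states and deriving which boundary cells are needed, can be sketched with the values of table 2.2. This is an illustrative model only: it tracks the power rail of each domain (the ground rail is 0.0 in every state of the table), and the function names and returned labels are assumptions, not the behaviour of any specific tool.

```python
OFF = "OFF"

# Power rail values of table 2.2, one row per legal state.
PST = {
    "PS_ALL_ON":  {"PDA": 1.0, "PDB": 0.8},
    "PS_ALL_OFF": {"PDA": OFF, "PDB": OFF},
    "PS_LP_1":    {"PDA": 1.0, "PDB": OFF},
}


def find_state(supplies):
    """Return the name of the matching PST state, or None if it is illegal."""
    for name, row in PST.items():
        if row == supplies:
            return name
    return None


def boundary_cells(src, dst):
    """Cells needed on signals going from domain src to domain dst."""
    # Isolation: some legal state has the driver off and the receiver on.
    needs_iso = any(r[src] == OFF and r[dst] != OFF for r in PST.values())
    # Level shifting: some legal state has both on at different voltages.
    needs_ls = any(
        r[src] != OFF and r[dst] != OFF and r[src] != r[dst]
        for r in PST.values()
    )
    if needs_iso and needs_ls:
        return "enable level shifter"
    if needs_iso:
        return "isolation"
    if needs_ls:
        return "level shifter"
    return "none"


if __name__ == "__main__":
    print(find_state({"PDA": 1.0, "PDB": OFF}))   # a legal state
    print(find_state({"PDA": OFF, "PDB": 0.8}))   # PDA off while PDB is on
    print(boundary_cells("PDB", "PDA"))
    print(boundary_cells("PDA", "PDB"))
```

The second lookup corresponds to the illegal combination discussed above: no row of the table has PDA off while PDB is on.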
Chapter 3
Design Flow
This chapter summarises the design flow used in hardware simulation, verification and synthesis,
and it introduces the changes needed for a power aware flow.
UPF files are part of the design source. While HDL files are used to specify logic intent, UPF
files are used to specify power intent. UPF files are refined as they go down the flow, and their
information grows as they get refined. They are inputs to the simulation tools, synthesis tools,
formal verification tools and place and route tools. The output is a new UPF file that should be
formally verified against the original one. This process is illustrated in figure 3.1.
Figure 3.1: UPF tool flow. Source: [5]
UPF files are created at the RTL level of the design and are synthesised with the HDL files for
logical verification. They are then refined to better suit the needs of the subsequent phases.
In the final phase, power analysis, timing analysis, validation and functional verification
are performed together to ensure UPF did not affect the expected logical behaviour of the circuit. The
original power intent is kept from the start so that the successive refinements can be formally
verified against it, ensuring consistency of the power intent throughout the development. This
original power intent is referred to as the golden UPF.
Figure 3.2: Design flow for multi-voltage, power gated designs. Source: [4]
3.1 Flow Without Power
Since the team in which this project is being developed at Synopsys is a front-end team, the flow
does not reach the place and route phase. RCE (Regression Control Environment) is the tool
responsible for building the working environment. RCE uses CoreConsultant, and CoreConsultant uses
CoreBuilder. CoreBuilder is responsible for preparing files for compilation according to the
defined configuration, which means removing pragmas and ifdefs so that the output RTL lines up with
the configuration. CoreBuilder receives a TCL script as input, which it uses to build the intended
configuration.
VCS (Verilog Compiled code Simulator) is a functional verification tool, responsible for
verifying the RTL against the test bench. VCS performs both compile and run-time verification. In
this project, the test bench has been developed in SystemVerilog and the design in Verilog.
When developing tests for the test bench, it is possible to enable wave dumps by adding the
corresponding command to the test file. VCS recognises that command and generates a VPD (VCD plus)
dump file with the waveforms produced by the simulation activity. VPD is a file format used by
Synopsys tools; the IEEE standard format for wave dumps is VCD (Value Change Dump). VPD files
can easily be converted to VCD with the vpd2vcd tool if there is a need to use other industry tools.
To help understand undefined port states, a tool called Xprop can be run together with VCS.
Xprop propagates unknown port states across the design. Unknown port states are easier to debug
at the RTL level because the descriptions are closer to the design intent. Xprop is useful to find
the origin of the unknown signal, reducing debugging time.
Wave analysis is a good last resort to catch design errors and incorrect protocol implementations
that may have escaped the tests, and to find the signal or sequence responsible for a test
failure. DVE (Debugging and Visualisation Environment) is used to visualise the waves. DVE
allows the designer to view code, and points to the source of a signal when double clicking on it.
Another useful feature is hierarchy visualisation, which shows the location of the modules
as well as their parent and child instances. In DVE it is also possible to visualise schematics and
trace back signals, which has been very useful to find the source of incorrect behaviour.
After inspecting the simulation results, it is necessary to generate a SAIF (Switching Activity
Interchange Format) file with the activity. The SAIF file will later be used by DC to map names
on the netlist, which is essential for PrimeTime to perform power analysis. To get the SAIF file
from the simulation, the VPD dump file from VCS is first converted to VCD with the vpd2vcd
tool, using the +includemda switch to include multidimensional arrays. It is possible to select a
specific time interval, but any interval will work since the file will only be used for name mapping.
The VCD file is then post processed with the vcdpost utility, which ensures unique identifier codes
for nets and registers.
After obtaining the post processed VCD file, running it through vcd2saif generates the SAIF
file. The switches -top and -instance are used to define the top module and instance. This
is particularly useful for removing test bench instances and test modules from the activity file, as
they are not synthesised and therefore not necessary for the process.
3.1.1 Synthesis
CoreTools are also used to generate the workspace used by the synthesis tools. The tool used
to perform synthesis is Design Compiler (DC). Synthesis consists of generating a netlist based on
the Verilog logical description of the circuit. Synthesis maps the Verilog functions to standard cells
from the given libraries, resulting in a functional netlist able to perform the intended operations.
Synthesis tools are driven by TCL scripts, previously written to guide the synthesis process
and provide options for optimisation. The SAIF file extracted from the simulation is now added to
the workspace in order to be used by DC for name mapping. Name mapping is an optional step
that needs to be added to the existing scripts. It consists of creating a new file containing a map
of names between the RTL code from the simulation and the netlist generated by DC. This name
map will later be used by PrimeTime to annotate activity from the simulation onto the netlist during
power analysis.
To perform synthesis, DC needs to be provided with libraries. The type of library used in this
case is Liberty. Liberty is a library standard used in the VLSI industry to describe standard cells.
Liberty defines power pins and logical pins, timing performance, as well as cell power
consumption and function. Liberty libraries may have one or several operating conditions for which
their cell attributes are characterised. Operating conditions include process variation, temperature
and voltage.
Power Compiler is an integrated extension of DC used to minimise power consumption. Power
Compiler also allows for concurrent timing, area and power optimisation [17]. Power Compiler
uses multi-corner multi-mode optimisation.
DFT (Design-for-test) Compiler is responsible for the last stage, the insertion of the scan cells.
DFT Compiler also tries to repair DRC (design rule check) violations at the gate level. There is
also some optimisation of area and timing at this phase.
Formality is used to formally verify the equivalence between the RTL logical intent and the
synthesised netlist. Formal equivalence checking is used in the EDA industry to validate the
behavioural equality between two representations of the same circuit. Formality is used in this
case to compare the Verilog logical intent against the gate level netlist generated by synthesis.
3.2 UPF flow
The UPF flow works similarly to the usual design flow, but with increased complexity. Extra
tools are needed to analyse the power intent and apply new signal constraints, such as power down
corruption. This added complexity increases simulation, development and testing times.
For power aware simulation, VCS needs to be run in MVSIM (Multi-Voltage Simulation)
NLP (Native Low Power) mode. Normal simulation assumes that an always on constant voltage is
provided to the chip, which is not true if power gating or multi-voltage is implemented in the
design. To simulate the effect of power gating, MVSIM corrupts the signals of a power domain while
it is powered down. Logical outputs now also depend on the supply state: the lower the voltage, the
slower the signal propagation. Since DVS (Dynamic Voltage Scaling) was not implemented due to the
IP complexity, different voltages were not simulated.
MVSIM checks for the correct transition of power states and compares them with the power
state table to ensure that there are no illegal transitions. The correct implementation of the power
control sequence is also checked, which helps catch low power bugs early in the design.
Isolation and retention strategies are also checked to ensure their correct behaviour and implementation.
DVE has enhanced signal visualisation for power aware simulation wave dumps. Signals corrupted
by power down are displayed differently, so that they are easy to identify and are not
confused with signals in an unknown state due to logical errors.
3.2.1 Voltage Aware Synthesis
Complexity also increases for voltage aware synthesis. A library with a low power kit is needed
in order to map all the new cells introduced by the UPF files. DC has to insert power switches,
isolation cells, level shifters and retention registers according to the power intent. Those cells
are usually marked only for area optimisation, so that DC does not replace them with normal cells.
Before synthesis, it is important to check that the library that is going to be used contains
the cells necessary for power gating implementation. Some vendors may mark them as "don’t
use" or "don’t touch", in which case synthesis tools will ignore those cells and introduce GTECH
(Generic Technology) cells.
GTECH cells are part of a generic library used to map cells that are not available to DC
from other libraries. GTECH cells cannot go into production and should not be present in final
designs. GTECH cells have a generic characterisation, which translates into incorrect power
estimations because they differ greatly from silicon.
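A quick way to notice leftover GTECH cells is to scan the synthesised netlist for instances whose cell name starts with GTECH_. The sketch below assumes a structural Verilog netlist with one instance per line; the regular expression, the function name and the sample netlist (including the AND2X1 library cell and the instance names) are hypothetical and only serve the illustration.

```python
import re

# Matches "<GTECH cell> <instance name> (" at the start of a line.
GTECH_INSTANCE = re.compile(r"^\s*(GTECH_\w+)\s+(\S+)\s*\(", re.MULTILINE)


def find_gtech_cells(netlist_text):
    """Return (cell, instance) pairs for every GTECH instance found."""
    return [(m.group(1), m.group(2)) for m in GTECH_INSTANCE.finditer(netlist_text)]


# Hypothetical netlist fragment used only for this example.
SAMPLE_NETLIST = """
module top (a, b, iso_n, y);
  AND2X1 u1 (.A(a), .B(b), .Y(n1));
  GTECH_AND2 u_iso (.A(n1), .B(iso_n), .Z(y));
endmodule
"""

if __name__ == "__main__":
    for cell, inst in find_gtech_cells(SAMPLE_NETLIST):
        print("unmapped cell", cell, "at instance", inst)
```

A non-empty result would indicate that some cell, for example an isolation cell, could not be mapped to the target library.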
Depending on their location, power gating cells may need to have dual supply rails, one is the
power domain supply and the other one the always on supply, in order to ensure always on cells
remain powered during low power mode.
Power aware synthesis requires the power net voltage in order to select the cells from the library.
Liberty cells are designed to operate at a designated voltage. To select which cells it will
use, DC checks the UPF files for the defined voltage of each power domain. In the case of a
multi-voltage design, DC will insert cells from different libraries for the different power domains
according to their defined voltage. DVFS (Dynamic Voltage and Frequency Scaling) designs
require cells capable of operating over the voltage range used in the dynamic scaling.
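The idea of picking a library per power domain from its defined voltage can be sketched as a simple lookup. This is not how DC is implemented; it is a minimal model of the selection described above, and the library names, domain names and voltages are all hypothetical.

```python
def pick_library(domain_voltage, libraries, tolerance=1e-6):
    """Return the name of a library characterised at the domain voltage.

    `libraries` maps a library name to the voltage it was characterised at.
    """
    for name, lib_voltage in libraries.items():
        if abs(lib_voltage - domain_voltage) <= tolerance:
            return name
    raise LookupError(f"no library characterised at {domain_voltage} V")


# Hypothetical libraries and power domains used only for this example.
LIBRARIES = {"stdcells_0v8": 0.8, "stdcells_1v0": 1.0}
DOMAINS = {"PDA": 1.0, "PDB": 0.8}

if __name__ == "__main__":
    for domain, voltage in DOMAINS.items():
        print(domain, "->", pick_library(voltage, LIBRARIES))
```

A domain whose voltage matches no library raises an error in this sketch, which mirrors the need to check library coverage before synthesis.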
DC will sometimes flatten the hierarchy to perform optimisations, which may not be
problematic, but may complicate power analysis. When analysing the consumption of a specific wrapper,
it is desirable to keep the hierarchy as defined in the logical intent. To achieve this, it is necessary
to force DC to keep the hierarchy with the -keep_hierarchy switch on invocation.
3.3 Power Analysis
After synthesis, it is important to analyse the synthesis reports for violations and errors. The
clock tree has to be declared as an ideal network, as it is a high fan-out network and will not be
optimised at this design stage. The clock tree is declared in a script that serves as input for DC,
and will not be synthesised.
PrimeTime PX is an extension of PrimeTime for power analysis, and is the tool used for
power analysis at the netlist level. PrimeTime executes standard TCL commands, so it can
run a previously written script instead of requiring every command to be typed manually.
Running pt_shell at the command line will execute PrimeTime. The easiest way to obtain
several results from different parts of the same simulation is to create a TCL script and execute
it with PrimeTime, since the flow stays the same and only the input activity file changes. To run a
script with PrimeTime, the -f <script file> switch is added at invocation.
First, it is necessary to load the libraries used in the synthesis. To do that it is enough to set the
target_library, link_library and search_path variables.
PrimeTime must then enter PX mode, which enables power analysis. This is done by setting
the power_enable_analysis variable to true. Next it is necessary
to set the power analysis mode. In the scope of this work, average power is the one that is
important to analyse. Average power is helpful to analyse energy consumption, which is especially
useful to estimate battery life.
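The link between average power and battery life is simple arithmetic: the energy stored in the battery divided by the average power drawn. The sketch below assumes an ideal battery with constant voltage and no conversion losses; the function name and the numbers are illustrative assumptions, not results from this work.

```python
def battery_life_hours(capacity_mah: float, battery_voltage_v: float,
                       average_power_mw: float) -> float:
    """Estimate operating time from the battery energy and average power.

    Energy in mWh is capacity (mAh) times voltage (V); dividing it by the
    average power in mW gives hours. Ideal battery, no losses.
    """
    if average_power_mw <= 0:
        raise ValueError("average power must be positive")
    energy_mwh = capacity_mah * battery_voltage_v
    return energy_mwh / average_power_mw


if __name__ == "__main__":
    # Hypothetical battery and two hypothetical average power figures.
    for power_mw in (50.0, 40.0):
        hours = battery_life_hours(1000.0, 3.7, power_mw)
        print(f"{power_mw} mW average -> {hours:.1f} h")
```

Under these assumptions, any reduction in average power translates into a proportional increase in operating time.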
Next it is necessary to pass the netlist to PrimeTime, using the command read_verilog
<path to netlist>. PrimeTime needs to know which design it is working with; for
that, the current_design <top instance> command is entered.
Since libraries may contain several corners, it is necessary to specify the one that will be
used for the power analysis. With this information, PrimeTime is able to select cell consumptions
and timing corners for a given voltage and temperature.
Next it is necessary to read the power intent. The power intent comes from the UPF file used
during simulation. These files have also been imported by the synthesis tools to insert the special
cells necessary for power gating. The root scope in which the power intent will be executed is the
current design, defined already.
Since the parasitic effects of wires cannot be ignored, they must also be taken into account when
performing power analysis. The wire parasitics depend mostly on the back-end implementation, but
synthesis results provide an estimation of their effect earlier in the design. Parasitics are read
from a Standard Parasitic Exchange Format (SPEF) file, which is an IEEE standard for the parasitic
representation of data wires in the ASIC development flow [18].
Design constraints are loaded from the SDC (Synopsys Design Constraints) file, and analysed.
The SDC file defines timing constraints and domain voltage definitions. The SDC file contains the
clock definitions, as well as some other networks that are defined as ideal, since they will later be
optimised at the place and route phase, during clock tree synthesis (CTS).
After this process, running update_timing will instruct PrimeTime to take the input files and
configuration previously defined and start analysing the design. At this point, activity has not
been provided yet.
It is possible to do a power analysis based on a specific expected operation of the design. This
expected operation comes from a simulation, and is supplied by providing both the name map file
created during synthesis and the switching activity file, either a VCD or a SAIF file.
Two important reports are the power report and the switching activity report. The power report
provides an estimation of power consumption, discriminated by module, which is a good way of
checking power budget and power savings across different implementations.
The switching activity report can be used for debugging. If something is not correctly
implemented, anomalous results will appear in the switching activity report. The switching activity
report shows the design activity according to logic type.
This type of power analysis will not take the clock tree into consideration as it has not yet
been synthesised. At this stage, clocks are considered to be ideal networks. Clock tree synthesis
will provide better optimisation for high fan-out networks, such as the clock tree. CTS is also
important to minimise clock skew and ensure proper clock distribution and a balanced clock tree.
Chapter 4
Implementation
The design used in this implementation is proprietary and confidential. For that reason, some
explanations are reduced to the minimum necessary to understand the implementation choices.
The implementation focuses mainly on power gating, using IEEE Std 1801. The tools used
for this implementation support the IEEE Std 1801-2009 version of this standard. This is not the
latest version; the currently active version of the standard is IEEE Std 1801-2015.
This chapter explains how power reduction was implemented in the design, as well as how
the guidelines were defined. These guidelines provide guidance for an easier power gating
implementation, since power gating can be difficult to implement and adds complexity to design testing.
4.1 Steps taken
The specific sub-design studied is part of a bigger design that communicates with it.
Figure 4.1 shows a summary of the IP. The module used for this implementation
is represented as eDMA (Embedded Direct Memory Access), and is composed of a couple of sub
modules. In summary, it is divided into two channels and some arbitration logic. The write channel
generates traffic directed mainly to the core module, while the read channel generates traffic
mainly for the application module. The eDMA, as the name suggests, is a feature for direct memory
access that frees the core processor for other tasks while it sends information from the
memory to the application. There is also traffic directly from the core module to the application
and vice versa. Arbitration logic is responsible for selecting the source of traffic to the receiving
modules.
There is some common logic used by both channels and by register configuration. The arbitration
logic must be kept powered on even when the remaining blocks in the eDMA are not being used.
Common logic must be powered on when there is a write or read access to its configuration
registers, and be kept on until the eDMA is disabled.
The design already has some power intent implemented. The implementation consists of
a switchable power domain (PD_VMAIN_SW), an always on power domain (PD_VAUX) and
some power islands for state retention. The IP entry into the low power state involves fairly complex
negotiations, but these are outside the scope of this work. For a flawless integration of new power
intent into a design that already has some implemented, the system must be extensively tested to
avoid power down bugs.
Figure 4.1: Basic representation of the module used. Blocks shown: Core, Application and eDMA (Read and Write channels), connected by Data paths.
As a first approach, it was decided to create a new power domain to gate the less active modules
of the eDMA. Modules that were not used during core to application and application to core traffic
were selected and put inside that power domain. This involves some knowledge of the design and
some trial and error, as well as some traffic analysis. The eDMA module is already
part of the switchable PD_VMAIN_SW power domain, making it necessary to test the
full design instead of the eDMA module in isolation.
One of the output signals from the eDMA is necessary for the configuration of the application,
and its value may change during run-time. This means that isolation cannot be stuck at either "0"
or "1" via simple AND or OR isolation, as the signal value may change and cause inconsistencies
between the isolation value and the actual value. This would cause collisions on transactions. To
address the problem, isolation latch cells have been used. However, the libraries available do not
have those cells, which resulted in the insertion of GTECH cells.
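The difference between clamping isolation and latch isolation can be shown with a small behavioural model. It is a sketch under assumptions: the function and class names are hypothetical, signals are single bits, and the latch is modelled as transparent while isolation is disabled.

```python
def and_isolation(signal: int, iso_enable: bool) -> int:
    """AND-type isolation: output is clamped to 0 while isolation is enabled."""
    return 0 if iso_enable else signal


def or_isolation(signal: int, iso_enable: bool) -> int:
    """OR-type isolation: output is clamped to 1 while isolation is enabled."""
    return 1 if iso_enable else signal


class LatchIsolation:
    """Latch-type isolation: holds the last value seen before isolation."""

    def __init__(self, initial: int = 0) -> None:
        self.held = initial

    def output(self, signal: int, iso_enable: bool) -> int:
        if not iso_enable:
            self.held = signal  # transparent while isolation is disabled
        return self.held


if __name__ == "__main__":
    for config_bit in (0, 1):
        latch = LatchIsolation()
        latch.output(config_bit, iso_enable=False)  # normal operation
        # Domain is powered down: its output is no longer valid (modelled as 0).
        print(
            "config", config_bit,
            "and", and_isolation(0, True),
            "or", or_isolation(0, True),
            "latch", latch.output(0, True),
        )
```

Only the latch version presents the value the signal had before power down for both settings, which is why a fixed clamp does not suit a signal that changes at run-time.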
To avoid the insertion of GTECH cells, and given the power consumption of the module that
requires latching isolation cells, it was decided to remove this module from the power
domain. Its power consumption is relatively low, so the impact of removing it from the power
domain is negligible.
Upon inspection, it was possible to conclude that the eDMA block had no necessity of saving
its state when entering low power. This is because the software already configures registers each
time the module is reactivated.
The control signals for the power domain need to be driven by a power management
unit. Since there is no need for register retention, this power management unit is also simpler. The
power management unit is a design module and is explained in more detail in section 4.3.
Table 4.1: EDMA power state table
              PD_VDMA_SW
State         power   ground
PS_ALL_ON     0.8     0
PS_ALL_OFF    OFF     0
The PST for this implementation (table 4.1) is a simple two state table. The PD_VDMA_SW power
domain is either on, at 0.8V, or gated off.
After some analysis, it was possible to conclude that the modules belonging to the read channel
were not necessary for write channel traffic formation and vice versa. On a more aggressive power
saving approach, it was decided to create new power domains for each channel.
There are three different power requirements, making it a good choice to use three power
domains to further reduce static power consumption during read and write operations.
Signals already present in the design, which enable the read and write channels
independently, were used to control their PMUs. The PMUs work in parallel to control the power
domains independently from each other. This is further explained in section 4.4, as it is the
implementation with the best results.
4.2 Obstacles
Large IP designs take a very long time to simulate and even longer to synthesise. Even small errors
in the process can cost a lot of time. Since each modification requires testing the design to ensure
functionality remains as expected, even small improvements and features require a new simulation
and, since the design changes, a new synthesis as well. Synthesis with power, for such big
designs, may even take a day or two.
It took some time to understand that the tools installed for synthesis do not yet support the
IEEE 1801-2013 standard, but only the older IEEE 1801-2009. This caused DC to not
recognise isolation cells, because some new commands are not yet supported. The currently active
standard is IEEE 1801-2015, but the industry always takes some time to support standards, and the
version supported by the available tools is IEEE Std 1801-2009.
Since the libraries do not have isolation latch cells, it was necessary to remove the module from
the power domain, because synthesis would not introduce isolation cells on that particular path.
When cells are not present in a library, synthesis tools will introduce GTECH (Generic Technology)
cells. These cells come from a generic technology library and will cause incorrect power
and area estimations. Avoiding the insertion of GTECH cells provides a more accurate power
estimation, since parameters are usually well defined in a technology library.
4.3 Power Management Unit
The power management unit (PMU) is a very important component in a design with power
gating. The PMU is responsible for controlling all the power related components in the design.
Because no state retention was necessary in the scope of this project, the power
management unit is a very simple state machine with eight states that can be reused in most power
gating implementations, as long as there are no state retention registers. For a design that needs
retention during power off, two states, save and restore, have to be added to the state machine. If
power islands are used for state retention, however, this state machine will be enough.
This power management unit interacts with the clock and reset control block, which can provide
different clock and reset signals. The PMU controls the clock, reset, power switch and isolation
enable signals. The clock and reset control block was modified to have extra clocks and extra
reset signals for the power domains created. The control block provides a synchronous reset on
request and an acknowledge signal that the PMU uses to change state. It also provides the clock
signal when requested. Since this block is a non-synthesisable test module, the client has to
implement an equivalent that supports these control signals and provides the correct clocks and
resets.
The state machine starts in the idle state, assuming the chip has a power-on reset. In the idle
state, isolation is enabled, the power switch is disabled and there is no clock. The power domain
controlled by the PMU starts in the off state, with isolation enabled to prevent the propagation
of unknown signals. The clock request signal is also deasserted to save dynamic power. The enable
signal is responsible for triggering the wake-up process. In this design, the enable signal is a
logical OR of several signals that are asserted when an operation from the module is required. In
a different design, this could be an internal signal from another module, requesting the gated
design to wake up.
In the wake_up state, the power domain is powered up, but reset is kept low and isolation
remains enabled. There is no clock signal either. This state waits for confirmation from the
power switch, ensuring power is stable.
When the power is stable, the PMU enters the deisolate state. This state only takes one clock
cycle and its main function is to release the isolation. It also requests the clock signal from
the clock and reset block, since the clock takes one clock cycle to arrive.
The clk state is where the power domain gets its clock, and reset is released. When receiving
an acknowledge signal from the clock and reset control block, the state changes into active.
The active state is where the power domain is fully on and working as if there was no other
logic than the one described in the logical intent. The PMU remains in this state until the enable
signal is deasserted.
When the enable signal is deasserted, the PMU changes into the gate_clk state, where it gates
the clock and asks for a reset so that the design goes into a known state.
After a clock cycle, it enters the isolate state. In this state, isolation is enabled and, after
one clock cycle, the PMU enters the gate_power state.
The gate_power state turns off the power switch and waits for an acknowledgement from the
switch confirming power removal. After receiving the acknowledgement, it transitions back into
the idle state, waiting for a new enable signal.
If the power domain is needed while the PMU is in the gate_clk or isolate state, the PMU jumps
into the counterpart state, which prevents it from entering low power and causing a large time
overhead.
[State diagram. States: IDLE, WAKE_UP, DEISOLATE, CLK, ACTIVE, GATE_CLK, ISOLATE, GATE_POWER.
Transition labels: enable, !enable, pwr_ack, !pwr_ack, rst_ack, !main_rst_n.]
Figure 4.2: Function State Machine.
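The state sequence described in this section can be sketched as a small behavioural model. The
following Python code is only an illustration and not the RTL used in the project. The names
next_state and OUTPUTS are hypothetical, and the choice of clk and deisolate as the counterpart
states of gate_clk and isolate is an assumption made for this example.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WAKE_UP = "wake_up"
    DEISOLATE = "deisolate"
    CLK = "clk"
    ACTIVE = "active"
    GATE_CLK = "gate_clk"
    ISOLATE = "isolate"
    GATE_POWER = "gate_power"


# Assumed outputs per state: (isolation enabled, power switch on, clock requested).
OUTPUTS = {
    State.IDLE:       (True,  False, False),
    State.WAKE_UP:    (True,  True,  False),
    State.DEISOLATE:  (False, True,  True),
    State.CLK:        (False, True,  True),
    State.ACTIVE:     (False, True,  True),
    State.GATE_CLK:   (False, True,  False),
    State.ISOLATE:    (True,  True,  False),
    State.GATE_POWER: (True,  False, False),
}


def next_state(state, enable, pwr_ack, rst_ack):
    """Return the state reached after one clock cycle (behavioural sketch)."""
    if state is State.IDLE:
        return State.WAKE_UP if enable else State.IDLE
    if state is State.WAKE_UP:
        # Wait until the power switch confirms that power is stable.
        return State.DEISOLATE if pwr_ack else State.WAKE_UP
    if state is State.DEISOLATE:
        # Isolation is released in a single cycle.
        return State.CLK
    if state is State.CLK:
        # Wait for the acknowledge from the clock and reset control block.
        return State.ACTIVE if rst_ack else State.CLK
    if state is State.ACTIVE:
        return State.ACTIVE if enable else State.GATE_CLK
    if state is State.GATE_CLK:
        # Assumption: the counterpart of gate_clk is clk.
        return State.CLK if enable else State.ISOLATE
    if state is State.ISOLATE:
        # Assumption: the counterpart of isolate is deisolate.
        return State.DEISOLATE if enable else State.GATE_POWER
    # GATE_POWER: wait until the switch confirms power removal.
    return State.IDLE if not pwr_ack else State.GATE_POWER
```

Stepping this function once per clock cycle, with enable asserted and later deasserted, walks
through the eight states in the order described in the text.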
An improved version of the PMU has been implemented, but not fully tested due to the traffic
profiles used by the system. The improved PMU contains a timer in the ACTIVE state. This timer
prevents the block from entering low power unless there has been no activity during its defined
timeout. The objective is to filter out sequential traffic, preventing the system from constantly
shutting down and powering back up at each consecutive transaction. The timer is configurable
by software.
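The effect of such a timer can be illustrated with a short Python sketch. It is not the
implemented logic: the function names and the exact cycle in which the timeout expires are
assumptions. The sketch keeps the domain awake until the enable signal has been low for
timeout consecutive cycles, and counts how many power-down events result.

```python
def filter_enable(enable_trace, timeout):
    """Keep the domain awake until `timeout` consecutive idle cycles have passed."""
    filtered = []
    awake = False
    idle_cycles = 0
    for enable in enable_trace:
        if enable:
            awake = True
            idle_cycles = 0
        elif awake:
            idle_cycles += 1
            if idle_cycles >= timeout:
                awake = False
        filtered.append(awake)
    return filtered


def count_power_downs(trace):
    """Count transitions from awake (True) to asleep (False)."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev and not cur)
```

For back-to-back transactions separated by short gaps, the filtered trace contains fewer
power-down events than the raw enable trace, which is the behaviour the timer aims for.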
4.4 Final Result
Since the IP power intent is either fully on or in a low power mode that gates the whole system,
it is possible to reduce static power consumption even further by gating modules that may not be
used when the IP is not in the power down mode.
The final implementation consists of three power domains, one for each power requirement:
PD_VDMA_RD_SW, PD_VDMA_WR_SW and PD_VDMA_SW. The first power domain includes the modules
that make up the read channel, the second one includes the modules from the write channel and
the last one is composed of the common logic and configuration registers.
A module was created for power management. It consists of a simple eight-state state machine.
This module is instantiated three times, once for each power domain. The difference between the
instances is the set of signals used to trigger the transition of the state machine to the on
state. The three PMUs work in parallel, controlling the power domains independently.
Table 4.2: eDMA power state table

              PD_VDMA_SW       PD_VDMA_RD_SW    PD_VDMA_WR_SW
State         power  ground    power  ground    power  ground
PS_ALL_ON     0.8    0         0.8    0         0.8    0
PS_ALL_OFF    OFF    0         OFF    0         OFF    0
PS_WR_ON      0.8    0         OFF    0         0.8    0
PS_RD_ON      0.8    0         0.8    0         OFF    0
Table 4.2 represents the power state table (PST) defined in the implementation. It is composed
of four states: PS_ALL_ON, PS_ALL_OFF, PS_WR_ON and PS_RD_ON. Each of these power
states represents a possible state the power domains could be in. The defined operating voltage
is 0.8V because it is the voltage at which the cells from the library used in the implementation
operate.
From the PST it is possible to observe that PD_VDMA_SW can never be gated off when
either of the other two power domains is active. This comes from the fact that PD_VDMA_SW
contains common logic necessary for both read and write traffic operations, as well as
configuration registers.
The power states PS_WR_ON and PS_RD_ON are the states used during exclusive write or
read operations, respectively. Those are the power states that take advantage of the different
power needs of the read and write channels and grant some extra power savings.
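The power state table and the property observed above can be written down as data and checked
mechanically. The Python sketch below is only an illustration of table 4.2, not UPF code. The
helper names violations and state_for are hypothetical, and the mapping from traffic type to
power state is an assumption based on the description of the states.

```python
OFF = None  # a gated supply is represented as None

# Supply voltage of each power domain in each power state (table 4.2).
PST = {
    "PS_ALL_ON":  {"PD_VDMA_SW": 0.8, "PD_VDMA_RD_SW": 0.8, "PD_VDMA_WR_SW": 0.8},
    "PS_ALL_OFF": {"PD_VDMA_SW": OFF, "PD_VDMA_RD_SW": OFF, "PD_VDMA_WR_SW": OFF},
    "PS_WR_ON":   {"PD_VDMA_SW": 0.8, "PD_VDMA_RD_SW": OFF, "PD_VDMA_WR_SW": 0.8},
    "PS_RD_ON":   {"PD_VDMA_SW": 0.8, "PD_VDMA_RD_SW": 0.8, "PD_VDMA_WR_SW": OFF},
}


def violations(pst):
    """Return the states in which a channel domain is on while the common domain is off."""
    bad = []
    for name, supplies in pst.items():
        channel_on = (supplies["PD_VDMA_RD_SW"] is not OFF
                      or supplies["PD_VDMA_WR_SW"] is not OFF)
        if channel_on and supplies["PD_VDMA_SW"] is OFF:
            bad.append(name)
    return bad


def state_for(read_active, write_active):
    """Assumed mapping from the traffic in progress to a power state."""
    if read_active and write_active:
        return "PS_ALL_ON"
    if read_active:
        return "PS_RD_ON"
    if write_active:
        return "PS_WR_ON"
    return "PS_ALL_OFF"
```

An empty result from violations(PST) confirms that no state in the table powers a channel
without the common logic.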
4.4.1 Power Architecture
All the power domains created depend on PD_VMAIN_SW, the main power domain of the IP,
since all supply sets are connected to the PD_VMAIN_SW primary supply set.
As can be observed in figure 4.3, each power domain has its own switch, controlled by its
PMU. The pm_en_sw signals are outputs of those power management units and are responsible
for controlling the power switches.
4.5 Verification
Testing is a very important and time-consuming task in the VLSI industry. To ensure the correct
implementation of power gating in this design, it had to be tested. For the IP, one of the
verification solutions is a test bench named VTB (Verification Test Bench), based on the
Universal Verification Methodology (UVM). UVM is an Accellera standard that enables reuse of
verification environments and Verification IP (VIP) [19]. UVM is implemented on top of
SystemVerilog (IEEE Std. 1800), an IEEE standard hardware design, specification and verification
language that is commonly used for verification. SystemVerilog is very similar to the Verilog
HDL, but it also has some object-oriented properties.
[Block diagram. Blocks: PD_VMAIN_SW, PD_VDMA_SW, PD_VDMA_RD_SW, PD_VDMA_WR_SW.
Switch enable signals: pm_en_sw, pm_en_rd_sw, pm_en_wr_sw.]
Figure 4.3: Block representation of the power domains
VTB has a set of tests, used to exercise and test different interfaces and functions of the IP.
Some of the tests are used to exercise the eDMA block, with different traffic profiles.
Although there are already many tests, none of them takes power into account. To ensure the
full functionality of the block, it is necessary to test the entry into low power mode as well as
the exit sequence from it. For that, the conditions to enter low power must be met during the
test. The correct behaviour of the remaining system must also be tested when the module is turned
off.
In order to test the correct behaviour of the implementation, two already existing tests were
modified. One of the tests generates traffic from the eDMA block, both read and write traffic. The
other one sends generic traffic between the core and the application. From those tests, three new
tests were created. The difference between these three tests is the type of traffic generated by
the eDMA. The first test generates read traffic, the second write traffic and the last one both
read and write traffic.
The test sequence is the same for all of the three tests. The eDMA starts powered off, then it
wakes up because of the configuration process. After the initial configuration, the test will force
generic (core/application) traffic in parallel with eDMA traffic. The eDMA traffic will be write
traffic, when it is generated in the eDMA write channel, read traffic, when generated in the eDMA
read channel, or both, depending on the test used.
After the transactions are complete, the test disables the eDMA block by deasserting an internal
enable signal that is accessible to software in a chip implementation. It then proceeds to send
generic traffic, which helps test the chip functionality with the eDMA module in low power
mode. At this stage the eDMA has been turned off by hardware via the PMU, since it detected no
activity. When the generic transactions finish, confirming proper low power operation, it is
necessary to test whether the eDMA is able to recover from low power and work as expected. This
also helps test the correct power-up sequence and the correct reset of the modules. Since the
eDMA operation consists of, as its name indicates, accessing memory, the test needs to program
its registers, specifying the amount of data it should transfer and its location in memory. Due
to confidentiality, it is not possible to go into eDMA configuration details.
The difference between the three tests provides a point to analyse power savings for each type
of traffic. The results will be a good comparative measure for the validation of the three power
domains solution.
When writing tests for power gating, it is important to also have a top power domain at the
root scope that contains all of the design. This power domain emulates all power components
placed outside of the IP. If the power architect decides to use an external switch, it should be
placed in this power domain. The top power domain is optional and is not synthesised.
Since IP is normally bought by other companies to integrate in their designs or SoCs (System
on a Chip), the top power domain can also be useful to simulate the behaviour of the client's
power intent. When specifying power intent, it is important that it integrates with the power
intent implemented at the chip level.
Chapter 5
Results
The IP used in this project is a highly configurable design that supports different data-path
sizes, different numbers of channels for a given instance and other parameters. The results were
obtained using a single configuration, well defined throughout the process. From this case study
it was possible to establish some base guidelines for power gating implementation at the RTL
stage of the design process. These guidelines have the purpose of simplifying power gating
implementation when the power architect has little knowledge of the design to be optimised.
It is important to make a clean and correct power intent, since it should be understandable by
the back-end team for proper implementation.
5.1 Power Reduction Outcome
The technology node chosen for this implementation was 28nm. This technology node is
not the smallest one available, but it is currently being used by the industry. From the available
libraries, it was the only one that contained standard cells for power gating implementation.
Due to IP complexity and the time necessary to run the whole power characterisation flow, it
was mandatory to select a single corner for power analysis. The chosen corner was 125 °C from
the 28nm technology node library. The available temperature corners were 125 °C, 0 °C and
-40 °C, with 125 °C being the one with the worst leakage.
This implementation proved to be efficient in reducing static power consumption, and it also
had quite an impact in total power consumption, mainly due to clock gating, since dynamic power
consumption has a bigger slice of the total power consumption.
It was not possible to perform a good area overhead evaluation. This comes from the fact that,
in the flow used for this implementation, area will vary in each synthesis, even using the exact
same design.
In one synthesis run, it was possible to observe an increase of 2% in the total design area,
compared to the original design with no power gating. This is not a significant area increase
but, as stated before, it is also not a very accurate measurement.
Since clock gating has also been implemented together with power gating, the area impact
is smaller. Due to register optimisation, clock gating is a technique that reduces area when
implemented.
Power reports at this level in the ASIC design flow are not accurate, but they are a reference
for the design power trend and a good indication of possible savings.
During activity there is a negligible increase in power consumption, due to extra logic from
power gating and the power management units. The following tables reproduce the results from
power analysis. Current is the current design before power gating implementation on the eDMA
module. 1 PD is the solution with a single power domain for the eDMA module and 3 PD is the
final solution with independent channel power gating.
Table 5.1: Power during activity.

Full Traffic
Power (µW)          Dynamic     Static      Total
Current   IP        3.29E-02    9.81E-03    0.19
          DMA       4.18E-04    1.30E-03    3.11E-02
1 PD      IP        3.33E-02    9.95E-03    0.196
          DMA       0.00062     0.00142     0.0317
3 PD      IP        0.0333      0.00995     0.196
          DMA       0.00062     0.00142     0.0317
Table 5.2: Power consumption related to current implementation, during activity.

Full Traffic
Power (% of Current)    Dynamic    Static    Total
1 PD      IP            101%       101%      103%
          DMA           148%       109%      102%
3 PD      IP            101%       101%      103%
          DMA           148%       109%      102%
As can be seen from the relative values presented in table 5.2, the increase in total power
consumption of the IP for both the single power domain and the triple power domain solutions is
about 3%, which is not a bad price to pay for the reduction provided during no activity. The
results for the read and write simulations are very similar, because the logic is similar and the
time interval chosen to extract activity had similar traffic characteristics.
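The relative values are obtained by dividing each figure by the corresponding figure of the
current implementation. As a worked check, the short Python sketch below recomputes the
percentages of table 5.2 from the values in table 5.1. The dictionary layout and the function
name relative_percent are chosen for this example only.

```python
# Values from table 5.1, in µW: (dynamic, static, total).
TABLE_5_1 = {
    "Current": {"IP": (3.29e-02, 9.81e-03, 0.19),
                "DMA": (4.18e-04, 1.30e-03, 3.11e-02)},
    "1 PD":    {"IP": (3.33e-02, 9.95e-03, 0.196),
                "DMA": (0.00062, 0.00142, 0.0317)},
    "3 PD":    {"IP": (0.0333, 0.00995, 0.196),
                "DMA": (0.00062, 0.00142, 0.0317)},
}


def relative_percent(table, solution, block, reference="Current"):
    """Return (dynamic, static, total) of `solution` as a percentage of `reference`."""
    new = table[solution][block]
    ref = table[reference][block]
    return tuple(round(100 * n / r) for n, r in zip(new, ref))
```

For example, relative_percent(TABLE_5_1, "1 PD", "IP") gives (101, 101, 103) and
relative_percent(TABLE_5_1, "1 PD", "DMA") gives (148, 109, 102), matching table 5.2.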
Results are different for the full simulation, where it is clearly noticeable that the three power
domains solution is much more effective at cutting down power consumption, especially if there
is only read or write traffic. Table 5.6 presents the relative power consumption of both solutions.
The one power domain solution ended up increasing power consumption. This is because, when a
single channel is used, both channels are powered on and remain on until no channel is needed,
whereas with three power domains only the channel in use is powered, unless there is traffic in
both directions.
Table 5.3: Power consumption during full simulation.

Full Traffic
Power (µW)          Dynamic     Static      Total
Current   IP        2.04E-02    9.75E-03    0.123
          DMA       4.52E-04    1.29E-03    1.89E-02
1 PD      IP        1.85E-02    9.89E-03    0.113
          DMA       0.00063     0.00142     0.0176
3 PD      IP        1.59E-02    8.51E-03    9.37E-02
          DMA       5.00E-04    5.00E-04    8.80E-03
Table 5.4: Power consumption for a full simulation, write traffic.

Write Traffic
Power (µW)          Dynamic     Static      Total
Current   IP        1.80E-02    9.72E-03    0.107
          DMA       4.35E-04    1.28E-03    1.62E-02
1 PD      IP        1.71E-02    9.87E-03    0.103
          DMA       0.00061     0.00141     0.0161
3 PD      IP        0.0153      0.00839     0.0868
          DMA       0.0005      0.00038     0.0059
Table 5.5: Power consumption for a full simulation, read traffic.

Read Traffic
Power (µW)          Dynamic     Static      Total
Current   IP        1.81E-02    9.72E-03    0.108
          DMA       4.36E-04    1.28E-03    1.64E-02
1 PD      IP        1.80E-02    9.87E-03    0.109
          DMA       6.20E-04    1.41E-03    1.70E-02
3 PD      IP        0.0155      0.0084      0.0885
          DMA       0.0005      0.00039     0.0066
Table 5.6: Relative power consumption for a full write simulation.

Write Traffic
Power (% of Current)    Dynamic    Static    Total
1 PD      IP            95%        102%      96%
          DMA           140%       110%      99%
3 PD      IP            85%        86%       81%
          DMA           115%       30%       36%
5.2 Guidelines
From all the simulations and analysis made during the development of this dissertation, it was
possible to define some guidelines, useful for future power gating implementations and automation
of the power gating process.
First it is necessary to identify the modules that are able to be turned off. This can be achieved
by analysing the modules' activity from the waves provided by a previous simulation. If large
periods of inactivity are detected, the module becomes a good candidate for power gating.
A preliminary power analysis reveals the current consumption of the modules. By analysing a
module's leakage power consumption, it is possible to conclude whether it is worth creating a
power domain for it. If the module has high leakage power but is also highly active, it is worth
looking into it and partitioning it according to the activity requirements of its inner logic.
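The selection step can be sketched as a simple filter over per-module activity and leakage
figures. The Python code below is a minimal illustration. The function names, the thresholds
min_idle and min_leakage, and the representation of activity as one 0/1 sample per time step
are all assumptions made for the example, not part of the flow used in this work.

```python
def idle_fraction(activity):
    """Fraction of samples in which the module shows no activity (0 = idle)."""
    return activity.count(0) / len(activity)


def select_candidates(modules, min_idle=0.5, min_leakage=0.0):
    """Return the names of modules that are idle often enough and leak enough.

    `modules` maps a module name to a dict with an "activity" list of 0/1
    samples and a "leakage" value (any consistent power unit).
    """
    selected = []
    for name, info in modules.items():
        if (idle_fraction(info["activity"]) >= min_idle
                and info["leakage"] >= min_leakage):
            selected.append(name)
    return sorted(selected)
```

A module that is idle most of the time but leaks very little would be rejected by the leakage
threshold, while a leaky but permanently busy module would be rejected by the idle threshold
and flagged for further partitioning by hand.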
After selecting the candidates, it is important to group them. In a design, modules are related
to each other, which means that the activity of some modules depends on the activity of another
module. If that module is not performing any function, all of the related modules can also be
powered off. Grouping them into a single power domain reduces the amount of extra logic created
by power gating. A good way to group modules together is to create a wrapper. Although not
necessary, it eases the implementation.
If it is not possible to create a wrapper due to the logical hierarchy or design complexity, it is
still possible to implement power gating. Instead of being composed of the wrapper, the power
domain is composed of the independent modules. However, this may raise complications in the
back-end phase if a disjoint power domain has to be created. For new designs it is helpful to take
power hierarchy into consideration when designing the logical hierarchy of the system.
Having the modules selected, the next phase will be implementation. To implement power
gating it is necessary to understand the basics of the design it will be implemented on. If the
design requires state retention, a retention strategy has to be defined.
After deciding whether state retention is needed, it is necessary to define the signal that
enables the power domain to turn on. Several power domains can also be an option; in that case,
an enable signal has to be selected for each of them. The enable signal can be a simple signal
from the parent module or a logical function of several signals. If two power domains have the
same enable signal and the same voltage, they can be grouped together, because they have the same
power requirements. The enable signal should be active during the whole activity phase of the
power domain.
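The grouping rule above (same enable signal and same voltage) amounts to grouping by a
two-element key. A minimal Python sketch follows, in which the function name group_domains
and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def group_domains(domains):
    """Group candidate power domains that share enable signal and voltage.

    `domains` maps a domain name to an (enable_signal, voltage) tuple.
    """
    groups = defaultdict(list)
    for name, (enable, voltage) in domains.items():
        groups[(enable, voltage)].append(name)
    return [sorted(names) for names in groups.values()]
```

Each returned list holds candidates with identical power requirements, which can therefore be
merged into a single power domain.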
Now that all the decisions are made, it is helpful to create a power intent diagram, which makes
writing the UPF code easier.
5.3 Alternatives
Another power management unit can be designed, but some considerations should be kept in
mind. The wake up/power down and isolation signal sequences are important to avoid behavioural
errors. The power should only be removed after signals are isolated to avoid sampling of corrupted
signals. Isolation should also only be removed after power is stable, for the same reason.
The clock gating and restoration may depend on the design architecture, but removing the
clock when going into low power will reduce dynamic power.
When using retention, the PMU flow may differ from the one presented in figure 4.2. Flows with
full state retention may not require a reset, but it could be a good measure to reset them anyway,
to ensure no logic is corrupted. However, it is important to respect the save/restore sequence.
The reset signal should be applied before restoration, otherwise data would be lost. The save
operation should also be applied before the reset signal. Isolation should only be lifted when
the restore operation has concluded.
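The ordering constraints listed in this section can be collected into a set of "A before B"
rules and checked against the event sequence produced by a candidate PMU. The Python sketch
below is one possible way to express this. The event names and the function violated_rules are
hypothetical, and each event is assumed to occur once per power cycle.

```python
# Each pair (first, second) means that `first` must happen before `second`.
RULES = [
    ("isolate", "power_off"),       # isolate signals before removing power
    ("power_stable", "deisolate"),  # release isolation only after power is stable
    ("save", "reset"),              # save state before applying reset
    ("reset", "restore"),           # apply reset before restoring state
    ("restore", "deisolate"),       # release isolation only after restore ends
]


def violated_rules(events):
    """Return the rules broken by the given event sequence."""
    position = {}
    for index, event in enumerate(events):
        position.setdefault(event, index)  # keep the first occurrence only
    return [
        (first, second)
        for first, second in RULES
        if first in position and second in position
        and position[first] >= position[second]
    ]
```

Rules that mention an event absent from the sequence are skipped, so the same check can be
applied to flows without retention.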
Chapter 6
Conclusion
Reducing power consumption is a concern that has been growing through time. It is important
to consider power in digital circuits design, given the problems introduced in this document and
considering the pollution generated by some power stations.
Power aware implementations are important for saving energy, especially on battery operated
devices. Technology advancements come with advantages and disadvantages. Smaller transistors
can be operated at lower voltages, effectively lowering dynamic power consumption, which is
highly voltage dependent. They also allow more complex circuits with more logic at lower prices.
Although dynamic power gets lower, static power increases due to the lower threshold voltages of
smaller transistors.
The EDA industry also develops power saving solutions that can be applied to the design at a
high level of abstraction and provide good trade-offs. These power saving solutions are valuable
since reducing power consumption also reduces heat dissipation, decreasing the need for cooling
systems and consequently lowering device cost. It even reduces the thermal stress on components,
increasing their lifetime and reducing thermally related effects on transistors.
The PMU is very versatile and could be used for other retention-less implementations, by
identifying and selecting a good enable signal.
The amount of power the power architect is able to save when implementing power gating
depends on the approach used. A more aggressive approach is able to save more power, but
requires a deep understanding of the design itself and may become more complex. The more
complex the implementation, the longer it takes to implement, test and debug.
6.1 Future Work
This section presents possible future work to help automate power gating implementation. The
idea is to create a power partitioning tool composed of several scripts with well defined
functions. This tool would be based on the guidelines studied in this dissertation and apply them
automatically to a system, with reduced human interaction.
One of the scripts has to be capable of reading a Verilog module and analysing its processes. If
the module is composed of several processes independent from each other, the script will extract
these processes and promote them into new modules, so that they can be used in the power
intent.
Another important script receives several modules as inputs and places them inside a wrapper
module. The wrapper module will contain only inputs and outputs that are connected to modules
outside of itself, reducing the overall number of ports, and therefore simplifying isolation and level
shifter strategies.
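The port reduction performed by such a wrapper script can be sketched as follows. This is a
hypothetical Python illustration, not an existing tool: connectivity is assumed to be available
as a mapping from each net to the set of modules it connects, and the function name
wrapper_ports is invented for the example.

```python
def wrapper_ports(nets, wrapped):
    """Return the nets that must become ports of the wrapper.

    `nets` maps a net name to the set of modules connected to it and
    `wrapped` is the set of modules placed inside the wrapper. A net needs
    a wrapper port only if it connects an inner module to an outer one.
    """
    ports = []
    for net, modules in nets.items():
        inside = modules & wrapped
        outside = modules - wrapped
        if inside and outside:
            ports.append(net)
    return sorted(ports)
```

Nets that only connect wrapped modules to each other stay internal to the wrapper, so they need
neither isolation nor level shifting.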
The last script of this power partitioning tool evaluates switching activity and selects the
modules that are good candidates for power gating. Then, with that information, it would
construct the power intent for those modules. The script should allow the user to specify the
enable signal as well as the retention registers, since those two constraints require some
knowledge of the design itself.
Another important piece of future work is to characterise the savings provided by the addition
of the timer to the PMU's state machine. This will require implementing power gating in a new
module that has consecutive transaction requirements, with help from the guidelines derived in
this dissertation.
References
[1] Advanced Low Power Techniques, May 2016. URL: http://www.synopsys.com/Solutions/EndSolutions/advanced-lowpower/verification-lowpower/Pages/advanced-low-power-techniques.aspx.
[2] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand. Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits. Proceedings of the IEEE, 91(2):305–327, Feb 2003. doi:10.1109/JPROC.2002.808156.
[3] Sushma Honnavara-Prasad. System level power with IEEE1801, 2015. URL: http://systempower.org/wp-content/uploads/2015/04/1801_Sushma.pdf.
[4] Michael Keating, David Flynn, Rob Aitken, Alan Gibbons, and Kaijian Shi. Low Power Methodology Manual: For System-on-Chip Design. Springer Publishing Company, Incorporated, 2007.
[5] IEEE P1801 Working Group. IEEE standard for design and verification of low-power integrated circuits. IEEE Std 1801-2013 (Revision of IEEE Std 1801-2009), pages 1–348, May 2013. doi:10.1109/IEEESTD.2013.6521327.
[6] M.C. Schneider and C. Galup-Montoro. CMOS Analog Design Using All-Region MOSFET Modeling. Cambridge University Press, 2010. URL: https://books.google.com/books?id=SDPG0Lz39HcC.
[7] S. Jadcherla. Verification Methodology Manual for Low Power. Synopsys, 2009. URL: https://books.google.pt/books?id=qz2NYgEACAAJ.
[8] T. Hattori. Challenges for low-power embedded SoC's. In VLSI Design, Automation and Test, 2007. VLSI-DAT 2007. International Symposium on, pages 1–4, April 2007. doi:10.1109/VDAT.2007.373214.
[9] F. Bin Muslim, A. Qamar, and L. Lavagno. Low power methodology for an ASIC design flow based on high-level synthesis. In Software, Telecommunications and Computer Networks (SoftCOM), 2015 23rd International Conference on, pages 11–15, Sept 2015. doi:10.1109/SOFTCOM.2015.7314103.
[10] A. Mathur and Qi Wang. Power reduction techniques and flows at RTL and system level. In VLSI Design, 2009 22nd International Conference on, pages 28–29, Jan 2009. doi:10.1109/VLSI.Design.2009.113.
[11] N.S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin, M. Kandemir, and V. Narayanan. Leakage current: Moore's law meets static power. Computer, 36(12):68–75, Dec 2003. doi:10.1109/MC.2003.1250885.
[12] A.S. Sedra and K.C. Smith. Microelectronic Circuits: International edition. OUP USA, 2010. URL: https://books.google.pt/books?id=KuGCRAAACAAJ.
[13] Farzan Fallah and Massoud Pedram. Standby and active leakage current control and minimization in CMOS VLSI circuits. IEICE Transactions on Electronics, 88(4):509–519, 2005.
[14] S. Carver, A. Mathur, L. Sharma, P. Subbarao, S. Urish, and Qi Wang. Low-power design using the Si2 common power format. IEEE Design & Test of Computers, 29(2):62–70, April 2012. URL: http://dx.doi.org/10.1109/MDT.2012.2183574.
[15] V. Gourisetty, H. Mahmoodi, V. Melikyan, E. Babayan, R. Goldman, K. Holcomb, and T. Wood. Low power design flow based on unified power format and Synopsys tool chain. In Interdisciplinary Engineering Design Education Conference (IEDEC), 2013 3rd, pages 28–31, March 2013. doi:10.1109/IEDEC.2013.6526754.
[16] IEEE standard for design and verification of low-power, energy-aware electronic systems. IEEE Std 1801-2015 (Revision of IEEE Std 1801-2013), pages 1–515, March 2016. doi:10.1109/IEEESTD.2016.7445797.
[17] Power Optimization in Design Compiler, June 2016. URL: http://www.synopsys.com/Tools/Implementation/RTLSynthesis/Pages/PowerCompiler.aspx.
[18] IEEE standard for integrated circuit (IC) open library architecture (OLA). IEEE Std 1481-2009, pages c1–658, 2009. doi:10.1109/IEEESTD.2009.5430852.
[19] Universal verification methodology, June 2016. URL: http://www.accellera.org/community/uvm/.