Delivered by:
Subhash Iyer,
Program Head,
Soft Polynomials (I) Pvt. Ltd., Nagpur
(CDAC ATC)
Created by Subhash Iyer for Soft Polynomials (I) Pvt. Ltd.
Introduction
What is SoC ?
SoC characteristics
Benefits and drawbacks
Solution
Major SoC Applications
Summary
Technological Advances
Today’s chips can contain 100M transistors
Transistor gate lengths are now measured in nanometers
The number of transistors on a chip doubles approximately every 18 months (Moore’s law)
The Consequences
Components connected on a Printed Circuit Board can now be integrated onto a single chip
Hence the development of System-on-Chip design
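As a rough illustration of the doubling rule above, the following sketch projects a transistor count forward in time. The 100M starting point and the 18-month doubling period come from the slide; the function name and the chosen time spans are illustrative assumptions only.

```python
def projected_transistors(start_count, months, doubling_period_months=18):
    """Project a transistor count assuming one doubling per doubling period."""
    return start_count * 2 ** (months / doubling_period_months)


# Starting from the 100M transistors quoted above (time spans are arbitrary).
for years in (3, 6, 9):
    count = projected_transistors(100e6, years * 12)
    print(f"after {years} years: {count / 1e6:.0f}M transistors")
```

Three years correspond to two doublings, so the first line reports 400M transistors.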
What is SoC ?
Version A: Advances in VLSI manufacturing technology have made it possible to put millions of transistors on a single die. This enables designers to build systems-on-a-chip that eventually move everything from the board onto the chip.
Version B: An SoC is a high-performance microprocessor, since we can program the µP and instruct it to do whatever we want.
Version C: SoC is the effort to integrate heterogeneous silicon IPs, such as memory, µP, random logic, and analog circuitry, onto the same chip.
All of the above are partially right, but none is very accurate!
• An SoC is not only a chip; the emphasis is on the “system”.
• SoC = Chip + Software + Integration
• The SoC chip includes:
• Embedded processor
• ASIC logic and analog circuitry
• Embedded memory
• The SoC software includes:
• OS, compiler, simulator, firmware, driver, protocol stack
• Integrated development environment (debugger, linker, ICE)
• Application interface (C/C++, assembly)
• The SoC integration includes:
• The whole system solution
• Manufacturing consultancy
• Technical support
A typical digital system design involves a significant amount of custom logic circuitry, but also includes pre-designed major components, such as processors, memory units and various types of input/output (I/O) interfaces.
In the traditional approach for designing such systems, a new integrated circuit (IC) chip is created for the custom logic circuits, but each pre-designed component is included as a separate chip.
A different approach for realizing digital systems is called embedded system design. It leverages the advanced capabilities of today's IC technology by implementing many of the components of the system within a single chip, such as a field programmable gate array (FPGA).
Offer large logic capacity, exceeding several million equivalent logic gates, and include dedicated memory resources
Include special hardware circuitry that is often needed in digital systems, such as digital signal processing (DSP) blocks (with multiply and accumulate functionality) and phase-locked loops (PLLs) (or delay-locked loops (DLLs)) that support complex clocking schemes
Support a wide range of interconnection standards, such as double data rate (DDR SRAM) memory, PCI and high-speed serial protocols.
SoC characteristics
ASIC Typical Design Steps
• Typical ASIC design can take up to two years to complete
[Figure: ASIC design flow timeline. Steps: Top Level Design, Unit Block Design, Unit Block Verification, Integration and Synthesis, Trial Netlists, System Level Verification, Timing Convergence & Verification, Fabrication, DVT Prep, DVT. Time in weeks as labeled: 6, 12, 12, 4, 14, ??, 5, 8. Time to mask order: 48; second total shown: 61.]
SoC Typical Design Steps
[Figure: SoC design flow timeline. Steps: Top Level Design, Unit Block Design, Unit Block Verification, Integration and Synthesis, Trial Netlists, System Level Verification, Timing Convergence & Verification, Fabrication, DVT Prep, DVT. Time in weeks as labeled: 4, 14, 5, 4 and 4, 2. Time to mask order: 24; second total shown: 33.]
• With the increasing complexity of ICs and decreasing geometry, the IC vendor steps of placement, layout and fabrication are unlikely to be greatly reduced.
• In fact, there is a greater risk that the timing convergence steps will involve more iterations.
• Need to reduce the time before the vendor steps.
• Need to consider layout issues up-front.
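The week counts printed in the two flow figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers shown in the figures; how they are grouped into stages before and after the mask order is an assumption (the extraction does not preserve the alignment), and the stage marked ?? is left out.

```python
# Stage durations in weeks as printed in the two figures.
# Assumption: the first list holds the stages before the mask order,
# the second list the known stages after it; the "??" stage is omitted.
asic_pre_mask = [6, 12, 12, 4, 14]
asic_post_mask = [5, 8]
soc_pre_mask = [4, 14, 4, 2]
soc_post_mask = [5, 4]


def totals(pre_mask, post_mask):
    """Return (weeks to mask order, weeks including the remaining known stages)."""
    to_mask = sum(pre_mask)
    return to_mask, to_mask + sum(post_mask)


asic = totals(asic_pre_mask, asic_post_mask)
soc = totals(soc_pre_mask, soc_post_mask)
print("ASIC:", asic)  # agrees with the 48 and 61 shown in the figure
print("SoC: ", soc)   # agrees with the 24 and 33 shown in the figure
print("weeks saved before mask order:", asic[0] - soc[0])
```

Under this grouping the SoC flow reaches mask order 24 weeks earlier than the ASIC flow.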
Design reuse is facilitated if “standard” internal connection buses are used.
All cores connect to the bus via a standard interface.
Any-to-any connections are easy, but:
Not all connections are necessary.
Global clocking scheme.
Power consumption.
Standardization is being addressed by the Virtual Socket Interface Alliance (VSIA)
• AMBA (Advanced Microcontroller Bus Architecture) is a collection of buses from ARM for satisfying a range of different criteria.
• APB (Advanced Peripheral Bus): simple strobed-access bus with minimal interface complexity. Suitable for hosting peripherals.
• ASB (Advanced System Bus): a multimaster synchronous system bus.
• AHB (Advanced High Performance Bus): a high-throughput synchronous system backbone. Burst transfers and split transactions.
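To illustrate why a standard bus interface helps reuse, the sketch below models cores that all expose the same read/write interface and a bus that routes each access by address. It is a behavioral toy model, not the AMBA protocol; the class names, address map, and register behavior are all hypothetical.

```python
class Core:
    """Hypothetical standard interface that every core implements."""

    def read(self, offset):
        raise NotImplementedError

    def write(self, offset, value):
        raise NotImplementedError


class Memory(Core):
    def __init__(self, size):
        self.data = [0] * size

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value


class Timer(Core):
    """Toy peripheral with a single register."""

    def __init__(self):
        self.count = 0

    def read(self, offset):
        return self.count

    def write(self, offset, value):
        self.count = value


class Bus:
    """Routes an access to whichever core owns the address range."""

    def __init__(self):
        self.regions = []  # list of (base, size, core)

    def attach(self, base, size, core):
        self.regions.append((base, size, core))

    def _decode(self, address):
        for base, size, core in self.regions:
            if base <= address < base + size:
                return core, address - base
        raise ValueError(f"no core mapped at {address:#x}")

    def read(self, address):
        core, offset = self._decode(address)
        return core.read(offset)

    def write(self, address, value):
        core, offset = self._decode(address)
        core.write(offset, value)


bus = Bus()
bus.attach(0x0000, 0x100, Memory(0x100))
bus.attach(0x1000, 0x4, Timer())
bus.write(0x0010, 42)
bus.write(0x1000, 7)
print(bus.read(0x0010), bus.read(0x1000))
```

Because the bus only relies on the shared interface, a new core can be attached without changing the bus or the existing cores.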
• One solution to the design productivity gap is to make ASIC designs more standardized by reusing segments of previously manufactured chips.
• These segments are known as “blocks”, “macros”, “cores” or “cells”.
• The blocks can either be developed in-house or licensed from an IP company.
• Cores are the basic building blocks .
• Soft Macro
– Reusable synthesizable RTL or netlist of generic library elements
– User of the core is responsible for the implementation and layout
• Firm Macro
– Structurally and topologically optimized for performance and area through floor planning and placement
– Exists as synthesized code or as a netlist of generic library elements
• Hard Macro
– Reusable blocks optimized for performance, power, size and mapped to a specific process technology
– Exists as a fully placed and routed netlist and as a fixed layout, such as in GDSII format
[Figure: soft, firm, and hard cores on a trade-off axis. Reusability, portability, and flexibility increase toward the soft core; predictability, performance, and time to market improve toward the hard core.]
• Locating the required cores and associated contract discussions can be a lengthy process
– Identification of IP vendors
– Evaluation criteria
– Comparative evaluation exercise
– Choice of core
– Contract negotiations
• Reuse restrictions
• Costs: license, royalty, tool costs
– Core integration, simulation and verification
MPSoC is a system-on-chip that contains multiple instruction-set processors (CPUs).
The typical MPSoC is a heterogeneous multiprocessor: there may be several different types of processing elements (PEs), the memory system may be heterogeneously distributed around the machine, and the interconnection network between the PEs and the memory may also be heterogeneous.
MPSoCs often require large amounts of memory. The device may have embedded memory on-chip as well as relying on off-chip commodity memory.
These chips have:
• one or several processors
• large amounts of memory
• bus-based architectures
• peripherals
• coprocessors
• and I/O channels
Benefits and drawbacks
• There are several benefits in integrating a large digital system into a single integrated circuit.
• These include
– Lower cost per gate
– Lower power consumption
– Faster circuit operation
– More reliable implementation
– Smaller physical size
– Greater design security
• The principal drawbacks of SoC design are associated with the design pressures imposed on today’s engineers, such as:
– Time-to-market demands
– Exponential fabrication cost
– Increased system complexity
– Increased verification requirements
Why does it take longer to design SOCs compared to traditional ASICs?
We must examine factors influencing the degree of difficulty and Turn Around Time (TAT) (the time taken from gate-level netlist to metal mask-ready stage) for designing ASICs and SOCs.
For an ASIC, the following factors influence TAT:
• Frequency of the design
• Number of clock domains
• Number of gates
• Density
• Number of blocks and sub-blocks
The key factor that influences TAT for SOCs is system integration (integrating different silicon IPs on the same IC).
Solution
• Overcome complexity and verification issues by designing Intellectual Property (IP) to be re-usable.
• Done on such a scale that a new industry has developed.
• Design activity is split into two groups:
– IP Authors: producers
– IP Integrators: consumers
• IP Authors produce fully verified IP libraries
– Thus making the overall verification task more manageable
• IP Integrators select, evaluate, and integrate IP from multiple vendors
– IP is integrated onto an Integration Platform designed with a specific application in mind
IP cores are classified into three distinct categories:
Hard IP Cores
Firm IP Cores
Soft IP Cores
Hard IP cores consist of hard layouts using particular physical design libraries and are delivered as mask-level designed blocks (GDSII format). The integration of hard IP cores is quite simple, but hard cores are technology dependent and provide minimum flexibility and portability in reconfiguration and integration.
Soft IP cores are delivered as RTL VHDL/Verilog code to provide functional descriptions of IPs. These cores offer maximum flexibility and reconfigurability to match the requirements of a specific design application, but they must be synthesized, optimized, and verified by their user before integration into designs.
Firm IP cores bring the best of both worlds and balance the high performance and optimization properties of hard IPs with the flexibility of soft IPs. These cores are delivered as netlists targeted to specific physical libraries, after synthesis but without physical layout.
[Figure: soft, firm, and hard cores on a trade-off axis. Reusability, portability, and flexibility increase toward the soft core; predictability, performance, and time to market improve toward the hard core.]
Major SoC Applications
eS/W: Current application complexity
Set-top box: >1 million lines of code
Digital audio processing: >1 million lines of code
Recordable DVD: Over 100 person-years effort
Hard-disk drive: Over 100 person-years effort
In multimedia systems, S/W cost (licenses) is 6X larger than H/W chip cost
eS/W uses 50% to 80% of design resources
eS/W is now an essential part of SoC products
Speech Signal Processing
Image and Video Signal Processing
Information Technologies
PC interface (USB, PCI, PCI-Express, IDE, etc.)
Computer peripherals (printer control, LCD monitor controller, DVD controller, etc.)
Data Communication
Wireline communication: 10/100 Base-T, xDSL, Gigabit Ethernet, etc.
Wireless communication: Bluetooth, WLAN, 2G/3G/4G, WiMAX, UWB, etc.
• Consumer devices,
• Networking,
• Communications, and
• other segments of the electronics industry.
microprocessor, media processor, GPS controllers, cellular phones, GSM phones, smart pager ASICs, digital television, video games, PC-on-a-chip
Systems on chip are everywhere
Technology advances enable increasingly more complex designs
Central Question: how to exploit deep-submicron technologies efficiently?
Summary
Technological advances mean that complete systems can now be implemented on a single chip.
The benefits that this brings are significant in terms of speed, area and power.
The drawbacks are that these systems are extremely complex, requiring large amounts of verification.
The solution is to design and verify reusable IP.
At each level of circuit abstraction, the circuit is equivalent and performs the same target operation, but its structural components (and hence the components’ granularity) are different, and the design issues may be different.
Embedded applications in the multimedia, wireless communications or networking domains were implemented on Printed Circuit Boards (PCBs).
Composed of discrete Integrated Circuits (ICs):
General Purpose Processors
Digital Signal Processors
Application Specific Integrated Circuits
Memories
Further peripherals.
Communication between discrete processing elements and memories is realized by shared bus architectures (like PCI Express)
The transition is from board level integration towards System-on-Chip (SoC) implementations of embedded applications.
Today multiple heterogeneous processing elements and memories can be integrated on a single chip:
Increased performance
Reduced cost
Improved energy efficiency
This trend originates from the tremendous increase in features as well as the multitude of co-existing standards.
Resulting functional complexity clearly promotes Software enabled solutions to achieve the required flexibility and cope with the demanding time-to-market conditions.
However, stringent energy efficiency constraints of mobile applications and cost sensitive consumer devices prohibit the use of general purpose processors.
Tight cost and performance requirements of versatile embedded systems lead to application specific heterogeneous multi-processor architectures
The classical vertical partitioning approach to HW/SW co-design, where the performance critical parts are implemented as dedicated HW blocks and the rest is executed in SW, is no longer applicable.
Instead, HW/SW co-design can be seen as a multi-dimensional horizontal mapping problem of an application running on a heterogeneous multiprocessor platform.
During the mapping process, exploit application inherent parallelism to achieve performance at reasonable cost.
For the computationally intensive portions of typical embedded applications the extraction of Task Level Parallelism (TLP) is mostly straightforward:
The partitioning into a set of loosely coupled functional blocks can be naturally derived from the algorithmic block diagram
Two major aspects:
Processing: A set of processing elements has to be provided for the efficient execution of the functional tasks.
Communication mapping: The inter-task data exchange has to be mapped to a communication architecture.
Only a joint consideration of architectural choices in both areas bears the opportunity for near optimal quality of results.
Recent architectural advances offer a huge design space with enormous potential for optimization
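The joint nature of the mapping problem can be sketched with a tiny exhaustive search. The task names, execution times, and the communication penalty below are made-up numbers for illustration only; real design-space exploration uses far richer models.

```python
from itertools import product

# Hypothetical execution time of each task on each processing element (PE).
exec_time = {
    "filter": {"dsp": 2, "cpu": 6},
    "decode": {"dsp": 3, "cpu": 4},
    "control": {"dsp": 5, "cpu": 1},
}
# Hypothetical data exchanges between tasks and the cost of crossing PEs.
edges = [("filter", "decode"), ("decode", "control")]
CROSS_PE_COST = 5


def processing_cost(mapping):
    """Sum of execution times for one task-to-PE mapping."""
    return sum(exec_time[task][pe] for task, pe in mapping.items())


def communication_cost(mapping):
    """Penalty for every data exchange that crosses PE boundaries."""
    return sum(CROSS_PE_COST for src, dst in edges if mapping[src] != mapping[dst])


def total_cost(mapping):
    return processing_cost(mapping) + communication_cost(mapping)


tasks = list(exec_time)
candidates = [
    dict(zip(tasks, pes)) for pes in product(["dsp", "cpu"], repeat=len(tasks))
]
best_processing_only = min(candidates, key=processing_cost)
best_joint = min(candidates, key=total_cost)
print("processing only:", best_processing_only, total_cost(best_processing_only))
print("joint:          ", best_joint, total_cost(best_joint))
```

With these numbers, optimizing processing alone moves the control task to the CPU, yet the joint optimum keeps all three tasks on the DSP because the extra communication outweighs the processing gain.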
The bus paradigm, as inherited from the PCB era, constitutes the major power and performance bottleneck.
Chip-wide communication is envisioned to be handled by full-scale Network-on-Chip (NoC) architectures.
Network-on-Chip architectures:
Resolve the physical issues
Address the functional aspects of on-chip communication.
So far, the dynamic priority based arbitration scheme of shared busses creates a mutual dependency between all components connected to the bus.
Due to this lack of traffic management capabilities, every change in the traffic requirements of the application requires a re-design of the bus architecture.
Instead, NoC architectures take advantage of sophisticated networking algorithms to provide elaborate traffic-management capabilities.
As a result, the ad-hoc communication mapping is replaced with a disciplined allocation of the required communication services, and the on-chip network takes care of providing the required resources.
From the system architecture perspective, this separation of the offered communication services from the architectural resources can be considered a virtualization of the actual communication architecture.
This virtualization effectively decouples the mapping problem for communication and computation.
The price to pay for the physical and functional benefits of NoC based communication is a significant penalty in terms of chip area as well as transfer latency.
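The mutual dependency that priority arbitration creates on a shared bus can be illustrated with a small simulation. This is a simplified fixed-priority arbiter (the text speaks of dynamic priorities), and the master names and request patterns are invented for the example.

```python
def simulate(requests, cycles):
    """Grant the bus each cycle to the highest-priority requesting master.

    requests maps a master name to (priority, request_period); a lower
    priority number wins. A master raises a new request every
    request_period cycles, and each grant serves one pending request.
    Returns the number of cycles each master spent waiting.
    """
    pending = {name: 0 for name in requests}
    waited = {name: 0 for name in requests}
    for cycle in range(cycles):
        for name, (_, period) in requests.items():
            if cycle % period == 0:
                pending[name] += 1
        ready = [name for name in requests if pending[name] > 0]
        if ready:
            winner = min(ready, key=lambda name: requests[name][0])
            pending[winner] -= 1
            for name in ready:
                if name != winner:
                    waited[name] += 1
    return waited


light = simulate({"cpu": (0, 4), "dma": (1, 4)}, cycles=40)
heavy = simulate({"cpu": (0, 1), "dma": (1, 4)}, cycles=40)
print("dma wait with light cpu traffic:", light["dma"])
print("dma wait with heavy cpu traffic:", heavy["dma"])
```

Only the CPU traffic changes between the two runs, yet the waiting time of the DMA master rises from 10 to 40 cycles, which is the kind of coupling the slide describes.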
Programmable processing elements achieve significant gains with respect to performance and computational efficiency by tailoring the instruction set and micro-architecture to the respective set of tasks
Examples are innovative architectures exploiting:
Instruction Level Parallelism (ILP)
Data Level Parallelism (DLP)
Despite the increased computational performance, the effective performance is often constricted by the communication architecture, since memory access latency does not keep pace with the processing power.
General purpose processors resolve the memory access bottleneck by using sophisticated cache and memory hierarchies.
This is generally not applicable for embedded applications due to the poor memory locality of stream driven and packet based data processing.
Instead, processor architectures are equipped with hardware supported Multi-Threading (HW-MT) to perform task switches with virtually no performance overhead.
By that, the application inherent TLP is exploited with the purpose of hiding memory latency, which effectively leads to a significant increase in the processor utilization.
This technique is already widely employed in the network processor domain and has recently found its way into advanced multimedia and signal processing platforms.
In the light of the latency issue caused by NoC architectures, the importance of memory latency hiding techniques is likely to increase in the future.
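The effect of hardware multi-threading on processor utilization can be estimated with a common first-order model: each thread computes for a number of cycles and then stalls on a memory access, while the processor switches to another ready thread at no cost. The cycle counts below are illustrative assumptions, not figures from the slides.

```python
def utilization(threads, compute_cycles, memory_latency):
    """First-order processor utilization with zero-overhead thread switching."""
    busy_fraction = threads * compute_cycles / (compute_cycles + memory_latency)
    return min(1.0, busy_fraction)


for n in (1, 2, 4, 8):
    print(n, "threads:", utilization(n, compute_cycles=20, memory_latency=80))
```

In this model a single thread keeps the processor busy only 20% of the time, and utilization saturates once enough threads are available to cover the memory latency.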
Taking the above considerations together, future SoCs can be considered as NoC enabled multi-processor architectures.
An on-chip communication backbone connects a large number of heterogeneous processing clusters and global storage elements.
Individual processing clusters consist of one or a few application specific programmable kernels together with tightly coupled instruction and data memories as well as local peripherals.
To cope with the resulting design complexity:
Achieve virtualization of the architectural resources, so that they can be allocated by the system architect in a deterministic way.
This virtualization is provided by:
The NoC approach for the communication part
SW and HW operating systems for the control and data processing, respectively
Divide-and-conquer oriented design paradigm
Enables individual optimization of the architectural elements
The price for these benefits:
A penalty in terms of chip area, generally considered to be of constantly decreasing importance.
HW/SW co-design of a given embedded application is defined as:
Architecting a heterogeneous MP-SoC platform
Allocating the architectural resources for the execution of the application
Architecture virtualization resolves the mutual dependencies in the mapping process.
Trade-offs in the design space still require a joint consideration of application and architecture as well as communication.
For example:
Latency of a more complex on-chip network can be compensated by either:
introducing a memory hierarchy
employing hardware multi-threaded processor kernels
Obviously, the resulting design space is virtually infinite.
The architecting and mapping phases cannot be considered independently without sacrificing quality of results.
What is needed is:
A system level design methodology
Corresponding tool supported modeling framework
Transaction-Level Modeling (TLM)
Advocated by the SystemC language
The system level design paradigm
Already incorporated into state-of-the-art Electronic System Level (ESL) tools
TLM greatly improves:
modeling efficiency
simulation speed
Abstracts from the low-level communication details of the Register Transfer Level (RTL) to complete transactions
Is usually employed in a byte and cycle accurate fashion
We will look more closely at the packet-level TLM paradigm
Cycle-level TLM is still too detailed to explore large design spaces.
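The abstraction step from cycle-level to transaction-level modeling can be sketched as follows. Both functions model the same hypothetical burst transfer: the cycle-level version advances one clock at a time, while the transaction-level version accounts for the whole transfer in a single call with an annotated duration. The timing parameters and function names are assumptions, and real TLM models are typically written in SystemC rather than Python.

```python
SETUP_CYCLES = 2      # hypothetical arbitration/address phase
CYCLES_PER_WORD = 1   # hypothetical data phase per word


def transfer_cycle_level(words):
    """Step through the transfer clock by clock; returns (cycles, simulation steps)."""
    cycle = 0
    steps = 0
    for _ in range(SETUP_CYCLES):
        cycle += 1
        steps += 1
    for _ in range(words):
        for _ in range(CYCLES_PER_WORD):
            cycle += 1
            steps += 1
    return cycle, steps


def transfer_transaction_level(words):
    """Treat the transfer as one transaction with an annotated duration."""
    cycle = SETUP_CYCLES + words * CYCLES_PER_WORD
    return cycle, 1


print(transfer_cycle_level(64))
print(transfer_transaction_level(64))
```

Both models report the same 66-cycle duration, but the transaction-level model needs one simulation step instead of 66, which is where the gain in simulation speed comes from.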
Since communication becomes the driving design paradigm for MP-SoC, the exploration framework is based on a sophisticated, communication centric timing model:
Generic synchronization interface
Defines a concise set of communication primitives,
Follows the Open Core Protocol (OCP)
Not biased towards any specific communication architecture.
Additionally the primitives incorporate timing-annotation to achieve reasonable timing accuracy at the highly abstract packet-level TLM layer
The communication timing model captures the impact on performance of the interconnection architecture.
This communication timing model supports the full spectrum of available and proposed communication architectures ranging from today’s shared busses to the emerging NoC paradigm.
Implemented by means of a versatile modeling framework for architecture exploration and hardware/software partitioning
Key advantages:
Modeling efficiency
Higher simulation speed
A declarative specification mechanism for better design space exploration
TLM is a method used for SoC Design
To specify at a higher level of abstraction
Involves Communication and Computation Architectures
Unified Timing Model aims to standardize the TLM approach
Networking Domain
Multimedia Domain
Wireless Communications
Constitutes implementation of networking standards
IEEE, ITU, ETSI, etc. work out communication standards
The purpose of these standards is to achieve a high degree of interoperability
The ISO/OSI reference model provides a common terminology
Networking layer standards in the middle of the ISO/OSI stack address a multitude of higher layer application standards as well as lower physical/link layer standards
The major implementation challenge and effort lies in the networking layer
Layer three multi-service access switches are considered as one of the potential killer applications for MP-SoC platforms, since they combine the physical wire speed throughput requirements with flexibility constraints imposed by the individual treatment of different service classes and application characteristics.
Today’s de facto networking layer standard is given by the rather simplistic Internet Protocol (IP).
Lower level layers are nowadays built in as ready-made blocks
Physical and link layer data rates of core network equipment impose demanding performance requirements
Higher application layers are only present in the terminal devices, so the relatively low to medium throughput requirements allow for a software implementation of the flexible and control dominated functionality.
Processing of all kinds of media data
Pictures
Audio
Video decoding
Video pixel processing
2D/3D graphics
Standards enable the exchange of media data as well as device inter-operability
MOPS: Mega Operations Per Second
Advances in processing capabilities and multimedia algorithms, together with increased user expectations, fuel a constant proliferation of new multimedia standards:
Digital audio decoding (AC3, OGG, MP3)
Video decoding (MPEG2, MPEG4, H.263, H.264, DivX, QuickTime)
3D graphic processing (DirectX 9)
Apart from the multitude and dynamics of multimedia standards, a flexible implementation platform is also mandatory to meet demanding cost constraints of converging consumer electronics devices such as the Advanced Set-Top Box (ASTB).
Here the processing and communication fabrics have to be shared among the multitude of supported multimedia applications to limit implementation cost.
Wireless communication applications aggressively use digital signal processing to maximize bandwidth efficiency
Again, a multitude of standards exists
Each marks a local optimum in implementation cost
Mobility
power dissipation
performance bandwidth efficiency
Multimedia and wireless communication domains are converging into a new generation of Personal Digital Assistant (PDA) or SmartPhone devices
PDAs have started to support a huge variety of travel and fun related applications with much higher processing requirements, such as localization, navigation, travel assistant, video camera, digital camera, picture editing, MP3 player or games
Additionally, these portable, multimedia enabled PDA devices are obliged to support multiple communication standards, both wired (USB, FireWire) and wireless (3G, WLAN).
Summary of common trends:
New features and value added services: lead to exponentially increasing processing performance and communication requirements.
Standards become more dynamic and sophisticated and are introduced more rapidly: calls for high flexibility of the SoC implementation to meet the resulting time-to-market as well as time-in-market requirements.
For mobile applications and cost sensitive consumer electronic devices: energy efficiency becomes the prevailing cost factor
Heterogeneous Multi-Processor SoC (MP-SoC) platforms are generally believed to meet the above mentioned conflicting performance, flexibility and energy efficiency requirements of demanding embedded applications
Hence, in the course of an MP-SoC platform design the partitioning of a specific application is a task of major importance
Main Partitioning Principle
Control dominated domain
Data dominated domain
This first order partitioning has major influence on both the target processing and communication elements as well as on the appropriate design methodology.
Control-plane processing is characterized by:
Moderate performance requirements,
Huge amounts of functionality
Calling for maximum flexibility
Developed using an Integrated Design Environment (IDE) which is:
Architecture agnostic
Software centric
Software engineering techniques:
Object Oriented Programming (OOP) using
Unified Modeling Language (UML)
C++
Java
To increase the reuse of the control plane software (across multiple MP-SoC platform generations):
Hardware dependent Software (HdS) portions are wrapped into a stack of:
middleware
Real Time Operating System (RTOS)
device driver layers
Parallelism in control plane processing:
Instruction Level Parallelism (ILP)
Extracted by a VLIW compiler
Or a superscalar processor architecture
Helps gain performance
Task Level Parallelism
Generally not possible due to the huge amount of functionality
Data-plane processing is characterized by:
Computationally intensive data manipulations
Performance at high data rates
Demand for high processing
Demand for high communication performance.
Rapidly evolving standards in all application domains impose increasing flexibility constraints.
Need to reach performance requirements of networking, multimedia and wireless communications applications
Requires aggressively exploiting the abundant inherent parallelism available in data-plane processing tasks because:
Functionality can be straightforwardly partitioned into a set of loosely coupled tasks with well predictable or even cyclo-stationary execution timing
A well confined data set is associated with a single activation of an individual task.
Data sets associated with successive activations of an individual task are mostly independent.
These spatial and temporal properties with respect to second order task partitioning and data dependency can already be identified during the algorithm development stage and lead to an identification of coarse grain TLP.
This application inherent TLP enables the concurrent and parallel execution on MP-SoC platforms.
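The independence of successive task activations is what makes this kind of parallel execution safe. The sketch below runs a placeholder task over independent data sets, once sequentially and once on a worker pool, and checks that the results agree. The task body and packet contents are invented, and Python threads are used only to show the structure; they are not expected to speed up this CPU-bound toy task.

```python
from concurrent.futures import ThreadPoolExecutor


def process_packet(payload):
    """Stand-in for one task activation working on its own data set."""
    return sum(payload) % 256


# Each packet is an independent, well confined data set.
packets = [list(range(i, i + 8)) for i in range(16)]

sequential = [process_packet(p) for p in packets]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(process_packet, packets))

print(parallel == sequential)
```

Because no activation reads data written by another, the order and placement of the activations do not affect the results.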
More about SoC design concepts next!
The main aspects of SoC architectural elements
Macroscopic metrics for the classification and evaluation of architectural elements
Cost
Performance
Power Dissipation
Computational Efficiency
Flexibility
The cost of an embedded architecture is separated into:
Non-Recurring Engineering (NRE) cost for the initial design
Recurring chip fabrication cost
NRE cost is caused by:
Design effort for HW
SW development
Fabrication of the initial mask set
Typical NRE cost for a 90 nm SoC:
10-100 million USD design effort
1 million USD per mask set
Fabrication cost is determined by:
Silicon die area
Packaging
Number of pins
Power dissipation requirements
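How the one-time NRE cost and the recurring fabrication cost combine can be shown with a simple amortization sketch. The NRE figure uses the low end of the range quoted above plus one mask set; the per-chip fabrication cost and the production volumes are hypothetical.

```python
def cost_per_chip(nre_usd, unit_fabrication_usd, volume):
    """Amortize the one-time NRE cost over the production volume."""
    return nre_usd / volume + unit_fabrication_usd


NRE_USD = 10e6 + 1e6          # low end of the design effort plus one mask set
UNIT_FABRICATION_USD = 5.0    # hypothetical recurring cost per chip

for volume in (100_000, 1_000_000, 10_000_000):
    print(volume, round(cost_per_chip(NRE_USD, UNIT_FABRICATION_USD, volume), 2))
```

At low volumes the NRE share dominates the cost of each chip, which is one reason high-volume markets are the natural target for SoC designs.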
Performance of both computational and communication architectures is classified into:
Latency
Throughput
Latency: absolute time passing between the start and completion of a task
Throughput: number of accomplished tasks per unit time
Communication throughput is measured in bits per second (bps).
Throughput of programmable processing elements is measured in Million Instructions Per Second (MIPS)
MIPS measurement is not very accurate
Created by Subhash Iyer for Soft Polynomials (I) Pvt. Ltd. 41
Power dissipation is measured in watts (W)
Denotes the energy per unit of time required to operate an embedded system
Is an architecture metric of growing importance:
Battery lifetime of mobile devices directly depends on the energy consumption
Packaging cost depends on the heat dissipation properties, which in turn depend on the power consumption
Striving for low power and energy consumption constitutes the key driver for architecture differentiation of embedded SoC platforms
Computational efficiency is derived from performance and power consumption
Characterizes the efficiency of a given architectural element with a single value
Computational efficiency of programmable architectures is predominantly measured in MIPS/Watt
Alternatively, it is measured in energy consumption per task (since the MIPS measurement is not very accurate)
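Both efficiency measures follow directly from the definitions of throughput and power. The sketch below uses made-up numbers for a hypothetical processing element; the function names are not from the slides.

```python
def mips_per_watt(mips, power_w):
    """Computational efficiency as throughput divided by power."""
    return mips / power_w

def energy_per_task_j(power_w, task_time_s):
    """Energy (joules) spent on one task: power multiplied by task latency."""
    return power_w * task_time_s

# Hypothetical PE: 400 MIPS at 0.5 W, finishing one task in 2 ms.
print(mips_per_watt(400, 0.5))          # MIPS/Watt
print(energy_per_task_j(0.5, 0.002))    # joules per task
```

Energy per task compares architectures on the work they actually complete, which is why it is less sensitive to differences between instruction sets than MIPS/Watt.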
Flexibility is related to the effort needed to change the functionality of a given architectural element
In contrast to the previous metrics, flexibility can hardly be measured in an accurate way
Nonetheless, in the context of rapidly evolving functionality and standards of embedded applications, architectural flexibility is of major importance to achieve both a decreasing time-to-market and an increasing time-in-market
A processing element (PE) provides the computational resource to execute a given portion of the application
Dedicated hardware implementation yields best performance
Programmable PEs are controlled by an instruction stream in a highly flexible way
The rather poor performance of programmable PEs has long fueled computer architecture research towards parallelizing the execution of instructions
Early efforts in parallel computer architectures are classified according to the deployment of control- and data-level parallelism:
SISD
SIMD
MIMD
MISD
SISD: Single Instruction Single Data
Traditional von-Neumann kind of computer architectures
Sequentially execute a single instruction stream on a single processing resource
SIMD: Single Instruction Multiple Data
Vector processing machines
Perform a single instruction on multiple data items in parallel
Used in architectures for embedded DSP and graphic applications
Exploit inherent data-level parallelism (DLP)
MIMD: Multiple Instruction Multiple Data
Traditional homogeneous multi-processor type of architectures
Employed in scientific supercomputers
MISD: Multiple Instruction Single Data
Rarely encountered class of architectures
Exploit temporal ILP by:
Setting pipeline stages
Executing several instructions simultaneously
Superpipelining:
Uses deep execution pipelines to increase the clock frequency
Superscalarity:
Employs parallel functional units and complex dispatcher architectures to dynamically extract Instruction Level Parallelism (ILP)
Very Long Instruction Word (VLIW):
Executes several statically scheduled instructions on parallel functional units
Hence the effort for ILP extraction is moved into the compiler
Hardware Multi-Threading (HW-MT):
Such architectures are able to concurrently pursue two or more threads of control by providing separate register resources for each thread context
Domain Specific (DS) Instruction Set:
Tailors the programmable PE to a specific application domain
Provides specialized functional units
DS processor examples are Digital Signal Processors (DSPs) employed in multimedia and wireless communications, or Network Processing Units (NPUs) for networking applications
The applicability of the above listed performance improvement techniques depends on the considered set of target applications.
Superpipelining and superscalarity are heavily used in high performance General Purpose Processor (GPP) architectures to increase the single-thread performance of arbitrary applications at the vast expense of silicon area and power dissipation.
In contrast, embedded applications are severely energy and cost constrained, but still have significant performance and flexibility requirements.
The most promising approach to jointly optimize flexibility and performance is to exploit coarse-grain TLP instead of ILP and map the loosely coupled tasks to individually optimized PEs.
Embedded PEs of this kind mostly rely on the more power-aware performance optimization techniques, like VLIW, multi-threading and a domain specific or even application specific instruction set.
MIMD control parallelism plays an important role in embedded SoC architectures
Parallel execution of specialized PEs offers the chance to improve application performance without sacrificing power efficiency
Homogeneous multi-processing refers to the multiple instantiation of identical PEs
Corresponds to a single-chip implementation of the MIMD principle
Homogeneous multi-processing of general purpose embedded microcontrollers achieves the performance scaling required for the control-plane processing portion of embedded applications
Also found for data-plane processing in domain specific MP-SoC platforms, where the identical instruction set of the PEs is tailored to a certain application domain
Heterogeneous multi-processing employs multiple PEs
Different PEs are individually tailored to a certain task or task set
Dedicated optimization
Applicable to data-plane processing, as it allows for manual and static task allocation
The high degree of specialization in heterogeneous multi-processing further optimizes computational efficiency for a well defined set of target applications at the expense of generality
Parallel execution:
Requires multiple computational resources
More than one task can be active at the same point in time
Concurrent execution:
Interleaved processing of several tasks on a single resource
At any time only one task can be active
The benefit of concurrent execution is depicted in the figure
2 tasks are mapped to a single processing element
Both tasks are divided into 2 processing portions
These are separated by a communication request
After Δt_delay the processing of the first portion is finished and the task is blocked for Δt_response until the request is accomplished.
Instead of wasting the processor resource during this period, the scheduler swaps the processor context to the second task.
Utilization of the processor is increased and the request latency is hidden
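The latency-hiding effect can be checked with a small timing model. This is a simplified sketch: both tasks are assumed identical, a is the duration of the first processing portion (Δt_delay), r the blocking time of the request (Δt_response), b the second portion, and context-switch overhead is assumed to be zero. The function names and numbers are illustrative.

```python
def sequential_time(a, r, b):
    """Both tasks run back to back; the PE idles while a request is pending."""
    return 2 * (a + r + b)

def interleaved_time(a, r, b):
    """The scheduler swaps to the other task while a request is pending."""
    t1_ready = a + r                      # response for task 1 arrives
    t2_ready = 2 * a + r                  # task 2 issued its request at time 2a
    t1_done = max(2 * a, t1_ready) + b    # task 1 resumes once the PE is free
    t2_done = max(t1_done, t2_ready) + b
    return t2_done

a, r, b = 4, 3, 4
busy = 2 * (a + b)                        # useful processing time of both tasks
print("sequential utilization:", busy / sequential_time(a, r, b))
print("interleaved utilization:", busy / interleaved_time(a, r, b))
```

With these numbers the response time of one task is fully covered by the first portion of the other, so the processing element never idles.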
The main aspects of SoC on-chip communication elements
Basic cost, performance, power, and flexibility metrics apply.
Additionally, Quality of Service (QoS) metrics known from the networking application domain are of increasing importance to manage complex on-chip traffic
The scalability of the communication architecture gains growing attention
Bus based on-chip communication paradigm is derived from the Printed Circuit Board (PCB) domain.
Examples:
VME (Versa Module Eurocard bus)
PCI (Peripheral Component Interconnect)
Advantages:
Easy programming model
High flexibility
Abundant availability of Intellectual Property (IP)
Suited for small and medium scale embedded systems where a small number of blocks exchange moderate amounts of data.
Buses implement a master-slave communication scheme
Active initiators along with passive target modules are hooked to a shared communication medium
Typical masters:
Processors
DMA controllers
Autonomous ASIC blocks
Typical slaves:
Memories
Co-processors
Other peripherals
Other components:
Arbitration units: grant access to the communication medium to one of the competing master modules
Decoder units: activate the target module based on the actual address and the address map, which maps the target modules into the bus address space
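The role of the decoder unit can be sketched as a lookup in an address map. The map below is entirely hypothetical (base addresses, window sizes and target names are invented for illustration and do not describe any real bus).

```python
# Hypothetical address map: (base address, window size, target name).
ADDRESS_MAP = [
    (0x0000_0000, 0x1000_0000, "memory"),
    (0x4000_0000, 0x0000_1000, "dma_controller"),
    (0x4000_1000, 0x0000_1000, "uart"),
]

def decode(address):
    """Return the target whose address window contains `address`."""
    for base, size, target in ADDRESS_MAP:
        if base <= address < base + size:
            return target
    return None  # unmapped address: no target is activated

print(decode(0x4000_0004))
```

In hardware the same comparison is done combinationally on the upper address bits, which is why the decoder tends to sit on the critical path of the bus.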
Bandwidth
Is the premier performance metric
Denotes the maximum transfer capacity of the bus
Available bandwidth is measured in bits per second
Corresponds to the number of parallel data wires divided by the bus clock period
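As a worked instance of this relation, the sketch below computes the peak bandwidth of a bus. The 32-wire, 100 MHz configuration is an assumed example, not a figure from the slides.

```python
def bus_bandwidth_bps(data_wires, clock_period_s):
    """Peak bandwidth: one bit per data wire per clock period."""
    return data_wires / clock_period_s

# Assumed example: 32 data wires, 10 ns clock period (100 MHz).
bandwidth = bus_bandwidth_bps(32, 10e-9)
print(f"{bandwidth / 1e9:.1f} Gbit/s")
```

This is an upper bound; arbitration and wait states reduce the throughput that is actually achieved.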
Pipelining:
Well known technique to improve the communication throughput
Clock frequency is limited by the critical path
Inserting an additional pipeline stage into the critical path allows a higher clock frequency
Yields a higher communication bandwidth
Since the address decoder is usually an integral part of the critical path, bus transactions in high performance buses are executed in separate address and data stages
Burst modes:
Improve communication throughput for the linear access of subsequent addresses by a single master
Address counter is incremented automatically
Next data item is transferred with every cycle without renewed arbitration
Unidirectional data links
Distinguish on-chip buses from most on-board buses
The latter are based on tristate data wires to maximize the utilization of expensive on-board wires
Hierarchy
Common bus systems separate high performance from low performance communication
Two buses with different speed characteristics
Multilayer bus architectures
Provide dedicated point-to-point connections between distinct initiators and targets to eliminate bandwidth bottlenecks
The required de-multiplexer at the initiator side is called the input stage, the respective target multiplexer is called the output stage
Crossbar bus architectures:
Provide multiple parallel resources between initiators and targets
Significantly improve the traffic throughput
Degree of parallelism may vary from partial crossbar to full crossbar architectures, where the latter provides an individual resource for each connected target
Arbitration:
Can be based on various algorithms:
Simple round robin
Fixed, configurable or dynamic priority schemes
Static or dynamic Time Division Multiple Access (TDMA)
Even more advanced algorithms are known to further improve the quality of service
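The simplest of these schemes, round robin, can be sketched in a few lines. The class and method names are illustrative and do not correspond to any specific bus standard.

```python
class RoundRobinArbiter:
    """Grants the bus to requesting masters in cyclic order."""

    def __init__(self, n_masters):
        self.n = n_masters
        self.last = n_masters - 1  # so that master 0 is served first

    def grant(self, requests):
        """requests[i] is True if master i wants the bus.
        Returns the index of the granted master, or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # the winner gets lowest priority next time
                return idx
        return None

arbiter = RoundRobinArbiter(3)
print([arbiter.grant([True, True, True]) for _ in range(4)])
```

Because the search starts just after the previous winner, no requesting master can be starved, which a fixed priority scheme cannot guarantee.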
Locking of a bus:
Locking by a single master is a necessary feature to support read-modify-write kind of semaphore operations
This feature is required by most microcontroller architectures that run operating systems
Split transaction buses:
Allow the master to issue multiple requests without waiting for a response, i.e. request and response are separated
Out-of-order execution:
Improves the bus throughput by reordering the sequence of responses, depending on the availability of the slave component
This feature requires advanced state machines in the master modules to cope with the non-deterministic sequence of responses
Physical issues:
Buses are implemented using a standard-cell-based semi-custom implementation flow
Transmission wires are not physically optimized, which leads to timing closure issues and unreliable communication links
Examples of physical effects are crosstalk noise, electromagnetic interference, and radiation-induced charge injection
Synchronous design:
Most current bus architectures require all connected modules to be in a single clock domain
Due to the parasitic capacitances of long bus wires, strong driver transistors are necessary to achieve timing closure
This leads to increased power dissipation
Future SoC designs will follow the Globally Asynchronous Locally Synchronous (GALS) paradigm
Chip-wide wires will span multiple clock domains, which disqualifies bus architectures as the future chip-level transport mechanism
Traffic management:
Due to the rather simple arbitration mechanisms, shared buses provide only rudimentary traffic management support.
Since the communication pattern highly depends on the spatial and temporal execution of the application tasks, meeting the individual QoS requirements like throughput, jitter, or ordering of the respective tasks is very challenging.
This also causes the poor scalability of bus-based communication infrastructures, since every change in the traffic profile of one part of the application and every additional component influences the other parts and requires renewed balancing of the bus architectures.
Interoperability:
Although simple standard peripherals like DMA, IRC, or memories are available for the respective bus systems, it is a tedious and error-prone task to adapt complex IP blocks to a specific bus architecture.
So far, efforts to create standard bus interfaces have not been successful
Alternative on-chip communication concepts
The Network-on-Chip (NoC) design paradigm has formed to cope with the limitations of shared bus architectures
Aims to replace the current ad hoc wiring of IP blocks with a disciplined approach where full-scale on-chip networks provide communication services according to the ISO/OSI reference model
Problems in on-chip communication like signal integrity issues, link reliability, or Quality of Service (QoS) are resolved separately on the respective OSI layer
The four lower layers of the ISO/OSI reference model are of interest
Physical layer: deals with the electrical aspects of the data transmission, e.g. signal voltages, clock recovery, and pulse shape
Data link layer: provides a reliable data transfer over the physical link
Error detection by means of block codes and error correction mechanisms like:
Automatic Repeat Request (ARQ)
Forward Error Correction (FEC)
Network layer: implements the arbitration algorithms, buffering strategies and flow-control mechanisms
So, the network layer has a dominant impact on the performance and functional behavior of the network
Transport layer: protocols establish and maintain end-to-end connections
The transport layer manages rate-based flow control, performs packet segmentation and reassembly, and ensures message ordering
This abstraction hides the topology of the network and the implementation of the links that make up the network
The challenge in the development of Network-on-Chip architectures is to combine the know-how from both the networking and VLSI domains.
The users of on-chip networks also have to understand basic networking principles:
First, the system architect has to specify design-time parameters of the selected NoC architecture, like topology, buffer sizes, and arbitration algorithm.
Later, the platform programmer has to configure runtime parameters like priorities, routing tables, and buffer management thresholds to take advantage of the capabilities
Transport layer is the first to provide services which are independent of the implementation of the network
Enables the platform programmer to develop embedded software independently from the interconnect architecture
A key ingredient in tackling the challenge of decoupling the computation from communication
Interaction with the network becomes deterministic, rather than prognostic or reactive like in today’s bus based communication architectures
For complex multi-hop networks it is difficult to provide uniform Quality of Service (QoS) guarantees, like lower bandwidth bounds or packet ordering, for the complete on-chip traffic
To combine high resource utilization with high QoS requirements of certain traffic types, researchers in the field of computer networks distinguish guaranteed services and best effort service classes
Guaranteed services:
Require resource reservation for worst-case scenarios
Can be expensive, as guaranteeing the throughput for a stream of data implies reserving bandwidth for the peak throughput, even when its average is much lower
So, resources are often underutilized
Best-effort services:
Do not reserve any resources, and hence provide no guarantees
Utilize resources well, as they are typically designed for average-case scenarios instead of worst-case scenarios
Are also easy to configure, as they require no resource reservation
Main disadvantage: unpredictability of the effective performance
The network layer is implemented by the routing nodes of the NoC.
Router-based network implementations are classified by:
Switching mode
Routing mode
Queuing
Congestion control
Switching mode:
Circuit switching: connections are set up by establishing a conceptual physical path from a source to a destination.
Links can be shared between two connections only at different points in time, by using a time-division multiplexing (TDM) scheme
Packet switching: data is divided into packets, and every packet is composed of a header and the payload.
The header contains information that is used by the router to switch the packet to the appropriate output port
Routing mode: applies to packet-switched networks and defines the way packets are transmitted and buffered between network nodes
Store-and-forward: an incoming packet is received and stored entirely before it is forwarded to the next node.
Wormhole routing: an incoming packet is forwarded as soon as the packet header is evaluated and the next router guarantees that the complete packet will be accepted.
In case the next hop is blocked, the packet tail potentially blocks other resources
Virtual cut-through: an incoming packet is forwarded as soon as the next router guarantees that the complete packet will be accepted.
In case the next hop is blocked, the packet tail is stored in a local buffer
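The latency difference between the routing modes can be estimated with a simple cycle count. The model below is a sketch under stated assumptions: no contention, one flit per link per cycle, zero router processing delay, and a one-flit header. Wormhole routing and virtual cut-through behave the same under these conditions, since they differ only when the next hop is blocked.

```python
def store_and_forward_cycles(hops, flits):
    """Every node receives the whole packet before forwarding it."""
    return hops * flits

def cut_through_cycles(hops, flits):
    """The header advances one hop per cycle; the remaining flits
    follow in a pipeline right behind it."""
    return hops + flits - 1

for hops in (1, 4, 8):
    print(hops, store_and_forward_cycles(hops, 8), cut_through_cycles(hops, 8))
```

For a single hop both modes take the same time; the gap grows with the hop count because store-and-forward pays the full packet length at every node.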
Queuing: buffering strategies can be distinguished by the location of the buffers inside the router.
In the following, N denotes the number of bi-directional router ports.
Input queuing:
A router has a single input queue for every incoming link.
Suffers from the so-called head-of-line blocking problem, i.e. the router utilization saturates at about 59%
Weak link utilization.
Output queuing:
There are N output queues for every outgoing link, resulting in N^2 queues.
Yields optimal performance
The costly N^2-fold storage and wiring effort prohibits the implementation for a large number of ports
Virtual output queuing (VOQ):
Combines the advantages of input queuing and output queuing
Avoids the head-of-line blocking problem.
Each input port maintains a separate queue for each output port
Key factor in achieving high performance using VOQ switches is the scheduling algorithm
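Head-of-line blocking and its removal by virtual output queues can be shown for a single switch cycle. This is a toy sketch: each input holds a list of destination port numbers, and the greedy matching used for the VOQ case is only a stand-in for a real scheduling algorithm.

```python
def fifo_cycle(queues):
    """One switch cycle with a single FIFO per input:
    only the head packet of each input may compete for an output."""
    granted = {}  # output port -> input port
    for inp, queue in enumerate(queues):
        if queue and queue[0] not in granted:
            granted[queue[0]] = inp
    return granted

def voq_cycle(queues):
    """One cycle with virtual output queues: an input may send any
    queued packet whose output is still free (greedy matching)."""
    granted = {}
    for inp, queue in enumerate(queues):
        for dest in queue:
            if dest not in granted:
                granted[dest] = inp
                break
    return granted

# Input 0 holds packets for outputs 0 and 1, input 1 for outputs 0 and 2.
queues = [[0, 1], [0, 2]]
print(fifo_cycle(queues))  # input 1 is stuck behind its head packet
print(voq_cycle(queues))   # input 1 can use the idle output 2
```

In the FIFO case output 2 stays idle although a packet for it is waiting, which is exactly the utilization loss described above.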
Congestion control: packet-switched networks without mechanisms for bandwidth reservation may run into resource contention and subsequent buffer overflow.
Several solutions prevent packets from entering until contention is reduced:
Packet discarding: simply drops packets in case of buffer overflow
Credit-based flow control: packet loss is prevented in a deterministic way, either by signaling congestion via separate wires (back-pressure) or by the receiver regularly informing the sender about the available buffer space (window).
Rate-based flow control: the sender gradually adjusts the traffic generation rate in response to flow control messages from the receiver.
Rate-based flow control has to be implemented by the transport layer and potentially suffers from instability due to long control loops
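The window variant of credit-based flow control can be sketched as a counter of free buffer slots that the sender must consult before transmitting. The class below is a minimal illustration with invented names; it models one sender, one receiver and instantaneous credit return.

```python
class CreditLink:
    """Sender-side credit counter that mirrors the receiver's free buffer slots."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots
        self.buffer = []

    def send(self, packet):
        """Transmit only if a credit is available; otherwise the sender stalls."""
        if self.credits == 0:
            return False          # nothing is dropped, the sender retries later
        self.credits -= 1
        self.buffer.append(packet)
        return True

    def consume(self):
        """Receiver drains one packet and returns a credit to the sender."""
        if not self.buffer:
            return None
        packet = self.buffer.pop(0)
        self.credits += 1
        return packet

link = CreditLink(buffer_slots=2)
print([link.send(p) for p in ("a", "b", "c")])  # third send has to wait
print(link.consume(), link.send("c"))
```

Since a packet is only sent when buffer space is known to exist, overflow cannot occur, in contrast to packet discarding.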
Architectural trends
Set the stage for the discussion of appropriate system level design methodologies
Processing elements:
Requirements for performance, power efficiency and flexibility
SIMD, VLIW, super-pipelining, and hardware multi-threading exploit application-inherent instruction-, data-, and task-level parallelism
Communication: bus architectures vs. Network-on-Chip
Virtualization of architectural resources enables 'divide-and-conquer':
Embedded control-plane processing tasks are executed in the user space of the Real Time Operating System (RTOS)
Embedded data-plane processing tasks are executed on HW multi-threaded processing elements
Global communication of control- and data-plane processing elements is performed by elaborate on-chip networks
At each level of circuit abstraction, the circuit is equivalent and performs the same target operation, but its structural components (and hence the component’s granularity) are different, and the design issues may be different
System level:
Highest level of circuit abstraction
The system is specified as processes and tasks
A mix of hardware and software.
Concerned with overall system structure and information flow.
Computer systems are described as an interconnected set of processors, memories and switches
Behavioral level:
Also called the instruction set level, algorithmic level or high level.
Focus is on the computations performed by an individual processor; i.e., the way it maps sequences of inputs to sequences of outputs
Architecture, microarchitecture, or RTL level:
Viewed as a set of interconnected storage elements and functional blocks.
Behavior of the system is described as a series of data transfers and transformations between the storage elements
Microarchitectural-level representation of the chip resources, such as adders and subtractors, is determined along with decisions such as single-cycle, multicycle, pipelined or superscalar implementation
Logic level:
System is described as a network of gates and flip-flops
Behavior is specified by logic equations
Circuit is represented in the form of a netlist, at which level logic realizations of functional blocks are determined
Circuit or transistor level:
Circuit is a netlist of transistors
Decisions such as how and what types of transistors will be used (complementary CMOS, pass transistors, etc.) are the main issues
Physical or layout level:
System is specified in terms of the individual transistors of which it is composed
Behavior of the system can be described in terms of the network equations
Lowest level of circuit abstraction
Chip is a sequence of layers (masks), each of which is composed of polygons.
It is this level that is transferred to the manufacturing process
Design automation terminology:
Optimization
Synthesis
Analysis
In circuit analysis, the behavior or characteristics of a circuit are studied
The task of synthesis is to take the specifications of the behavior required for a system and a set of constraints and goals to be satisfied, and to find a structure that implements the behavior while satisfying the goals and constraints
Behavior, structure and physical design are the 3 domains in which hardware is described
“Behavior”:
Refers to the ways in which the system or its components interact with their environment (mapping from inputs to outputs)
Interest is in what a design does, not in how it is built
“Structure”:
Refers to the set of interconnected components that constitute the system (described by a netlist)
Focus on constraints, such as area, cost and delay.
“Physical” design:
Mapping of the structure onto the technology
Ignores what the design is supposed to do and binds its structure in space or to silicon
The automatic design process of VLSI circuits is called synthesis
The system-synthesis process partitions the tasks into hardware, software and their communications
The high-level synthesis process is the translation from a behavioral description to its equivalent structural description
Logic synthesis is the process of mapping the design at the RTL to a gate-level representation that is suitable for input to physical design
Physical design then addresses aspects of chip implementation:
Floor planning
Placement
Routing
Extraction
Performance analysis
Output of physical design is the handoff (“tapeout”) to manufacturing
The handoff is a GDSII stream file
Verification of correctness:
Design rules
Layout versus schematic
Constraints (timing, power, reliability, etc.)
During each phase of the synthesis process, the functional equivalence of two consecutive phases is to be checked to ensure that they are functionally the same
A power and timing analysis can be done by using compact models at the transistor level
At the physical level, more accurate power and timing analysis is possible through the extraction of accurate parasitics
High-level synthesis (HLS) is the translation process from a behavioral description to a structural description
Analogous to “compilation”, which translates a high-level language program in C/C++ to an assembly language program
HLS is also known as behavioral-level synthesis or algorithmic-level synthesis.
Constraints to be considered in HLS are:
Area
Performance
Power consumption
Reliability
Testability
Cost
HLS allows a design engineer to make decisions at an early stage of the design cycle, which helps to ensure a correct design.
Typical steps involved are scheduling, binding, allocation, etc.
Advantages:
Continuous and reliable design flow
From system-level abstraction to RTL abstraction automatically, without manual handling
Automatic translation from high-level specifications in the form of C or SystemC to an RTL description of the circuit in the form of VHDL or Verilog.
Shorter design cycle
More automation: faster designs, lower cost
Fewer errors
Synthesis process can be verified easily, so the chances of errors will be smaller.
Correct design decisions at the higher levels of circuit abstraction can ensure that the errors are not propagated to the lower levels, which are too detailed and costly to correct
Easy and flexible to search the design space
Synthesis system can produce several designs in a short time
So, the designer has more flexibility to choose the proper design considering different trade-offs of power, leakage, area and delay.
Balanced degree of freedom for power optimization
Power and performance optimization can be performed at any level of circuit abstraction
As the level of abstraction goes lower, the complexity of the circuit increases
Additionally, the degrees of freedom, and thus power reduction opportunities decrease
Hence, high level or behavioral level is an attractive level and provides a balanced degree of freedom for design space exploration.
Documenting the design process
Automated system can track design decisions and their effects
Design debugging and continuation by third parties can be easily done
Useful for macrocell-based design and the sale of designs as intellectual property cores
Availability of circuit technology to more people
Design expertise is moved into synthesis systems
It becomes easier for a non-expert to produce a chip that meets a given set of specifications
The cost of the required manpower is reduced
The high-level synthesis process takes a system in the form of a hardware description language (HDL) as input and generates an optimal RTL description by:
Compilation
Transformation
Scheduling
Allocation
Binding
Other steps:
Power optimization
Leakage optimization
Register optimization
Interconnect optimization
These take place in synthesis either sequentially or along with the fundamental steps
There is no fixed sequence for performing the various high-level synthesis tasks
They are independent of each other
Yet, these tasks should be performed simultaneously for effective optimization
The behavior of a system to be synthesized is usually specified at the algorithmic level using a high-level programming language like C/C++ or a hardware description language (HDL) such as VHDL and Verilog.
The behavior of the system is then compiled into internal representations, which are usually data flow graphs (DFGs) and control flow graphs (CFGs).
Each behavioral specification is transformed into a unique graphical representation.
The DFG is a directed graph that represents data movement, whereas the CFG is a directed graph that indicates the sequence of operations.
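A data flow graph for a tiny behavior such as y = a*b + c*d can be sketched as follows. The three-address tuple format and the function name are assumptions made for illustration, and every name is assumed to be assigned exactly once.

```python
# Each operation: (result, operator, operand1, operand2).
ops = [
    ("t1", "*", "a", "b"),
    ("t2", "*", "c", "d"),
    ("y",  "+", "t1", "t2"),
]

def build_dfg(ops):
    """Return the DFG edges (producer index, consumer index):
    an edge exists when one operation consumes the result of another."""
    producers = {result: index for index, (result, _, _, _) in enumerate(ops)}
    edges = []
    for index, (_, _, x, y) in enumerate(ops):
        for operand in (x, y):
            if operand in producers:   # primary inputs have no producer node
                edges.append((producers[operand], index))
    return edges

print(build_dfg(ops))
```

The two multiplications have no edge between them, which is the information a scheduler later uses to run them in the same control step.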
In the transformation step, the initial DFG is transformed so that the resultant DFG is more suitable for scheduling and allocation.
These transformations include compiler-like optimizations such as dead-code elimination, common sub-expression elimination, loop unrolling, constant propagation and code motion.
In addition, some hardware-specific transformations like minimization of syntactic variances and retiming may be applied to take advantage of the associativity and commutativity of certain operations
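One of the compiler-like transformations, common sub-expression elimination, can be sketched on a list of three-address operations. The tuple format is an assumption for illustration; each name is assumed to be assigned once, and the eliminated names are assumed to be internal temporaries rather than primary outputs. Commutativity of + and * is used to recognize a+b and b+a as the same expression.

```python
def eliminate_common_subexpressions(ops):
    """ops: list of (result, operator, operand1, operand2).
    Returns (reduced operation list, alias map of eliminated names)."""
    seen = {}    # canonical expression -> name that already holds its value
    alias = {}   # eliminated name -> surviving name
    reduced = []
    for result, op, x, y in ops:
        x, y = alias.get(x, x), alias.get(y, y)
        # Sort the operands of commutative operators into a canonical order.
        key = (op, *sorted((x, y))) if op in ("+", "*") else (op, x, y)
        if key in seen:
            alias[result] = seen[key]
        else:
            seen[key] = result
            reduced.append((result, op, x, y))
    return reduced, alias

ops = [("t1", "+", "a", "b"), ("t2", "+", "b", "a"), ("y", "*", "t1", "t2")]
print(eliminate_common_subexpressions(ops))
```

In hardware terms the transformation saves one adder, or one adder time slot, because t2 no longer needs to be computed.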
Scheduling is the process of partitioning the set of arithmetic and logical operations in the DFG into groups so that the operations in the same group can be executed concurrently, while taking into consideration possible trade-offs between the total execution cost and hardware cost.
A group of concurrent computations to be executed simultaneously is referred to as a control step.
The total number of control steps needed to execute all operations in the DFG, the minimum number of functional units of each type to be used in the design and the lifetimes of the variables generated during the computation of operations are determined in the scheduling step.
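As one possible illustration, the sketch below applies as-soon-as-possible (ASAP) scheduling, a basic unconstrained scheduling algorithm that the slides do not name explicitly. The operation format is an assumption; operations are assumed to be listed in dependency order and to take one control step each.

```python
from collections import Counter

def asap_schedule(ops):
    """ops: list of (result, operator, operand1, operand2) in dependency order.
    Returns the control step of each result; primary inputs count as step 0."""
    step = {}
    for result, _, x, y in ops:
        step[result] = 1 + max(step.get(x, 0), step.get(y, 0))
    return step

def units_needed(ops, step):
    """Minimum number of functional units per operator type:
    the largest number of same-type operations in any control step."""
    per_step = Counter((step[result], op) for result, op, _, _ in ops)
    need = {}
    for (_, op), count in per_step.items():
        need[op] = max(need.get(op, 0), count)
    return need

ops = [("t1", "*", "a", "b"), ("t2", "*", "c", "d"), ("y", "+", "t1", "t2")]
schedule = asap_schedule(ops)
print(schedule, max(schedule.values()), units_needed(ops, schedule))
```

Here two control steps suffice, at the price of two multipliers; a resource-constrained scheduler could instead use one multiplier and three steps, which is the execution cost versus hardware cost trade-off mentioned above.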
Selection is the process of choosing resources from the library, which involves tradeoffs according to different features like delay, area, power and leakage.
Resource allocation is the process of determining the number of functional units of each type for performing operations, memory units (registers) for storing data values and interconnects for data transportation.
Often, the selection and allocation processes are a single task.
Allocation is further divided into sub-tasks, such as functional unit allocation, memory unit allocation and interconnect allocation.
Resource allocation and binding may share resources so that the same hardware can be used to execute different operations or so that the same register can be used to store more than one variable.
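Register sharing can be sketched with a left-edge style pass over variable lifetimes. This is a minimal example under stated assumptions: the lifetimes are hypothetical, a variable is written at the end of its first step and last read in its last step, and a register is free for a new variable as soon as the previous one has been read for the last time.

```python
# Hypothetical lifetimes: variable -> (step in which it is produced,
# step in which it is last read).
lifetimes = {"v1": (1, 3), "v2": (1, 2), "v3": (2, 4), "v4": (3, 5), "v5": (4, 5)}


def left_edge(lifetimes):
    """First-fit register binding over lifetimes sorted by their start."""
    ends = []      # ends[r] = last step in which register r is still occupied
    binding = {}
    for var, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for reg, last_end in enumerate(ends):
            if last_end <= start:      # register is free again: share it
                ends[reg] = end
                binding[var] = reg
                break
        else:
            ends.append(end)           # no free register: allocate a new one
            binding[var] = len(ends) - 1
    return binding


binding = left_edge(lifetimes)
register_count = len(set(binding.values()))
```

With these lifetimes the five variables fit into two registers.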
Binding or assignment is the process of assigning operations to functional units, variables to memory units and data transfers to interconnections.
Binding is further divided into several sub-tasks, such as functional unit binding, memory unit binding and interconnect binding.
Functional unit binding involves the mapping of operations in the behavioral description into a set of selected functional units.
Memory unit binding maps data carriers (constants, variables, arrays) in the behavioral description onto storage elements (read-only memories, registers, memory units) in the data path.
The interconnect binding task maps every data transfer in the behavior onto a set of interconnection units for data routing.
In the output generation phase, design output is generated.
The output should be in a form such that logic-level synthesis tools can optimize the combinational logic and layout synthesis tools can design the chip geometry.
The generated output is generally in a low-level HDL, such as structural VHDL.
Data Path Synthesis
Control Synthesis
The controller is typically a finite state machine that is either microcoded or hardwired
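A controller of this kind can be pictured as a table that maps each state (control step) to the control signals it asserts and to the next state. The sketch below is table-driven, which is closer in spirit to the microcoded style; the state names and signal names are hypothetical.

```python
# Hypothetical control table for a three-step schedule:
# state -> (control signals asserted in that state, next state).
CONTROL_TABLE = {
    "S1": ({"mul_en": 1, "add_en": 0, "done": 0}, "S2"),
    "S2": ({"mul_en": 0, "add_en": 1, "done": 0}, "S3"),
    "S3": ({"mul_en": 0, "add_en": 1, "done": 1}, "S1"),
}


def run_controller(start, cycles):
    """Step the finite state machine and record the signals of each cycle."""
    state, trace = start, []
    for _ in range(cycles):
        signals, next_state = CONTROL_TABLE[state]
        trace.append((state, signals))
        state = next_state
    return trace


trace = run_controller("S1", 4)
```

A hardwired controller would implement the same state transitions directly in logic instead of reading them from a table.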
HLS is important for several reasons:
Reduction of design cycle time
Rapid design space exploration at the higher level of abstraction
Wrong decisions are not propagated to lower levels of design abstraction
HLS involves several important steps, such as:
Scheduling
Allocation
Binding
Several graph theoretical algorithms are available that can perform optimization while performing these tasks.
Two types:
Data path synthesis
Control synthesis
There are existing tools to perform high-level synthesis explicitly, and some tools perform the behavioral to RTL compilation as an intermediate process.
Design flow of integrated circuits
Application phase
Implementation phase
Both are decoupled
Application to implementation
A specification document written by:
Application team
System architecture specialist
Ad-hoc and informal approach
Problems
Ambiguity of the informal specification document leads to misinterpretations and implementation errors
Lack of reliable performance information before the implementation often causes an over- or under-provisioning of processing and communication resources
Quality of results mainly depends on the intuition and experience of the system architect
Manual creation of the verification environment requires significant effort and again represents a potential source of inconsistencies with the original design intent
Electronic System Level (ESL)
Application is jointly considered with the system architecture to find a feasible and cost effective application to architecture mapping
The declared goal of ESL design is to increase the engineering productivity and quality of results during the specification of the MP-SoC platform architecture and application mapping
New design paradigm to cope with the complexity and economics of the emerging billion-transistor System-on-Chip era.
Architecture centric definition: We define platform-based design as the creation of a stable microprocessor-based architecture that can be rapidly extended, customized for a range of applications, and delivered to customers for quick deployment
Design process based definition: The general definition of a platform is an abstraction layer in the design flow that facilitates a number of possible refinements into a subsequent abstraction layer in the design flow
Multiple, almost orthogonal phases
Functional phase
Performed by application specialists
Completely agnostic to architectural considerations.
Includes
Embedded SW development of the control-plane portion
Data-plane algorithm development
The latter is carried out using highly application domain specific tools and methodologies
MP-SoC platform phase
All design tasks that have to be performed under consideration of the full functional and architectural complexity of the MP-SoC platform
Example
Specification of the system-architecture
Mapping of the application onto the MP-SoC platform
Development of the hardware dependent Software layers
High-level IP creation phase
Design of processing elements (RISC, DSP, MCU, ASIPs)
On-chip interconnect technologies (busses, NoC)
Domain specific standard I/O (PCI-variants, SPIx variants, HyperTransport, I2C, FireWire, QDR, etc.)
Creation of well defined ASIC IP blocks (e.g. an MPEG4 video codec).
Not completely orthogonal to the functional phase, since the design of application specific processing elements and communication IP indeed depends on the considered application
Semiconductor technology and basic IP creation phase
Covers standard cells, I/O, memories and the basic technology processes supporting them.
More heterogeneous technologies, combining embedded DRAM, embedded Flash, mixed-signal BiCMOS, RF, and analog
More to do with fabrication technologies
Represent the results of the functional phase as a well defined application model, which serves as the Executable Specification of the system
System architecture needs to be defined in terms of mapping the application model to the hardware (Main Task)
Embedded SW development
Hardware-Software co-verification task: RTL is verified along with embedded software
Methodology used: Transaction Level Modeling (TLM)
Engineering of integrated circuits has always employed models on different levels of abstraction
Model: unique, idealized description of the considered system
Degree of abstraction characterizes the type of model used in the respective design phase
Goal of abstraction is to provide a description of the system that is simple enough, yet sufficiently accurate, to:
enable the necessary investigations
take design decisions
proceed to the next design phase
Indeed, the design-flow of an embedded system can be considered as a sequence of steps which successively reduce the degree of abstraction in the system model
Functionality refers to the modeling of the system behavior
On the highest level of abstraction, the functionality is condensed to pure mathematical expressions
Later the functionality is refined to operators
Finally it is mapped to logic gates
Timing model captures the temporal properties of the system
Degree of abstraction ranges from causality of events to physical timing of transistors and wires
Data representation
At higher levels, data resolution is reduced to Tokens and Abstract Data Types (ADT)
Lower levels employ word or bit representations
The Component granularity describes the finest resolution of the sub-blocks
First the component resolution is restricted to coarse-grain building blocks
Finally the complete embedded system is composed of fine-grain silicon transistors
Creation of a system model requires:
Modeling language
Well defined execution semantics coordinating the activation of the individual blocks
Model of Computation (MoC) is composed of two parts:
The coordination language describes the basic execution semantics with respect to properties like parallelism, synchronism and reactivity, and provides the abstracted communication mechanism
The host language provides the language elements for the specification of the system models
Timed MoCs are characterized by the total temporal ordering of all occurring communication events
An example is the discrete event simulation MoC, which defines the execution semantics for HDL simulators
Further examples of timed MoCs are synchronous languages like Esterel, Lustre, or Signal, where the events of all communication signals are constrained to occur at identical time stamps
Thanks to their sound mathematical foundation, synchronous languages have gained adoption for the specification, analysis and code-generation of reactive control-dominated applications
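The total ordering of events in a discrete event MoC can be sketched with a small event queue. This is only an illustration of the principle, not the kernel of any particular HDL simulator; the class name, the callbacks and the time values are hypothetical.

```python
import heapq
import itertools


class EventQueue:
    """Minimal discrete-event kernel: events run in time-stamp order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps the order total
        self.now = 0

    def schedule(self, delay, action):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), action))

    def run(self):
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)


log = []


def clock_tick(sim):
    log.append((sim.now, "tick"))
    if sim.now < 20:
        sim.schedule(10, clock_tick)    # re-arm the clock


def data_ready(sim):
    log.append((sim.now, "data"))


sim = EventQueue()
sim.schedule(0, clock_tick)
sim.schedule(15, data_ready)
sim.run()
```

Every event carries a time stamp, and the kernel always executes the event with the smallest time stamp next, so the log comes out ordered by simulated time regardless of the order in which events were scheduled.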
Untimed MoCs are characterized by the fact that communication events are only partially ordered
Various untimed MoCs are popular for the specification of both data and control dominated applications
Data-Flow MoCs are heavily employed for algorithmic modeling and analysis of signal processing applications
Communicating Sequential Processes (CSP) and Calculus for Communicating Systems (CCS) are prominent untimed MoCs which are based on sequential processes that communicate using a rendezvous communication mechanism.
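A data-flow MoC can be sketched as actors that fire whenever tokens are available on all of their input queues. The example below is a minimal, assumed setup (the `Actor` class, the queue names and the two-actor graph are hypothetical) in which each firing consumes one token per input and produces one output token.

```python
from collections import deque


class Actor:
    """Data-flow actor: may fire when every input queue holds a token."""

    def __init__(self, func, inputs, output):
        self.func, self.inputs, self.output = func, inputs, output

    def can_fire(self):
        return all(len(q) > 0 for q in self.inputs)

    def fire(self):
        args = [q.popleft() for q in self.inputs]
        self.output.append(self.func(*args))


def run(actors):
    """Fire enabled actors in arbitrary order until none can fire."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True


x, y = deque([1, 2, 3]), deque([10, 20, 30])
scaled, out = deque(), deque()
actors = [
    Actor(lambda a, b: a + b, [scaled, y], out),   # listed first on purpose
    Actor(lambda a: 2 * a, [x], scaled),
]
run(actors)
```

No time stamps are involved: the only ordering is that a token must be produced before it can be consumed, and the result does not depend on the order in which enabled actors are fired.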
The definition of a proper MoC has long been considered to be the silver bullet for system level design and thereby for solving the design productivity crisis
Initially, the complete system functionality is to be created using the ideal MoC, which provides highest modeling efficiency, simulation speed, and smooth IP reuse
Next, the initial specification would be automatically verified using formal verification technology and metrics like determinism, causality, dead-lock absence, consistency, completeness, and fairness. The golden system specification would then provide the foundation for an automated path to design space exploration to take functional and architectural design decisions
Finally, system level synthesis would be applied to the partitioned system specification providing an automated path to implementation.
Object Oriented Programming (OOP) is a powerful abstraction mechanism: data and functionality are partitioned and encapsulated inside classes
OOP based languages: UML, C++, or Java
Widely adopted in engineering of arbitrary SW
Gaining importance for the specification of embedded control-plane processing
OOP components interact primarily by sequentially transferring control through method calls
Sequential nature of OOP hinders the intuitive specification, analysis and refinement of the inherent parallel data-plane processing tasks
For this purpose the actor-oriented abstraction scheme has been conceived, where parallel objects interact by sending and receiving messages
Within an actor-oriented design environment, the designer can focus on the specification and analysis of the algorithmic behavior of the individual tasks whereas the communication and synchronization aspects are handled by the underlying parallel Model of Computation
SystemC allows Actor Oriented Programming
Actor-based design languages achieve high modularity in communication modeling by using the Interface Method Call (IMC) principle
The IMC mechanism is realized by a set of language elements for:
Modules
Ports
Interfaces
Channels.
Processes modeling the behavior are wrapped into modules and access communication services through ports
Available methods are
Declared in the interface specification
Implemented by the channel
Thus the access methods in an interface reflect the specialized properties of the communication style implemented by a particular channel
Actor-oriented design languages offer a generic Model of Computation, which in case of SystemC is based on an event driven simulation kernel
Channels serve as containers for communication and synchronization
The user can extend the generic MoC by creating his own methodology specific channel library
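The relationship between modules, ports, interfaces and channels can be sketched in a few lines. This is a plain Python analogy of the Interface Method Call principle, not the SystemC API; all class names are hypothetical, and the processes are simply called in sequence instead of being run by a simulation kernel.

```python
from abc import ABC, abstractmethod
from collections import deque


class WriteIf(ABC):
    """Interface: declares the method a writer may call."""
    @abstractmethod
    def write(self, value): ...


class ReadIf(ABC):
    """Interface: declares the method a reader may call."""
    @abstractmethod
    def read(self): ...


class FifoChannel(WriteIf, ReadIf):
    """Channel: implements the methods declared by the interfaces."""
    def __init__(self):
        self._items = deque()

    def write(self, value):
        self._items.append(value)

    def read(self):
        return self._items.popleft()


class Port:
    """Port: typed by an interface and bound to a channel."""
    def __init__(self, interface):
        self.interface = interface
        self.channel = None

    def bind(self, channel):
        if not isinstance(channel, self.interface):
            raise TypeError("channel does not implement " + self.interface.__name__)
        self.channel = channel


class Producer:
    """Module: wraps a process and reaches the channel only through its port."""
    def __init__(self):
        self.out = Port(WriteIf)

    def process(self):
        for i in range(3):
            self.out.channel.write(i * i)


class Consumer:
    def __init__(self):
        self.inp = Port(ReadIf)
        self.received = []

    def process(self):
        for _ in range(3):
            self.received.append(self.inp.channel.read())


fifo = FifoChannel()
producer, consumer = Producer(), Consumer()
producer.out.bind(fifo)
consumer.inp.bind(fifo)
producer.process()
consumer.process()
```

Because the modules depend only on the interfaces, the FIFO could be swapped for any other channel that implements `write` and `read` without touching the producer or consumer.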
Challenge of System Level Design
The architecture definition and application mapping have to be considered jointly by taking the full functional and architectural complexity into account
In case of a fixed target platform, SLD is reduced to the application mapping task, which is also called the partitioning of the application
Orthogonalization of concerns with respect to all modeling attributes generally enables a divide-and-conquer approach to System Level Design
Separation of interfaces and behavior according to the interface based design paradigm fosters successive communication and structural refinement as well as IP reuse
High modeling efficiency and simulation speed are mandatory to handle the high complexity of SoC designs
Incorporation of hardware specific concepts like timing, reactivity, parallelism, and determinism to express the impact of the platform architecture
Incorporation of software specific concepts like Object Oriented Programming, Operating System (OS) encapsulation, Inter Process Communication (IPC), process concurrency, as well as the creation, mutual preemption, and termination of processes to enable smooth integration of the embedded Software part.
Support for validation and verification: first, to gain evidence on the highest possible level of abstraction that the correct system is being developed and all performance and cost requirements are met (validation). Later, the validated specification should be reused as a golden reference model for the subsequent refinement, IP integration and implementation steps (verification).
Seamless transition between design phases and abstraction levels from system to gates to avoid long iteration cycles caused by gaps in the design flow.
Question Remains - - - How to do it???
HW/SW Co-simulation has been recognized as a necessary ingredient for HW/SW Co-design.
The first HW/SW Co-simulation prototypes linked Hardware Description Language (HDL) simulators to an Instruction Set Simulator (ISS) executing the Software part.
Soon, HDL/ISS Co-simulation environments became commercially available and are still widely employed.
This HDL/ISS approach is severely limited by the slow simulation speed of the HDL simulator, especially in case of large systems with several ISSes and significant hardware portions.
The concept of flexible hardware abstraction levels has been developed.
Here accuracy can be traded against simulation speed.
Maximum simulation speed can be achieved by using compiled ISS technology together with highly abstract functional SystemC models of the hardware part
The original goal of HW/SW Co-design was to reach the same degree of tool automation known from RTL synthesis, i.e. a formalized system specification is automatically partitioned and synthesized to the optimal target architecture
Automated HW/SW partitioning and System Synthesis have never gained industrial relevance:
Partitioning decision metric is restricted to worst case execution time
Other important metrics like average performance, cost, and power dissipation are not taken into account.
Even the worst case execution time proved to be hard to estimate in the general case of parallel, data dependent, and interleaved software execution
HW/SW partitioning and automated synthesis are still not recognized as a dominant issue
System architects are interested in the impact on performance of a specific target architecture
To partly automate this mapping, Communication Synthesis and HW/SW Interface Synthesis emerged as new branches of HW/SW Co-design
Techniques for the analysis of communication requirements and synthesis of the communication architecture
As of today, Communication Analysis and Synthesis techniques need further advancement to cope with emerging Network-on-Chip architectures.
One attempt is to instantiate the NoC library elements (routers, network interfaces, links) from a high-level view of the SoC floorplan
Selection of the actual library elements can be done in different ways:
In an application-centric approach, the network topology can be generated from a communication graph of the application
In an architecture-centric approach, the communication architecture can be refined from an abstract channel view via a network topology view towards a micro-architecture view.
So far the analysis of Network on Chip architectures is performed using handcrafted simulation models, which are mostly based on SystemC
The absence of standardized APIs, abstraction levels and modeling frameworks beyond the plain SystemC language so far hinders the creation of interoperable IP models for NoC architectures.
Some of the current projects working on a unified modeling environment for the exploration of NoC architectures are discussed in section 5.3.3 below.
In HW/SW Interface Synthesis, the designer decides on the partitioning and architecture mapping
The realization of these decisions is supported by automating the tedious task of generating the required Software driver functions as well as the Hardware glue-logic
Recently the technology has been ported to SystemC
MP-SoC platform phase is concerned with:
System architecture specification
Application mapping
Abstraction concepts on this level have to support the joint consideration of application and architecture
High level of detail inherent to Register Transfer Level (RTL) implementation models prohibits the investigation and optimization across heterogeneous communication and processing elements
Significant research has been spent on the definition of the appropriate System Level Design language.
Today SystemC is generally considered as the standard language for all kinds of SLD tasks.
SystemC was initially conceived to replace VHDL and Verilog as a Hardware Description Language
For this reason it naturally provides all hardware specific concepts e.g., time, parallelism, and hierarchy
With version 2.0 SystemC has been thoroughly revised to become a fully elaborated actor oriented design language
The incorporated Interface Method Call (IMC) principle enables a clean separation of interfaces and behavior as well as orthogonalization of further modeling attributes
All kinds of methodology and application domain specific Models of Computation (MoC) can be implemented on top of the generic event-driven SystemC simulator
SystemC 2.0 enables a smooth transition from functional phase to the MP-SoC platform phase, e.g. hybrid simulation of an architecture model in the context of an algorithmic Data-Flow model
Since SystemC is a native C++ library, it inherently supports Object Oriented Programming
Final version 2.1 of the language has become an official IEEE standard
Development of the Transaction Level Modeling (TLM) kit
Synthesizable subset of SystemC
The characteristic property of TLM:
Pin-level communication interface of RTL models replaced by a set of interface methods.
This IMC based communication mechanism is provided by all actor-oriented specification languages
SystemC based TLM has demonstrated the potential in terms of increased simulation speed and modeling efficiency
The basic TLM API consists of a bidirectional transport and a set of unidirectional put and get interfaces
The bidirectional transport has blocking synchronization
Implementation of the interface is allowed to call wait()
The unidirectional interfaces are available in a blocking and a non-blocking version
These interfaces can be seen as a foundation layer for the creation of more advanced TLM interfaces, which serve a specific methodology or model a specific communication protocol
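The shape of such an API can be sketched in Python. This is an analogy only, not the SystemC TLM library: the class names, the request tuple format and the `nb_` prefix for the non-blocking variants are assumptions made for this example, and blocking is provided here by Python's `queue.Queue` rather than by a simulation kernel.

```python
import queue


class Fifo:
    """Unidirectional put/get in a blocking and a non-blocking flavour."""

    def __init__(self, depth):
        self._q = queue.Queue(maxsize=depth)

    def put(self, item):
        self._q.put(item)              # blocks while the FIFO is full

    def get(self):
        return self._q.get()           # blocks while the FIFO is empty

    def nb_put(self, item):
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False               # caller decides how to retry

    def nb_get(self):
        try:
            return True, self._q.get_nowait()
        except queue.Empty:
            return False, None


class Memory:
    """Bidirectional transport: one call carries request and response."""

    def __init__(self):
        self._data = {}

    def transport(self, request):
        op, addr, value = request
        if op == "write":
            self._data[addr] = value
            return ("ok", None)
        return ("ok", self._data.get(addr, 0))


fifo = Fifo(depth=1)
first_put = fifo.nb_put(1)      # succeeds, FIFO now full
second_put = fifo.nb_put(2)     # fails instead of blocking
first_get = fifo.nb_get()
second_get = fifo.nb_get()

mem = Memory()
mem.transport(("write", 0x10, 7))
read_back = mem.transport(("read", 0x10, None))
```

The non-blocking calls report success or failure and return immediately, whereas the blocking calls and the transport call return only once the transfer has completed.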
The two cycle-level TLM layers:
Bus Accurate (BA)
Cycle Callable (CC)
These levels are particularly suitable to create a cycle-accurate prototype of the system architecture
The (usually cycle-accurate) Instruction Set Simulators (ISS) of the programmable architectures are connected to cycle- and bit-accurate models of memories, communication resources and peripherals
BA and CC difference:
BA captures a transaction within a single method call
CC models provide separate methods for every phase of a transaction
The Programmer’s View (PV) abstraction level addresses early integration of (usually instruction accurate) ISSes for SW development purposes
PV provides a bit and address-map accurate view of the MP-SoC architecture context for the programmable processing elements
PV is based on the bidirectional blocking transport API
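The difference between a single-call transaction and a phase-by-phase transaction can be sketched as follows. Both memory classes, the phase method names (`send_request`, `clock`, `response_ready`, `get_response`) and the two-cycle latency are hypothetical; they only illustrate the two interface shapes and are not taken from any TLM standard.

```python
class BusAccurateMemory:
    """BA-style shape: the whole read transaction is one method call."""

    def __init__(self, contents):
        self._contents = dict(contents)

    def read(self, addr):
        return self._contents.get(addr, 0)


class CycleCallableMemory:
    """CC-style shape: one method per phase, advanced cycle by cycle."""

    def __init__(self, contents, latency=2):
        self._contents = dict(contents)
        self._latency = latency
        self._pending = None           # [address, cycles left]

    def send_request(self, addr):
        self._pending = [addr, self._latency]

    def clock(self):
        if self._pending and self._pending[1] > 0:
            self._pending[1] -= 1

    def response_ready(self):
        return self._pending is not None and self._pending[1] == 0

    def get_response(self):
        addr, _ = self._pending
        self._pending = None
        return self._contents.get(addr, 0)


def cc_read(mem, addr):
    """Drive the request and response phases and count elapsed cycles."""
    mem.send_request(addr)
    cycles = 0
    while not mem.response_ready():
        mem.clock()
        cycles += 1
    return mem.get_response(), cycles


ba_value = BusAccurateMemory({0x10: 7}).read(0x10)
cc_value, cc_cycles = cc_read(CycleCallableMemory({0x10: 7}), 0x10)
```

Both models return the same data; the phase-based one additionally exposes when each phase happens, at the price of more calls per transaction.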
The Open Core Protocol International Partnership (OCP-IP) is getting a lot of traction throughout the industry
OCP-IP provides a highly configurable SoC protocol, and its System Level Design working group has worked on Transaction Level Modeling from the early days
Lowest level: Transaction Layer 1 (TL1)
provides a fully cycle accurate model of the OCP protocol
Fully aligned with the CC abstraction level from OSCI.
Next higher level: Transaction Layer 2 (TL2)
Basically represents a cycle-approximate abstraction of the OCP protocol.
The API contains a large number of OCP specific features, e.g. thread-busy, handshake timing, or sideband signals.
The timing is not cycle accurate, but can be annotated to a near-cycle accurate level
Highest Level: Transaction Layer 3 (TL3)
Protocol agnostic subset of TL2
API is limited to a concise set of primitives
Models timing approximate on-chip communication
PV TLM platforms for early SW development as well as cycle-level TLM for HW/SW and TLM/RTL co-verification are successfully deployed throughout the industry
However, both use-cases solve only parts of the challenges during the MP-SoC design phase
Especially the architecture definition and task partitioning are not adequately addressed
PV platforms simulate very fast and are well suited for SW development
Unfortunately they do not contain sufficient timing information for architectural investigations
The blocking semantics of the underlying bidirectional transport API hinders the smooth annotation of further timing information
Cycle accurate models of the SoC platform are too detailed and too slow for architecture definition and task partitioning
First, the effort to create such a cycle-accurate model of the complete platform is way too high to allow for the investigation of a large number of architecture and application mapping alternatives
Second, the reachable simulation speed in the order of 100k cycles per second is not sufficient for the analysis of a large number of design parameter choices
As a result, the exploration of broad design spaces is still a cumbersome process in cycle-level TLM based design flows
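A back-of-the-envelope calculation shows why this simulation speed limits exploration. Only the 100k cycles per second figure comes from the text above; the 200 MHz clock, the one second of simulated operation and the 50 design alternatives are assumptions chosen for illustration.

```python
def wall_clock_seconds(simulated_cycles, sim_speed_cycles_per_s=100_000):
    """Host time needed to simulate the given number of target cycles."""
    return simulated_cycles / sim_speed_cycles_per_s


# Assumption: one second of operation of a platform clocked at 200 MHz.
cycles_per_run = 200_000_000
seconds_per_run = wall_clock_seconds(cycles_per_run)

# Assumption: 50 architecture and mapping alternatives, simulated one by one.
alternatives = 50
total_hours = seconds_per_run * alternatives / 3600
```

Under these assumptions a single run takes 2000 seconds of host time and the sweep over 50 alternatives takes roughly 27.8 hours, before counting the effort to build each cycle-accurate model.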
Cycle-level TLM communication models have architecture specific interfaces.
Thus, every time the designer is inclined to explore a new communication architecture he has to change the interface of the connected functional models
For this reason the Design Space Exploration framework deploys a generic synchronization interface, which provides the same primitives as the newly standardized OCP TL3 API
Obviously, the TL3 API presents the best fit for this purpose
It is compliant with the OSCI TLM standard
Additionally, it is of reasonable complexity, and yet offers sufficient expressiveness to meet the accuracy requirements for design space exploration
By deploying SystemC based Transaction Level Modeling the framework is nicely integrated into the flourishing ESL ecosystem.
This method is interoperable with the PV and cycle-accurate modeling methodologies and can benefit from the commercial tool support, available IP models, and established ESL design methodologies
Component Based Design: founded on the assumption that the processing elements and communication templates are available IP blocks
Communication Based Design: envisions MP-SoC platform design as a composition of reusable IP blocks
Different from Component Based Design
Omits the consideration of processing elements
Is exclusively focused on the conceptualization and implementation of the communication architecture.
Communication Based Design can be seen as the corresponding design paradigm to match emerging NoC architectures.
Design Space Exploration (DSE) Environment
The goal is to take early design decisions with respect to system architecture and application mapping on the basis of an abstract performance model.
The embedded application needs to be modeled together with the MP-SoC architecture at a high level of abstraction
Introduction to SoC
Design Space Exploration (DSE)
Methodology
Ultimate goal is to meet the System Level Design requirements as specified and to cope with the full architectural complexity of emerging MP-SoC architectures
MP-SoC Framework follows the y-chart principle
Set of functional application models is merged with a set of architecture models in a dedicated mapping step
Developed embodiment of the y-chart principle is called Virtual Architecture Mapping (VAM), which comprises:
Well defined abstraction level above cycle-level TLM for efficient modeling of embedded applications
Set of generic, parameterizable architecture models, which capture the notion of shared and resource limited architectural fabrics for communication and computation
Rigorous definition of a timing model, that embodies the performance of a selected application-architecture-mapping
MP-SoC simulation framework featuring a declarative mapping mechanism to minimize turn-around times during the iterative architecture exploration cycle
Comprehensive set of analysis tools for functional and performance validation
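The y-chart idea of keeping application, architecture and mapping separate can be sketched with plain dictionaries. This is a deliberately crude illustration and not the VAM timing model: the task workloads, the processing element speeds and the two mappings are hypothetical, and communication cost and task dependencies are ignored.

```python
# Application model (hypothetical): task -> work in abstract cycles.
tasks = {"fft": 4000, "viterbi": 6000, "ctrl": 500}

# Architecture model (hypothetical): processing element -> relative speed.
pes = {"dsp": 2.0, "risc": 1.0}

# Declarative mappings: changing a mapping touches neither model above.
mappings = {
    "A": {"fft": "dsp", "viterbi": "dsp", "ctrl": "risc"},
    "B": {"fft": "risc", "viterbi": "dsp", "ctrl": "risc"},
}


def evaluate(mapping):
    """Return the busiest PE's load and the per-PE loads for one mapping."""
    busy = {pe: 0.0 for pe in pes}
    for task, pe in mapping.items():
        busy[pe] += tasks[task] / pes[pe]
    return max(busy.values()), busy


results = {name: evaluate(m)[0] for name, m in mappings.items()}
best = min(results, key=results.get)
```

Each exploration iteration only edits a mapping entry and re-evaluates, which is the short turn-around loop that a declarative mapping mechanism is meant to enable.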