View
185
Download
3
Category
Tags:
Preview:
DESCRIPTION
ahb self motivated doc
Citation preview
Abstract
The multilayer advanced high-performance bus (ML-AHB) busmtrix employs slave-side
arbitration. Slave-side arbitration is different from master-side arbitration in terms of request and
grant signals since, in the former, the master merely starts a burst transaction and waits for the
slave response to proceed to the next transfer. Therefore, in the former, the unit of arbitration can
be a transaction or a transfer. However, the ML-AHB busmatrix of ARM offers only transfer-
based fixed-priority and round-robin arbitration schemes.
In this project, the design a flexible arbiter for the ML-AHB busmatrix to support three
priority policies—fixed priority, round robin, and dynamic priority—and three data multiplexing
modes—transfer, transaction, and desired transfer length. In total, there are nine possible
arbitration schemes. The proposed arbiter, which is self-motivated (SM), selects one of the nine
possible arbitration schemes based upon the priority-level notifications and the desired transfer
length from the masters so that arbitration leads to the maximum performance.
In this project, a flexible arbiter based on the SM arbitration scheme for the ML-AHB
busmatrix will be designed and also arbiter should supports three priority policies-fixed priority,
round-robin, and dynamic priority-and three approaches to data multiplexing- transfer,
transaction, and desired transfer length; in other words, there are nine possible arbitration
schemes. In addition, the SM arbiter design should selects one of the nine possible arbitration
schemes based on the priority-level notifications and the desired transfer length from the masters
to allow the arbitration to lead to the maximum performance. The design is simulated using
Modelsim and synthesized using Xilinx tools.
CHAPTER 1CHAPTER 1
INTRODUCTION
1.1 Introduction
The Advanced Microcontroller Bus Architecture (AMBA) was introduced by ARM Ltd
in 1996 and is widely used as the on-chip bus in System-on-a-chip (SoC) designs. AMBA is a
registered trademark of ARM Ltd. The first AMBA buses were Advanced System Bus (ASB)
and Advanced Peripheral Bus (APB). In its 2nd version, AMBA 2, ARM added AMBA High-
performance Bus (AHB) that is a single clock-edge protocol.
Figure 1.1 ARM soc Block Diagram
In 2003, ARM introduced the 3rd generation, AMBA 3, including AXI to reach even higher
performance interconnects and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip
debugs and trace solution. These protocols are today the de-facto standard for 32-bit embedded
processors because they are well documented and can be used without royalties. Some
manufacturers utilize AMBA buses for non-ARM designs. As an example Infineon uses an
AMBA bus for the ADM5120 SoC based on the MIPS architecture.
The important aspect of a SoC is not only which components or blocks it houses, but also how they are interconnected. AMBA is a solution for the blocks to interface with each other.
Since its inception, the scope of AMBA has gone far beyond microcontroller devices, and is now widely used on a range of ASIC and SoC parts including applications processors used in modern portable mobile devices like smartphones.
1.2 Advanced High performance Bus (AHB) Protocol:
The AMBA advanced microcontroller bus architecture (AMBA) Specification defines an On-
Chip Communications standard for designing high performance embedded microcontrollers.
Three distinct buses are defined within the AMBA specification
• The Advanced High-Performance Bus (AHB)
• The Advanced System Bus (ASB)
• The Advanced Peripheral Bus (APB)
A test methodology is included with the AMBA specification which provides an infrastructure for
modular test and diagnostic access.
1.2.1 Advanced High Performance Bus (AHB):
The AMBA AHB is for high-performance, high clock frequency system modules.
The AHB acts as the high-performance system backbone bus. AHB supports the efficient
connection of processors, on-chip memories and off-chip external memory interfaces with low-
power peripheral macro cell functions. AHB is also specified to ensure ease of use in an efficient
design flow using synthesis and automated test techniques.
1.2.2 Advanced System Bus (ASB):
The AMBA ASB is for high-performance system modules.
AMBA ASB is an alternative system bus suitable for use where the high-performance features of
AHB are not required. ASB also supports the efficient Connection of processors, on-chip
memories and off-chip external memory interfaces with low-power peripheral macro cell
functions.
1.2.3 Advanced Peripheral Bus (APB):
The AMBA APB is for low-power peripherals. AMBA APB is optimized for minimal power
consumption and reduced interface complexity to support peripheral functions. APB can be used
in conjunction with either version of the system bus.
1.3 Objectives of the AMBA Specification:
The AMBA specification has been derived to satisfy four key requirements:
•To facilitate the right-first-time development of embedded microcontroller products with one or
more CPUs or signal processors.
•To be technology-independent and ensure that highly reusable peripheral and system macro
cells can be migrated across a diverse range of IC processes and be appropriate for full-custom,
standard cell and gate array technologies.
•To encourage modular system design to improve processor independence, providing a
development road-map for advanced cached CPU cores and the development of peripheral
libraries.
•To minimize the silicon infrastructure required to support efficient on-chip and off-chip
communication for both operation and manufacturing test.
1.4 A Typical AMBA-based Microcontroller:
An AMBA-based microcontroller typically consists of a high-performance system
backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on
which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This
bus provides a high-bandwidth interface between the elements that are involved in the majority
of transfers. Also located on the high-performance bus is a bridge to the lower bandwidth APB,
where most of the peripheral devices in the system are located (see Figure 1.1).
Figure1.2: A typical AMBA AHB-based System
AMBA APB provides the basic peripheral macro cell communications infrastructure as a
secondary bus from the higher bandwidth pipelined main system bus such peripherals typically.
• Have interfaces which are memory-mapped registers
• Have no high-bandwidth interfaces
• Are accessed under programmed control
The external memory interface is application-specific and may only have a narrow data path, but
may also support a test access mode which allows the internal AMBA AHB, ASB and APB
modules to be tested in isolation with system-independent test sets.
1.5 Multi-layer AHB
THE ON-CHIP bus plays a key role in the system-on-a-chip (SoC) design by enabling
the efficient integration of heterogeneous system components such as CPUs, DSPs, application
specific cores, memories, and custom logic. Recently, as the level of design complexity has
become higher, SoC designs require a system bus with high bandwidth to perform multiple
operations in parallel. To solve the bandwidth problems, there have been several types of high-
performance on-chip buses proposed, such as the multilayer AHB (ML-AHB) busmatrix from
ARM, the PLB crossbar switch from IBM, and CONMAX from Silicore . Among them, the ML-
AHB busmatrix has been widely used in many SoC designs. This is because of the simplicity of
the AMBA bus of ARM, which attracts many IP designers, and the good architecture of the
AMBA bus for applying embedded systems with low power.
The ML-AHB busmatrix is an interconnection scheme based on the AMBA AHB
protocol, which enables parallel access paths between multiple masters and slaves in a system.
This is achieved by using a more complex interconnection matrix and gives the benefit of both
increased overall bus bandwidth and a more flexible system structure. In particular, the ML-
AHB busmatrix uses slave-side arbitration. Slave-side arbitration is different from master-side
arbitration in terms of request and grant signals since, in the former, the master merely starts a
burst transaction and waits for the slave response to proceed to the next transfer. Therefore, the
unit of arbitration can be a transaction or a transfer. The transaction-based arbiter multiplexes
the data transfer based on the burst transaction, and the transfer-based arbiter switches the data
transfer based on a single transfer. However, the ML-AHB busmatrix of ARM presents only
transfer-based arbitration schemes, i.e., transfer based fixed-priority and round-robin arbitration
schemes. This limitation on the arbitration scheme may lead to degradation of the system
performance because the arbitration scheme is usually dependent on the application
requirements; recent applications are likewise becoming more complex and diverse. By
implementing an efficient arbitration scheme, the system performance can be tuned to better suit
applications. For a high-performance on-chip bus, several studies related to the arbitration
scheme have been proposed, such as table-lookup-based crossbar arbitration, two-level time-
division multiplexing (TDM) scheduling, token-ring mechanism , dynamic bus distribution
algorithm , and LOTTERYBUS. However, these approaches employ master-side arbitration.
Therefore, they can only control priority policy and also present some limitations when handling
the transfer-based arbitration scheme since master-side arbitration uses a centralized arbiter. In
contrast, it is possible to deal with the transfer-based arbitration scheme as well as the
transaction- based arbitration scheme in slave-side arbitration. In this paper, we propose a
flexible arbiter based on the self-motivated (SM) arbitration scheme for the ML-AHB busmatrix.
Our SM arbitration scheme has the following advantages: 1) It can adjust the processed data unit;
2) it changes the priority policies during runtime; and 3) it is easy to tune the arbitration scheme
according to the characteristics of the target application. Hence, our arbiter is able to not only
deal with the transfer-based fixed-priority, round-robin, and dynamic-priority arbitration
schemes but also manage the transaction-based fixed-priority, round-robin, and dynamic-priority
arbitration schemes. Furthermore, our arbiter provides the desired-transfer-length-based fixed-
priority, round-robin, and dynamic-priority arbitration schemes. In addition, the proposed SM
arbiter selects one of the nine possible arbitration schemes based on the priority-level
notifications and the desired transfer length from the masters to ensure that the arbitration leads
to the maximum performance.
Multi-layer AHB is an interconnection scheme, based on the AHB protocol, which enables
parallel access paths between multiple masters and slaves in a system. This is achieved by using
a more complex interconnection matrix.
Key advantages are:
• You can develop multi-master systems with an increased available bus bandwidth.
• You can construct complex multi-master systems that have a flexible architecture. This
removes the requirement to fix design decisions about the allocation of system resources to
particular masters at the hardware design stage.
• You can use standard AHB master and slave modules without requiring modification.
• Each AHB layer can be very simple because it only has one master, so no arbitration or master-
to-slave muxing is required. These layers can use the AHB-Lite protocol, meaning that they do
not have to support request and grant, or retry and split transactions.
• The arbitration effectively becomes point arbitration at each peripheral and is only necessary
when more than one master wants to access the same slave simultaneously.
• The only hardware you have to add to the standard AHB transport infrastructure is the
multiplexor block to connect the multiple masters to the peripherals.
• Because the multi-layer architecture is based on the existing AHB protocol, you can reuse
previously-designed masters and slaves without modification.
Figure 1.3 shows a block diagram of the basic multi-layer concept.
Figure 1.3 Basic multi-layer concept
1.6 APPLICATIONS
AMBA-AHB can be used in the different application and also it is technology
independent.
ARM Controllers are designed according to the specifications of AMBA.In the present
technology, high performance and speed are required which are convincingly met by
AMBA-AHB Compared to the other architectures AMBA-AHB is far more advanced
and efficient.
To minimize the silicon infrastructure to support on-chip and off-chip communications
Any embedded project which involve in ARM processors or microcontroller must always
make use of this AMBA-AHB as the common bus throughout the project.
CHAPTER 2CHAPTER 2
Literature survey
2.1 Introduction to Advanced High-performance Bus AHB
The AHB (Advanced High-performance Bus) is a high-performance bus in AMBA
(Advanced Microcontroller Bus Architecture) family. This AHB can be used in high clock
frequency system modules. The AHB acts as the high-performance system backbone bus. AHB
supports the efficient connection of processors, on-chip memories and off-chip external memory
interfaces with low-power peripheral macro cell functions. AHB is also specified to ensure ease
of use in an efficient design flow using automated test techniques. This AHB is a technology-
independent and ensure that highly reusable peripheral and system macro cells can be migrated
across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate
array technologies.
2.2 FeaturesAMBA Advanced High-performance Bus (AHB) supports the following features.
High performance
Burst transfers
Split transactions
Single edge clock operation
SEQ, NONSEQ, BUSY, and IDLE Transfer Types
Programmable number of idle cycles
Large Data bus-widths - 32, 64, 128 and 256 bits wide
Address Decoding with Configurable Memory Map
2.3 MeritsSince AHB is a most commonly used bus protocol, it must have many advantages from
designer’s point of view and are mentioned below.
AHB offers a fairly low cost (in area), low power (based on I/O) bus with a moderate
amount of complexity and it can achieve higher frequencies when compared to others
because this protocol separates the address and data phases.
AHB can use the higher frequency along with separate data buses that can be defined to
128-bit and above to achieve the bandwidth required for high-performance bus
applications.
AHB can access other protocols through the proper bridging converter. Hence it supports
the bridge configuration for data transfer.
AHB allows slaves with significant latency to respond to read with an HRESP of
“SPLIT”. The slave will then request the bus on behalf of the master when the read data
is available. This enables better bus utilization.
AHB offers burst capability by defining incrementing bursts of specified length and it
supports both incrementing and wrapping. Although AHB requires that an address phase
be provided for each beat of data, the slave can still use the burst information to make the
proper request on the other side. This helps to mask the latency of the slave.
AHB is defined with a choice of several bus widths, from 8-bit to 1024-bit. The most
common implementation has been 32-bit, but higher bandwidth requirements may be
satisfied by using 64 or 128-bit buses.
AHB used the HRESP signals driven by the slaves to indicate when an error has
occurred.
AHB also offers a large selection of verification IP from several different suppliers. The
solutions offered support several different languages and run in a choice of environments.
Access to the target device is controlled through a MUX, thereby admitting bus-access to
one bus-master at a time.
AHB Masters, Slaves and Arbiters support Early Burst Termination. Bursts can be early
terminated either as a result of the Arbiter removing the HGRANT to a master part way
through a burst or after a slave returns a non-OKAY response to any beat of a burst.
However that a master cannot decide to terminate a defined length burst unless prompted
to do so by the Arbiter or Slave responses.
Any slave which does not use SPLIT responses can be connected directly to an AHB
master. If the slave does use SPLIT responses then a simplified version of the arbiter is
also required.
Thus the strengths of the AHB protocol is listed above which clearly resembles the
reason for the wide use of this protocol.
2.4 DemeritsEven though AHB protocol is commonly used bus in the design, it has some affordable
demerits which are listed below.
AHB cannot achieve full data bus utilization and bandwidth if some slaves have a
relatively high latency.
AHB defines transfer sizes of 1, 2, 4, 8, and 16 bytes. Because byte enables are not
defined, there are cases where multiple transfers must be made inside a single quadword.
AHB defines timing parameters for many of the relationships between signals on the bus.
However, these are not associated with requirements relative to a clock cycle. Therefore,
SoC developers must integrate AHB cores and run chip level static timing analysis to
judge how compatible AHB masters and slaves are with one another.
Power-based SoCs cover a wide range of applications, and there is a corresponding wide
range of address map requirements. Having the address decodes for all AHB slaves reside
within the interconnect means having to support the most complex split address ranges,
even for the simplest of slaves.
Thus the weakness of AHB protocol is mentioned above which can be tolerated with
respect to its useful advantages.
2.5 Block DiagramThe block diagram of the Advanced High-Performance Bus Protocol is shown in the
Figure 2.1.
Totally this block diagram comprises of four components.
Arbiter
Master
Slave
Decoder
2.5.1 Arbiter
The arbitration mechanism is used to ensure that only one master has access to the bus at
any one time. The arbiter performs this function by observing a number of different requests to
use the bus and deciding which is currently the highest priority master requesting the bus.
Figure 2.1 AMBA – AHB block diagram
2.5.2 Master
A bus master is able to initiate read and write information by providing address and
control information. Only one bus master can use the bus at the same time An AHB bus master
has the most complex bus interface in an AMBA system. Typically an AMBA system designer
would use predesigned bus masters and therefore would not need to be concerned with the detail
of the bus master interface. No provision is made within the AHB specification for a bus master
to cancel a transfer once it has commenced.
2.5.3 Slave
After a master has started a transfer, the slave then determines how the transfer should
progress. Whenever a slave is accessed it must provide a response which indicates the status of
the transfer. The HREADY signal is used to extend the transfer and this works in combination
with the response signal HRESP which provide the status of the transfer.
The slave can complete the transfer in a number of ways. It can:
Complete the transfer immediately
Signal an error to indicate that the transfer has failed
Delay the completion of the transfer, but allow the master and slave to back off the bus,
leaving it available for other transfers.
2.5.4 Decoder
The AHB decoder is used to decode the address of each transfer and provide a select
signal for the slave that is involved in the transfer. A central address decoder is used to provide a
select signal ‘HSELx’ for each slave on the bus. The select signal is a combinatorial decode of
the high-order address signals. A slave must only sample the address and control signals and
HSELx is asserted when HREADY is HIGH, indicating that the current transfer is completing.
2.6 Working of AHB
The AMBA AHB bus protocol is designed with a central multiplexor interconnection
scheme.
Using this scheme all bus masters drive out the address and control signals indicating the
transfer, they wish to perform and the arbiter determines which master has its address and control
signals routed to all of the slaves. Before which initially the master who needs to perform the
operation should give the request signal to the arbiter and the arbiter will give the grant signal to
the master for further proceedings. Similarly, a decoder is used to select the slave which has to
be active during the operation based on the address given by the master. A central decoder is also
required to control the read data and response signal multiplexor, which selects the appropriate
signals from the slave that is involved in the transfer. These make the read and write operation
smoothly.
Thus the working of AMBA AHB protocol is explained with the help of its block
diagram shown in Figure 2.1.
2.6 SpecificationThe following points should be considered when reading the AMBA specification
Technology independence
Electrical characteristics
Timing specification
2.6.1 Technology Independence
AMBA is a technology-independent on-chip protocol. The specification only details the
bus protocol at the clock cycle level.
2.6.2 Electrical Characteristics
No information regarding the electrical characteristics is supplied within the AMBA
specification as this will be entirely dependent on the manufacturing process technology that is
selected for the design.
2.6.3 Timing Specification
The AMBA protocol defines the behavior of various signals at the cycle level. The exact
timing requirements will depend on the process technology used and the frequency of operation.
Because the exact timing requirements are not defined by the AMBA protocol, the system
integrator is given maximum flexibility in allocating the signal timing budget amongst the
various modules on the bus.
2.7 AMBA Signals
All AMBA signals are named such that the first letter of the name indicates which bus the
signal is associated with. A lower case n in the signal name indicates that the signal is active
LOW, otherwise signal names are always all upper case.
Test signals have a prefix T regardless of the bus type.
2.7.1 AHB Signal Prefixes
‘H’ indicates an AHB signal.
For example, HREADY is the signal used to indicate that the data portion of an AHB
transfer can complete. It is active HIGH.
2.7.2 AMBA AHB Signal List:
All signals are prefixed with the letter H, ensuring that the AHB signals are differentiated
from other similarly named signals in a system design. The signals involved in designing the
AMBA AHB are listed in the Table 2.1 which also gives the specification of each signal.
Table 2.1 AMBA AHB signal specification
S.No. NAME WIDTH DRIVER FUNCTION
1 HCLK 1 Clock SourceThis clock times all bus transfers at the
rising edge of HCLK
2 HADDR 32 Master The system address bus of width 32-bit
3 HTRANS 2 MasterIndicates the type of the current
transfer happening
4 HWRITE 1 Master
When HIGH this signal indicates a
write transfer and when LOW a read
transfer
5 HSIZE 3 Master Indicates the size of the transfer
6 HBURST 3 MasterIndicates if the transfer forms part of a
burst.
7 HWDATA 8 Master
The write data bus is used to transfer
data from the master to the bus slaves
during write operations.
8 HSELx 1 Decoder
Each AHB slave has its own slave
select signal and this signal indicates
that the current transfer is intended for
the selected slave.
9 HRDATA 8 Slave
The read data bus is used to transfer
data from bus slaves to the bus master
during read operations.
10 HREADY 1 Slave
When HIGH the HREADY signal
indicates that a transfer has finished on
the bus. This signal may be driven
LOW to extend a transfer.
11 HRESP 2 Slave
The transfer response provides
additional information on the status of
a transfer
The table also includes the function of each signal and the source from which the each
signal is driven. The operation is performed in a synchronized clock frequency and hence the
signals should be changed with respect to the rising edge of the clock.
2.8 Overview of AMBA AHB Operation
Before an AMBA AHB transfer can commence the bus master must be granted access to
the bus. This process is started by the master asserting a request signal to the arbiter. Then the
arbiter indicates when the master will be granted use of the bus.
A granted bus master starts an AMBA AHB transfer by driving the address and control
signals. These signals provide information on the address, direction and width of the transfer, as
well as an indication if the transfer forms part of a burst. Two different forms of burst transfers
are allowed.
• Incrementing bursts, which do not wrap at address boundaries
• Wrapping bursts, which wrap at particular address boundaries
A write data bus is used to move data from the master to a slave, while a read data bus is
used to move data from a slave to the master.
Every transfer consists of:
• An address and control cycle
• One or more cycles for the data.
The address cannot be extended and therefore all slaves must sample the address during
this time. The data, however, can be extended using the HREADY signal. When LOW this
signal causes wait states to be inserted into the transfer and allows extra time for the slave to
provide or sample data.
During a transfer the slave shows the status using the response signals, HRESP OKAY.
The OKAY response is used to indicate that the transfer is progressing normally and when
HREADY goes HIGH this shows the transfer has completed successfully.
2.8.1 Address Decoding
A central address decoder is used to provide a select signal, HSELx, for each slave on the
bus. The select signal is a combinatorial decode of the high-order address signals, and simple
address decoding schemes are encouraged to avoid complex decode logic and to ensure high-
speed operation.
Figure 2.2 Decoder and Slave select signals
A slave must only sample the address and control signals and HSELx when HREADY is
HIGH, indicating that the current transfer is completing. Under certain circumstances it is
possible that HSELx will be asserted when HREADY is LOW, but the selected slave will have
changed by the time the current transfer completes.
In the case where a system design does not contain a completely filled memory map an
additional default slave should be implemented to provide a response when any of the
nonexistent address locations are accessed. Typically the default slave functionality will be
implemented as part of the central address decoder.
2.8.2 Slave Transfer Responses
After a master has started a transfer, the slave then determines how the transfer should
progress. No provision is made within the AHB specification for a bus master to cancel a transfer
once it has commenced.
Whenever a slave is accessed it must provide a response which indicates the status of the
transfer. The HREADY signal is used to extend the transfer and this works in combination with
the response signals, HRESP [1:0], which provide the status of the transfer. The slave can
complete the transfer by doing its transfer immediately.
2.8.3 AHB Decoder:
The decoder in an AMBA system is used to perform a centralized address decoding
function, which improves the portability of peripherals, by making them independent of the
system memory map.
Figure 2.3 AHB Decoder Interface Diagram
2.8.4 Arbitration
The arbitration mechanism is used to ensure that only one master has access to the bus at
any one time. The arbiter performs this function by observing a number of different requests to
use the bus and deciding which is currently the highest priority master requesting the bus. The
arbiter also receives requests from slaves that wish to complete SPLIT transfers.
Any slaves which are not capable of performing SPLIT transfers do not need to be aware
of the arbitration process, except that they need to observe the fact that a burst of transfers may
not complete if the ownership of the bus is changed.
Interface Diagram
Figure 2.4 AHB Arbiter Interface Diagram
The role of the arbiter in an AMBA system is to control which master has access to the
bus. Every bus master has a REQUEST/GRANT interface to the arbiter and the arbiter uses a
prioritization scheme to decide which bus master is currently the highest priority master
requesting the bus.
The detail of the priority scheme is not specified and is defined for each application. It is
acceptable for the arbiter to use other signals, either AMBA or non-AMBA, to influence the
priority scheme that is in use.
Signal Description
A brief description of each of the arbitration signals is given below.
HREQx
The bus request signal is used by a bus master to request access to the bus. Each bus
master has its own HBUSREQx signal to the arbiter and there can be up to 16 separate bus
masters in any system.
HGRANTx
The grant signal is generated by the arbiter and indicates that the appropriate master is
currently the highest priority master requesting the bus, taking into account locked transfers and
SPLIT transfers.
A master gains ownership of the address bus when HGRANTx is HIGH and HREADY is
HIGH at the rising edge of HCLK.
HMASTER
The arbiter indicates which master is currently granted the bus using the HMASTER[3:0]
signals and this can be used to control the central address and control multiplexer.
2.8.5 Requesting Bus Access
A bus master uses the HREQx signal to request access to the bus and may request the bus
during any cycle. The arbiter will sample the request on the rising of the clock and then use an
internal priority algorithm to decide which master will be the next to gain access to the bus.
Normally the arbiter will only grant a different bus master when a burst is completing.
However, if required, the arbiter can terminate a burst early to allow a higher priority master
access to the bus.
When a master is granted the bus and is performing a fixed length burst it is not
necessary to continue to request the bus in order to complete the burst. The arbiter observes the
progress of the burst and uses the HBURST[2:0] signals to determine how many transfers are
required by the master. If the master wishes to perform a second burst after the one that is
currently in progress then it should re-assert the request signal during the burst. If a master loses
access to the bus in the middle of a burst then it must re-assert the HREQx request line to regain
access to the bus.
For undefined length bursts the master should continue to assert the request until it has
started the last transfer. The arbiter cannot predict when to change the arbitration at the end of an
undefined length burst. It is possible that a master can be granted the bus when it is not
requesting it. This may occur when no masters are requesting the bus and the arbiter grants
access to a default master. Therefore, it is important that if a master does not require access to the
bus it drives the transfer type HTRANS to indicate an IDLE transfer.
2.8.6 Granting Bus Access
The arbiter indicates which bus master currently the highest priority is requesting the bus
by asserting the appropriate HGRANTx signal. When the current transfer completes, as indicated
by HREADY HIGH, then the master will become granted and the arbiter will change the
HMASTER [3:0] signals to indicate the bus master number.
The arbiter changes the HGRANTx signals when the penultimate (one before last)
address has been sampled. The new HGRANTx information will then be sampled at the same
point as the last address of the burst is sampled.
Figure 2.5 Bus Master Grant Signals
Because a central multiplexer is used, each master can drive out the address of the
transfer it wishes to perform immediately and it does not need to wait until it is granted the bus.
The HGRANTx signal is only used by the master to determine when it owns the bus and hence
when it should consider that the address has been sampled by the appropriate slave. A delayed
version of the HMASTER bus is used to control the write data multiplexer.
2.8.7 Default Bus Master
Every system must include a default bus master which is granted the bus if all other
masters are unable to use the bus. When granted, the default bus master must only perform IDLE
transfers. If no masters are requesting the bus then the arbiter may either grant the default master
or alternatively it may grant the master that would benefit the most from having low access
latency to the bus.
Granting the default master access to the bus also provides a useful mechanism for
ensuring that no new transfers are started on the bus and is a useful step to perform prior to
entering a low-power mode of operation.
2.8.8 AHB Data Bus Width:
One way to improve bus bandwidth without increasing the frequency of operation is to
make the data path of the on-chip bus wider. Both the increased layers of metal and the use of
large on-chip memory blocks (such as Embedded DRAM) are driving factors which encourage
the use of wider on-chip buses.
Specifying a fixed width of bus will mean that in many cases the width of the bus is not
optimal for the application. Therefore an approach has been adopted which allows flexibility of
the width of bus, but still ensures that modules are highly portable between designs.
The protocol allows for the AHB data bus to be 8, 16, 32, 64, 128, 256, 512 or 1024-bits
wide. However, it is recommended that a minimum bus width of 32 bits is used and it is expected
that a maximum of 256 bits will be adequate for almost all applications.
For both read and write transfers the receiving module must select the data from the correct byte
lane on the bus. Replication of data across all byte lanes is not required.
2.8.9 AHB Bus Master
An AHB bus master has the most complex bus interface in an AMBA system. Typically
an AMBA system designer would use pre designed bus masters and therefore would not need to
be concerned with the detail of the bus master interface.
Here the Arbiter signals and the transfer signals were mentioned in the below data flow
AHB bus master interface diagram.
Figure 2.6 AHB Bus Master Interface Diagram
2.8.10 AHB Bus Slave
An AHB bus slave responds to transfers initiated by bus masters within the system. The
slave uses a HSELx select signal from the decoder to determine when it should respond to a bus
transfer. All other signals required for the transfer, such as the address and control information,
will be generated by the bus master.
Figure 2.7 AHB Bus Slave Interface
Hence all the signals involved in the slave and decoder were mentioned in the above
AHB bus slave interface diagram.
2.9 Multi Layered Advanced High performance Bus protocol (ML-AHB)
In the simplest implementation of a multi-layer system, each master has its own AHB Layer and
is connected to the slave devices by an interconnect matrix, as shown in Figure 2.8
Figure 2.8 Multi-layer interconnect topology
Within the interconnect matrix:
1. Every layer has a Decode stage that determines which slave is required for a transfer.
2. A mux routes the transfer from the appropriate layer to the required slave. If two layers require
access to the same slave at the same time, the arbitration within the interconnect matrix must
determine which layer has highest priority. The layer that is not given access is waited using
HREADY until it is given access to the required slave. When a layer is waited an Input Stage is
used to store a copy of the pipelined address and control information until the access to the
shared slave is given. Each slave port has its own arbitration and a number of different schemes
can be used.
For example:
• Input layers can be serviced in round-robin manner, changing every transfer or every
burst
• The arbitration can use a fixed priority scheme where certain high priority layers are
always given access in preference to lower priority layers.
The number of input/output ports on the interconnect matrix is completely flexible and can be
adapted to suit the system requirements.
2.10 Bus configurations
As the number of masters and slaves in a system increases, the complexity of the interconnect
matrix can become significant. You can use the following techniques to optimize the system
architecture:
• Figure 2.9 shows how you can make slaves local, or private, to a particular layer. This reduces
the complexity of the interconnect matrix. By using this when it is acceptable that a slave can
only be accessed by masters on the same layer.
Figure 2.9 Local slaves
• Figure 2.10 shows how you can make multiple slaves appear as a single slave to the
interconnect matrix. This is useful to combine a number of low-bandwidth slaves. You can also
use it where a set of slaves are normally accessed by just one master, such as a DMA controller,
and the interconnect matrix is used only to give access to other masters under special
circumstances, such as debugging the system.
Figure 2.10 multiple slaves on one slave port
Figure 2.11 multiple masters on one layer
• In the simplest implementation of a multi-layer system, each master has its own layer. Figure
2.11 shows how you can also build a system where multiple masters share a layer. This is well
suited to combine masters that have low-bandwidth requirements or masters, such as the Test
Interface Controller (TIC), that have particular characteristics.
• Each layer can be a complete AHB subsystem, with the interconnect matrix being used to
enable communication between the two subsystems. The example in Figure 2.12 shows only a
single slave is shared. Typically this is an on-chip memory slave that is used as a buffer area
between the two subsystems.
Figure 2.12 Separate AHB subsystems
2.12 Multi-port slaves
In a multi-layer AHB system certain slaves, such as an SDRAM controller, are able to
operate more efficiently by processing transfers from different layers in parallel. Figure 2.13
shows how you can do this by designing the slave with multiple AHB slave ports.
Figure 2.13 Multi-port slaves
2.9 Summary
The literature survey is carried out with merits and demerits of AHB and the signal flow
diagram is identified.
The specification for the signals shown in the signal flow diagram is identified and its
working is explained with the help of its block diagram.
The discussion on the overview of the AMBA-AHB operation was made which includes
all the components involved in the AHB
The ML AHB bus configurations and the multi layer AHB bus matrix interconnection
scheme are studied.
.
\
CHAPTER 3 CHAPTER 3
Design of Self-Motivated Arbitration Scheme for the Multilayer AHB Bus matrix
The ML-AHB bus matrix of ARM consists of the input stage, decoder,
and output stage, including an arbiter Fig. 1 shows the overall structure of
the ML-AHB bus matrix of ARM.
Fig.3.1 Overall structure of the ML-AHB bus matrix of ARM
The input stage is responsible for holding the address and control
information when transfer to a slave is not able to commence immediately.
The decoder determines which slave that a transfer is destined for. The
output stage is used to select which of the various master input ports is
routed to the slave. Each output stage has an arbiter. The arbiter determines
which input stage has to perform a transfer to the slave and decides which
the highest priority is currently. The ML-AHB bus matrix employs slave-side
arbitration, in which the arbiters are located in front of each slave port, as
shown in Fig. 1; the master simply starts a transaction and waits for the
slave response to proceed to the next transfer. Therefore, the unit of
arbitration can be a transaction or a transfer. However, the ML-AHB
busmatrix of ARM furnishes only transfer-based arbitration schemes,
specifically transfer-based fixed-priority and round-robin arbitration schemes.
The transfer-based fixed-priority (round-robin) arbiter multiplexes the data
transfer based on a single transfer in a fixed-priority or round-robin fashion.
SM ARBITRATION SCHEME FOR THE ML-AHB BUSMATRIX
An assumption is made that the masters can change their priority level
and can issue the desired transfer length to the arbiters in order to
implement a SM arbitration scheme. This assumption should be valid
because the system developer generally recognizes the features of the
target applications. For example, some masters in embedded systems are
required to complete their job for given timing constraints, resulting in the
satisfaction of system-level timing constraints. The computation time of each
master is predictable, but it is not easy to foresee the data transfer time
since the on-chip bus is usually shared by several masters. Previous works
solved this issue by minimizing the latencies of several latency-critical
masters, but a side effect of these methods is that they can increase the
latencies of other masters; hence, they may violate the given timing
constraints. Unlike existing works, our scheme can keep the latency close to
its given constraint by adjusting the priority level and transfer length of the
masters. Fig. 3.2 shows an example. In this example, the service latencies
(latency-limit times) of M1, M2, and M3 are 4, 8, and 2 cycles (T14, T8, and
T10), respectively. The requests for three masters are all initiated at T0, and
M3 is the most latency-sensitive master. Fig.3.2(a) shows an arbitration
scheme that does not use latency constraints for arbitration.
Therefore, M2 and M3 violate the latency constraint as the masters are
selected in ascending order. Only M1 meets the constraint. Fig.3.2(b) shows
the scheduling of a typical latency- minimizing arbiter. It minimizes the
latency of the most latency-sensitive module, namely, M3, causing M2 to
violate its constraint. Although neither of these two arbitration schemes can
meet the latency constraints for all three masters, in the SM arbitration
shown in Fig.3.2(c), all masters use the bus with no violations by configuring
the priority levels (transfer lengths) of M1, M2, and M3 as the lowest,
highest, and intermediate priorities (4, 8, and 2), respectively.
Fig. 3.2. Arbitration scheme examples in an embedded system. (a)
Arbitration scheme with no consideration of the latency constraint. (b)
Arbitration scheme minimizing latency. (c) SM arbitration scheme.
A 32-b address bus of the masters is used to inform the arbiters of the
priority level and the desired transfer length of the masters. Fig. 3 shows the
decoding information for our address bus.
Fig.3.3. Decoding information of the 32-b address bus.
In Fig.3.3, S_Number indicates the target slave number, P_Level means
the priority level of a master, T_Length denotes the desired transfer length of
a master, and Offset_Add specifies the internal address of the target slave.
Each of S_Number and P_Level consists of 3 b because the maximum
number of master–slave sets is 8x8. Also, T_Length is composed of 4 b
because the maximum number of burst lengths is 16. Although we used 7 b
for P_Level and T_Length in the 32-b address bus to notify the arbiters of the
priority level and the desired transfer length of a master, we consider it
adequate to express the internal address of a slave because the range of
Offset_Add is from 0 to 222-1. Through the aforementioned assumption, the
priority level and transfer length can then be changed by the SM demand of
each master.
Figure 3.4. Internal structure of our arbiter.
Fig.3. 4 shows the internal structure of our arbiter based upon the SM
arbitration scheme.
In Fig.3.4, the NoPort signal means that none of the masters must be
selected and that the address and control signals to the shared slave must
be driven to an inactive state, while Master No. indicates the currently
selected master number generated by the controller for the SM arbitration
scheme. In general, our arbiter consists of an RR block, a P block, two
multiplexers, a counter, a controller, and two flip-flops. MUX_1 and MUX_2
are used to select the arbitration scheme and the desired transfer length of a
master, respectively. A counter calculates the transfer length, with two flip-
flops being inserted to avoid the attempts by the critical path to arbitrate. An
RR block (P block) performs the round-robin- or priority-based arbitration
scheme.
A controller compares the priority levels of the requesting masters. If
the masters have equal priorities, the controller selects the round-robin
arbitration scheme (RR block); in other cases, it chooses the priority
arbitration scheme (P block). The controller also makes the final decision on
the master for the next transfer based on the transfer length of the selected
master.
The control process follows the following three steps.
1) If HMASTLOCK is asserted, the same master remains selected.
2) If HMASTLOCK is not asserted and the currently selected master does not
exist, the following hold.
a) If no master is requesting access, the NoPort signal is asserted.
b) Otherwise, a new master for the next transfer is initially selected. If the
masters have equal priorities, the round-robin arbitration scheme is selected;
otherwise, the priority arbitration scheme is chosen. In addition, the counter
is updated based on the transfer length of the selected master.
3) If none of the previous statements applies, the following hold.
a) If the counter is expired, the following hold.
i) If the requesting masters do not exist, the No-Port signal is updated based
on the HSEL signal of the currently selected master. If the HSEL signal is “1,”
the same master remains selected, and the NoPort signal is deasserted.
Otherwise, the NoPort signal is asserted.
ii) Otherwise, a master for the next transfer is selected based on the priority
levels of the requesting masters. Also, the counter is updated.
b) If the counter is not expired, and the HSEL signal of the current master is
“1,” the same master remains selected, and the counter is decreased.
c) If the currently selected master completes a transaction before the
counter is expired, the following hold.
i) If the requesting masters do not exist, the No-Port signal is asserted.
ii) Otherwise, a master for the next transfer is chosen based on the priority
levels of the requesting masters, and the counter is updated.
The SM arbitration scheme is achieved through iteration of the
aforementioned steps. Combining the priority level and the desired transfer
length of the masters allows our arbiter to handle the transfer-based fixed-
priority, round-robin, and dynamic-priority arbitration schemes (abbreviated
as the FT, RT, and DT arbitration schemes, respectively), as well as the
transaction-based fixed-priority, round-robin, and dynamic-priority arbitration
schemes (abbreviated as the FR, RR, and DR arbitration schemes,
respectively). Moreover, our arbiter can also deal with the desired-transfer-
length-based fixed-priority, round-robin, and dynamic-priority arbitration
schemes (abbreviated as the FL, RL, and DL arbitration schemes,
respectively).
The transfer- or transaction-based arbiter switches the data transfer based
upon a single transfer (burst transaction), and the desired-transfer-length-
based arbiter multiplexes the data transfer based on the transfer length assigned by
the masters.
Fig. 3.5 shows the internal process of an RR block. Initially, we create the up- and down-mask
vectors (Up_Mask and Dn_Mask) based on the number of currently selected masters, as shown
in Fig. 3.5.
Fig.3.5. Internal process of the RR block.
We then generate the up- or down-masked vector created through bitwise AND-ing
operation between the mask vectors Fig. 3.5. Internal process of the RR block. And the requested
master vector. After generating the up- and down-masked vectors, we examine each masked
vector as to whether they are zero or not. If the up-masked vector is zero, the down-masked
vector is inserted to the input parameter of the round-robin function; if it is not zero, the up-
masked vector is the one inserted. A master for the next transfer is chosen by the round-robin
function, and the current master is updated after 1 clock cycle. The RR block is then performed
by repeating the arbitration procedure shown in Fig.3. 5.
Fig. 3.6 shows the internal procedure of the P block. First of all, we create the highest
priority vector (V) through the round-robin function of Fig. 3.6. After generating the highest
priority vector (V), the priority-level vectors and the highest priority vector (V) are inserted to
the input parameters of the priority function. The master with the highest priority is chosen by
the priority function, while the current master is updated after 1 clock cycle.
Fig. 3.6. Internal procedure of the P block.
FUTURE SCOPE
Presently there is much other architecture along with AMBA ML -AHB, but with its high
performance, speed and reliability its usage is widely increasing.
The AMBA-AHB architecture is now only confined to ARM Company, but due to
AMBA ML- Self Motivated AHB advantages, many other companies are moving towards
implementing this architecture.
Presently the AMBA ML- Self Motivated AHB is using the 1024 bit transfer per clock period
but this rate can further increased with the improving technology.
The configurations of the SM arbitration scheme with the maximum
performance need to be found automatically during run time.
In future The AMBA ML- Self Motivated AHB may applicable to AMBA AXI.
CONCLUSION
In this paper, the proposed a flexible arbiter based on the SM arbitration scheme for the
ML-AHB bus matrix. This arbiter supports three priority policies-fixed priority, round-robin, and
dynamic priority-and three approaches to data multiplexing- transfer, transaction, and desired
transfer length; in other words, there are nine possible arbitration schemes. In addition, the
proposed SM arbiter selects one of the nine possible arbitration schemes based on the priority-
level notifications and the desired transfer length from the masters to allow the arbitration to lead
to the maximum performance.
This design proposed ML- AHB SM arbitration schemes increases area than the other
arbitration schemes in ML-AHB, but ML-AHB SM arbitration scheme gives the better
performance when it selects the input stage and output stage in self motivated manner. Therefore
expect that it would be better to apply our SM arbitration scheme to an application- specific
system because it is easy to tune the arbitration scheme according to the features of the target
system.
REFERENCES
[1] M. Drinic, D. Kirovski, S. Megerian, and M. Potkonjak, “Latencyguided on-chip bus-
network design,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 12, pp.
2663–2673, Dec. 2006.
[2] S. Y. Hwang, K. S. Jhang, H. J. Park, Y. H. Bae, and H. J. Cho, “An ameliorated design
method of ML-AHB busmatrix,” ETRI J., vol. 28, no. 3, pp. 397–400, Jun. 2006.
[3] ARM, “AHB Example AMBA System,” 2001 [Online]. Available:
http://www.arm.com/products/solutions/AMBA_Spec.html
[4] IBM, New York, “32-bit Processor Local Bus Architecture Specification,” 2001.
[5] R. Usselmann, “WISHBONE interconnect matrix IP core,” Open- Cores, 2002. [Online].
Available: http://www.opencores.org/ ?do=project=wb_conmax
[6] N.-J. Kim and H.-J. Lee, “Design of AMBA wrappers for multipleclock operations,” in Proc.
Int. Conf. ICCCAS, Jun. 2004, vol. 2, pp. 1438–1442.
[7] D. Flynn, “AMBA: Enabling reusable on-chip designs,” IEEE Micro, vol. 17, no. 4, pp. 20–
27, Jul./Aug. 1997. 1http://www.arm.com/products/solutions/axi_spec.html, accessed Feb. 2008
[8] S. Y. Hwang, H.-J. Park, and K.-S. Jhang, “Performance analysis of slave-side arbitration
schemes for the multi-layer AHB busmatrix,” J. KISS, Comput. Syst. Theory, vol. 34, no. 5, pp.
257–266, Jun. 2007.
[9] S. S. Kallakuri and A. Doboli, “Customization of arbitration policies and buffer space
distribution using continuous-time Markov decision processes,” IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 15, no. 2, pp. 240–245, Feb. 2007.
[10] D. Seo and M. Thottethodi, “Table-lookup based crossbar arbitration for minimal-routed,
2D mesh and torus networks,” in Proc. Int. Conf. IPDPS, Mar. 2007, pp. 1–10.
[11] K. Lahiri, A. Raghunathan, and S. Dey, “Performance analysis of systems with multi-
channel communication architectures,” in Proc. Int. Conf. VLSI Design, Jan. 2000, pp. 530–537.
[12] J. Turner and N. Yamanaka, “Architectural choices in large scale ATM switches,” IEICE
Trans. Commun., vol. E-81B, no. 2, pp. 120–137, Feb. 1998.
[13] C. H. Pyoun, C. H. Lin, H. S. Kim, and J. W. Chong, “The efficient bus arbitration scheme
in SoC environment,” in Proc. Int. Conf. SoC Real-Time Appl., Jul. 2003, pp. 311–315.
[14] K. Lahiri, A. Raghunathan, and G. Lakshminarayana, “The LOTTERYBUS on-chip
communication architecture,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 6,
pp. 596–608, Jun. 2006.
[15] J. H. Han, M. Y. Lee, B. Younghwan, and C. Hanjin, “Application specific processor design
for H.264 decoder with a configurable embedded processor,” ETRI J., vol. 27, no. 5, pp. 491–
496, Oct. 2005. [16] M. Jun, K. Bang, H.-J. Lee, N. Chang, and E.-Y. Chung, “Slack-based bus
arbitration scheme for soft real-time constrained embedded systems,” in Proc. Int. Conf. ASP-
DAC, Jan. 2007, pp. 159–164.
[17] S. Y. Hwang, H. J. Park, and K. S. Jhang, An Efficient Implementation Method of Arbiter for
the ML-AHB Busmatrix. Berlin, Germany: Springer-Verlag, May 2007, vol. 4523, LNCS, pp.
229–240.
[18] E.-G. Jeong, J.-G. Lee, K.-S. Jhang, J.-A. Lee, and D. Har, “Asynchronous layered interface
of multimedia socs for multiple outstanding transactions,” J. VLSI Signal Process. Syst., vol. 46,
no. 2/3, pp. 133–151, Mar. 2007.
[19] S. Y. Hwang, H. J. Park, and K. S. Jhang, “An implementation and performance analysis of
slave-side arbitration schemes for the ML-AHB busmatrix,” in Proc. Int. Conf. ACM Symp.
Appl. Comput., Mar. 2007, vol. 2, pp. 1545–1551.
Recommended