7/31/2019 14907_sharc
1/26
Clare Smtih SHARC Presentation 1
The SHARC
Super Harvard Architecture Computer
The SHARC
Developed by Analog Devices
Optimized for demanding DSP and imaging
applications.
32-bit floating point, with 40-bit extended
floating point capabilities.
Large on-chip memory.
Ideal for scalable multi-processing
applications.
Harvard Architecture
Program memory can store data.
Able to simultaneously read or write data at
one location and get instructions from
another place in memory.
2 buses:
1. Data memory bus.
2. Program bus.
Either two separate memories or a single
dual-port memory.
Super Harvard Architecture
Many processors employ Harvard
Architecture by having two separate
memories or caches integrated into the
processor chip.
The SHARC is unique in that its internal
memory is capable of holding a large program as well as a large amount of data.
This is what makes it SUPER!!!
DSP
Digital Signal Processor.
High speed, low overhead data movement
and rapid computations required.
Usually has a small on-board ROM, RAM
and single cycle multiply.
Designed to run single line, serial in, serial
out, signal processing applications very fast.
DSP Computations
The inner product of two vectors is a
common computation for determining
energy or correlation.
The following C code is an example:
for (n=0; n
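The loop on the slide is cut off in this copy. As a minimal sketch of the inner product described here, assuming two float vectors of equal length (the names inner_product, x, y and len are illustrative and not taken from the slide):

```c
#include <stddef.h>

/* Inner product of two vectors: the sum of x[n] * y[n] for n = 0 .. len-1.
   The names inner_product, x, y and len are illustrative assumptions. */
float inner_product(const float *x, const float *y, size_t len)
{
    float sum = 0.0f;
    for (size_t n = 0; n < len; n++) {
        sum += x[n] * y[n];   /* one multiply-accumulate per element */
    }
    return sum;
}
```

Passing the same vector for both arguments gives its energy; passing two different vectors gives their correlation at zero lag.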
Floating Point and
Extended Floating Point
The SHARC supports floating-point, extended
floating-point and fixed-point formats.
No additional clock cycles for floating point
computations.
Data automatically truncated and zero
padded when moved between 32-bit
memory and internal registers.
Not accurate enough for scientific
algorithms. Excellent signal to noise ratio.
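The truncation and zero padding can be sketched on plain integers. This assumes, for illustration only, that a 40-bit register word holds the 32-bit memory word in its upper 32 bits and the 8 extra bits at the bottom; the helper names are hypothetical.

```c
#include <stdint.h>

/* Load: 32-bit memory word -> 40-bit register word.
   The 8 low bits are zero padded (assumed layout, see the text above). */
uint64_t load_to_reg40(uint32_t mem_word)
{
    return (uint64_t)mem_word << 8;
}

/* Store: 40-bit register word -> 32-bit memory word.
   The 8 low bits are truncated, not rounded. */
uint32_t store_from_reg40(uint64_t reg_word)
{
    return (uint32_t)((reg_word & 0xFFFFFFFFFFULL) >> 8);
}
```

A value that goes from memory to a register and back is unchanged; only the extra 8 bits produced by an extended computation are lost on the way out.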
SHARC's Internal Memory
Makes SHARC unique.
Size
Allows many complex functions to be performed
on-chip, eliminating the need to move data between
internal and external memory.
Memory size is significantly larger than most other
high speed computational devices.
Dual-block, Dual-port
Optimizes the Harvard Architecture by allowing the
fetch of instructions while performing data memory accesses.
Multiply and Accumulate
Instructions on the SHARC
Like most DSPs the SHARC is able to
compute a product and add the product to a
running total in a single clock cycle.
The SHARC's super instruction is that it
can multiply and accumulate while adding,
subtracting, or averaging data in two other registers.
These instructions give the SHARC its 120
megaflop rating.
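What one such instruction computes can be sketched in C. The statements below run one after another in C, whereas the slide describes them as happening together on the SHARC; the type and function names are illustrative assumptions.

```c
/* Results of one multiply-accumulate issued together with an add and a
   subtract on two other operands. All names here are illustrative. */
typedef struct {
    float acc;    /* running total after the multiply-accumulate */
    float sum;    /* a + b */
    float diff;   /* a - b */
} mac_result;

mac_result mac_with_add_sub(float acc, float x, float y, float a, float b)
{
    mac_result r;
    r.acc  = acc + x * y;   /* multiply and accumulate */
    r.sum  = a + b;         /* add on the two other registers */
    r.diff = a - b;         /* subtract on the same two registers */
    return r;
}
```

Counting the multiply, the add and the subtract as three floating-point operations per cycle, a 40 MHz clock (an assumption, since the slide does not state the clock rate) would account for the 120 megaflop figure.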
Zero Overhead Looping
on the SHARC
A single instruction outside the loop
performs loop set-up, informing the
SHARC that there is a loop approaching.
The instruction also includes the iteration
count and termination condition.
This causes the pipeline to remain full
during loop execution and also allows the
termination condition to be tested in
parallel.
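The saving can be estimated with a simplified cycle-count model. It assumes one cycle per instruction and that a loop without hardware support spends a fixed number of extra instructions per iteration on the counter update and branch; the function names and the overhead figure are illustrative assumptions, not SHARC measurements.

```c
/* Cycles for a loop whose counter update and branch are ordinary
   instructions executed on every iteration (simplified model). */
unsigned long cycles_software_loop(unsigned long iterations,
                                   unsigned long body_len,
                                   unsigned long overhead_per_iter)
{
    return iterations * (body_len + overhead_per_iter);
}

/* Cycles with zero-overhead looping: one set-up instruction before the
   loop, then only the loop body on every iteration. */
unsigned long cycles_zero_overhead_loop(unsigned long iterations,
                                        unsigned long body_len)
{
    return 1 + iterations * body_len;
}
```

For a one-instruction body run 100 times with an assumed overhead of 2 instructions per iteration, the model gives 300 cycles against 101.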
DAGs on the SHARC
Data Address Generators are integer
computation units that manage the indexing
of registers.
Allows the SHARC to fetch a value and
update the index value.
If the updated value exceeds a limit, the
DAG adjusts the index so that it wraps.
This occurs in the same clock cycle as the
read or write.
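The fetch, update and wrap steps can be sketched as follows. This models the behaviour in software only; the function names and the restriction that the step is smaller than the buffer length are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Advance an index by `step` and wrap it back into [0, length).
   Assumes index < length and step < length. */
size_t dag_post_modify(size_t index, size_t step, size_t length)
{
    assert(length > 0 && index < length && step < length);
    size_t next = index + step;
    if (next >= length) {
        next -= length;       /* exceeded the limit: wrap around */
    }
    return next;
}

/* Fetch the value at *index, then update the index with wrap-around. */
float dag_fetch(const float *buf, size_t *index, size_t step, size_t length)
{
    float value = buf[*index];
    *index = dag_post_modify(*index, step, length);
    return value;
}
```

In C the read and the index update are separate steps; the point of the DAG, per the slide, is that both happen in the same clock cycle.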
DAG Capabilities
Circular Buffering
Rather than actually moving data in and out of a
vector, circular buffers are used.
By updating the index modulo the buffer length, the oldest entry can be
conveniently replaced by the newest entry.
Bit Reverse Addressing
The bit pattern of a vector index is reversed.
Done automatically by the SHARC.
Required for the Fast Fourier Transform (FFT), which
is often critical to DSP applications.
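Bit reversal of an index can be sketched in software as below. The function name and its parameters are illustrative; the SHARC does this in its address hardware rather than with a loop.

```c
/* Reverse the lowest `bits` bits of `index`.
   With bits = 3: 001 -> 100, 110 -> 011. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);  /* move the low bit over */
        index >>= 1;
    }
    return reversed;
}
```

For an 8-point FFT (3 index bits) the indices 0..7 map to 0, 4, 2, 6, 1, 5, 3, 7.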
SHARC DSP
What Makes the SHARC unique?
It also has some features not directly
related to optimizing numeric computations.
Pipelining
Handling Branches
Why has this not emerged sooner?
Technology has only recently become available
to make it economical to integrate general
single computing devices.
SHARC's Pipeline
3 stages:
1. Instruction Fetch
2. Decode
3. Execution
Takes three clock cycles for an instruction
to propagate through the pipeline.
The processor execution speed is one
instruction per clock cycle even though
each instruction requires three clock cycles.
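The relation between latency and throughput can be sketched with a small cycle count. It assumes an ideal pipeline with no stalls or branches; the function name is illustrative.

```c
/* Cycles to run `count` instructions through a pipeline with `stages`
   stages, assuming no stalls or branches: the first instruction needs
   `stages` cycles, each later one completes one cycle after the previous. */
unsigned long pipeline_cycles(unsigned long count, unsigned long stages)
{
    if (count == 0) {
        return 0;
    }
    return stages + (count - 1);
}
```

With 3 stages, 100 instructions take 102 cycles under this model, which is close to one instruction per cycle even though each one spends three cycles in the pipeline.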
SHARC's Handling of Branches
Delayed Branching
When a branch instruction is encountered,
the two instructions that have already been loaded
and decoded are executed before the branch takes effect.
This keeps the pipeline full and avoids
junking those two instructions and reloading
the pipeline.
Beneficial in situations such as loops of only a few
instructions, where the ratio of wasted
clock cycles to instructions is significant.
SHARC's Handling of Branches
Non-delayed Branching
Traditional branching.
If the pipeline cannot be reordered to use
delayed branching, non-delayed branching
saves space.
Uses only one word of storage.
However, it takes three cycles because the
pipeline has to be reloaded.
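The cost of the two branch styles in a short loop can be compared with a simplified model. It assumes one cycle per useful instruction, three cycles for a non-delayed branch, and one cycle for a delayed branch whose two following slots are filled with useful instructions from the loop body; the function names are illustrative.

```c
#include <assert.h>

/* Loop closed by a non-delayed branch: `body` useful instructions plus
   a three-cycle branch on every iteration (simplified model). */
unsigned long loop_cycles_non_delayed(unsigned long iterations,
                                      unsigned long body)
{
    return iterations * (body + 3);
}

/* Loop closed by a delayed branch: the branch costs one cycle and two
   of the `body` instructions execute in the slots behind it, so the
   body needs at least two instructions. */
unsigned long loop_cycles_delayed(unsigned long iterations,
                                  unsigned long body)
{
    assert(body >= 2);
    return iterations * (body + 1);
}
```

For a two-instruction body run 10 times the model gives 50 cycles non-delayed against 30 delayed, which is why the gain matters most in very short loops.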
Multi-processing
SHARC is uniquely equipped for multi-
processing.
Link ports provide very powerful multi-
processing capabilities.
Two main program models depending on
the application.
Adapts well to different multi-processing
architectures.
Multi-processing
SHARC Links
SHARC has 6 link ports that can transport
data at rates up to 40 Mbytes/sec.
Links designed for point-to-point
connections.
Data can be transmitted in either direction
but not both simultaneously.
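The figures above allow a rough transfer-time estimate. The sketch assumes 1 Mbyte = 10^6 bytes and that each link runs at its peak rate; the constant and function names are illustrative.

```c
#define LINK_PORTS 6                      /* from the slide */
#define LINK_RATE_BYTES_PER_SEC 40.0e6    /* 40 Mbytes/sec, 1 Mbyte = 10^6 bytes assumed */

/* Seconds to move `bytes` over one link at its peak rate. */
double link_transfer_seconds(double bytes)
{
    return bytes / LINK_RATE_BYTES_PER_SEC;
}

/* Peak rate in Mbytes/sec with all link ports active at once. */
double aggregate_link_rate_mbytes(void)
{
    return LINK_PORTS * (LINK_RATE_BYTES_PER_SEC / 1.0e6);
}
```

Because a link carries data in only one direction at a time, a two-way exchange over a single link takes the sum of both transfer times under this model.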
Multi-processing Program Model
MIMD
Multiple instruction, multiple data.
Good for applications that require multiple
instruction threads to execute concurrently.
Processors operate individually.
Each processor executes different code.
Typically used for image reconstruction and
multi-channel DSP.
Multi-processing Program Model
SIMD
Single instruction, multiple data.
Works best when all processors execute
identical instruction sequences.
Does not require overhead for inter-processor
synchronization.
Typically used for synthetic aperture radar
and automatic target recognition.
Multi-processing Architectures
Cluster Design
Groups of up to 6 in a cluster.
Most common for joining multiple
SHARCs.
All processors, global I/O and global
memory connected to a common
Cluster bus.
Each SHARC can drive the bus.
Multi-processing Architectures
Mesh Design
All SHARCs are joined by their link ports and
connected to a common bus.
In SIMD mode one single master SHARC
drives the bus.
In MIMD mode the mesh architecture cannot
function if the data is larger than the available
on-chip memory.
Advantageous scalability over a wider range
of applications.
Summary of what makes the
SHARC Super
It performs excellently for DSP
applications.
Employs a Harvard Architecture with very
large on chip memory.
Respectable Megaflop rating.
Its multiprocessing capabilities.
How optimal is the SHARC for
non-DSP Applications?
It is obviously geared for DSP applications.
While it may fare better than other
processors, it is still behind those which are
designed specifically for non-DSP
applications.
Sources
www.alacron.com/news/tp_mimd_simd.htm
www.analog.com
www.cs.seas.gwu.edu/~cs339/cs339-lecture2.pdf
www.ixthos.aa.psiweb.com/technical/notes_articles/articles