
A-LOOP: AMP system: 2-cores ARM Cortex A9/Linux OS and 4-cores Leon3/Linux OS, OpenMP library and Hardware Profiling System


[Figure: concept diagram. Labels: ISLE; Input management; Monitoring operations; Dynamic adaption of the system; ISLE Custom FFT; ISLE Localization algorithm; ISLE Accelerator; To host; Shared memory.]

[Figure: platform block diagram. Labels: ISLE #1; ISLE #2; LEON3 (x4); ARM (x2); AXI Controller; AHB/APB Bridge; Memory Controller (x2); PHY; AMBA AHB; UART; UART - USB; SRAM; S1, S2, S3; Ethernet MAC.]

[Figure: OpenMP fork-join execution model.]

Non-parallel region: master thread only (ID:0).

Parallel region starts (fork): #pragma omp parallel

Parallel region: several threads execute simultaneously (ID:0, ID:1, ID:2, ID:3).

Parallel region ends (join): the program waits for all threads to terminate.

The program reverts to single-threaded execution (ID:0).
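The fork-join sequence above can be sketched in C with OpenMP. This is a minimal illustration, not the code used in the demo: the function name fork_join_demo, the MAX_THREADS constant (assumed here to match the four Leon3 cores of Isle #2) and the seen array are hypothetical. The _OPENMP guard lets the same file build without OpenMP support, in which case only the master thread runs.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4   /* assumption: one thread per Leon3 core */

/* Runs one parallel region and marks, in seen[], every thread ID that
 * took part. Returns the number of threads that joined the region. */
int fork_join_demo(int seen[MAX_THREADS])
{
    int count = 0;

    assert(seen != NULL);

    /* Non-parallel region: master thread only. */
    for (int i = 0; i < MAX_THREADS; i++)
        seen[i] = 0;

    /* Fork: the runtime creates a team of at most MAX_THREADS threads. */
    #pragma omp parallel num_threads(MAX_THREADS) reduction(+:count)
    {
#ifdef _OPENMP
        int id = omp_get_thread_num();   /* 0 .. team size - 1 */
#else
        int id = 0;                      /* built without OpenMP */
#endif
        seen[id] = 1;                    /* each thread writes its own slot */
        count += 1;                      /* private copy, summed at the join */
    }
    /* Join: implicit barrier, all threads have terminated here. */

    /* Single-threaded execution again. */
    return count;
}
```

Built with an OpenMP-enabled compiler (for example gcc -fopenmp), the function reports up to four threads; the runtime is free to grant fewer.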

[Figure: sniffer block diagram. Labels: APB Interface; Decode Section; Event Monitor; Time Monitor; Counter; AHB - Adapter; APB Bus; AHB Bus.]

[Figure: LEON3 core block diagram. Labels: 7-Stage Integer Pipeline; 3-Port Register File; IEEE-754 FPU; Co-Processor; HW MUL/DIV; Trace Buffer; Debug port; Interrupt port; I-Cache; D-Cache; SRMMU; AHB I/F; Local IRAM; ITLB; Local DRAM; DTLB; AMBA AHB Master (32-bit).]

SYSTEM BEHAVIOUR

The performance of the platform is evaluated by means of a Pi calculation algorithm, proposed in four different versions: serial computation, the single program multiple data (SPMD) technique with false sharing, the SPMD technique without false sharing, and the OpenMP reduction clause.
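The four versions can be sketched as follows. The poster does not state which Pi formula is used; this sketch assumes the common midpoint-rule integration of 4/(1+x^2) over [0,1]. NTHREADS, PAD and all function names are hypothetical, and PAD = 8 doubles (64 bytes) is assumed to span at least one cache line of the target.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NTHREADS 4   /* assumption: one thread per Leon3 core */
#define PAD 8        /* assumption: 8 doubles (64 bytes) >= one cache line */

/* Midpoint-rule term for the integral of 4/(1+x^2) on [0,1]. */
static double term(long i, double step)
{
    double x = ((double)i + 0.5) * step;
    return 4.0 / (1.0 + x * x);
}

/* Version 1: serial computation. */
double pi_serial(long n)
{
    assert(n > 0);
    double step = 1.0 / (double)n, sum = 0.0;
    for (long i = 0; i < n; i++)
        sum += term(i, step);
    return sum * step;
}

/* SPMD: every thread runs the same code on a cyclic share of the
 * iterations and accumulates into its own slot of partial[].
 * pad = 1 puts the slots side by side (false sharing);
 * pad = PAD spreads them over separate cache lines. */
static double pi_spmd(long n, int pad)
{
    assert(n > 0 && pad >= 1 && pad <= PAD);
    double step = 1.0 / (double)n;
    double partial[NTHREADS * PAD] = {0};

    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = 0, nthr = 1;            /* values when built without OpenMP */
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthr = omp_get_num_threads();    /* may be fewer than NTHREADS */
#endif
        for (long i = id; i < n; i += nthr)
            partial[id * pad] += term(i, step);
    }

    double sum = 0.0;
    for (int t = 0; t < NTHREADS; t++)   /* unused slots stay at zero */
        sum += partial[t * pad];
    return sum * step;
}

/* Version 2: SPMD with false sharing. */
double pi_spmd_false_sharing(long n) { return pi_spmd(n, 1); }

/* Version 3: SPMD without false sharing (padded partial sums). */
double pi_spmd_padded(long n) { return pi_spmd(n, PAD); }

/* Version 4: OpenMP reduction clause. */
double pi_reduction(long n)
{
    assert(n > 0);
    double step = 1.0 / (double)n, sum = 0.0;
    #pragma omp parallel for reduction(+:sum) num_threads(NTHREADS)
    for (long i = 0; i < n; i++)
        sum += term(i, step);
    return sum * step;
}
```

All four functions return the same approximation up to rounding; they differ only in how the partial sums are stored and combined, which is what a bus-level profiler can be expected to expose.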

The proposed profiling technique, used to monitor the computational behaviour of the A-LOOP platform, follows the approach of runtime bus sampling.

LEGEND: 1 Thread, 2 Threads, 3 Threads, 4 Threads

Event monitor: generates a strobe (ld_ac_event) on every access within a specified address range (delimited by sig_out_inf and sig_out_sup).

Time monitor: a counter started by a read operation (during_read) and stopped by a write operation (during_write), both on a specified address (0x808).
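The behaviour of the two monitors can be illustrated with a small software model in C. This is only a sketch under stated assumptions, not the sniffer's hardware description: the type and function names are hypothetical, the address range is assumed to be inclusive on both ends, the counter is assumed to count bus clock cycles and to accumulate across successive read/write pairs, and 0x808 is taken as the address exactly as the monitor sees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One observed bus access (hypothetical model type). */
typedef struct {
    uint32_t addr;
    bool     is_write;
} bus_access_t;

/* Event monitor: counts accesses that fall inside [inf, sup]. */
typedef struct {
    uint32_t inf;      /* models sig_out_inf (assumed inclusive) */
    uint32_t sup;      /* models sig_out_sup (assumed inclusive) */
    uint32_t events;   /* number of strobes generated so far */
} event_monitor_t;

/* Returns true when the access raises the strobe (models ld_ac_event). */
bool event_monitor_step(event_monitor_t *m, const bus_access_t *a)
{
    assert(m != NULL && a != NULL);
    bool strobe = (a->addr >= m->inf) && (a->addr <= m->sup);
    if (strobe)
        m->events++;
    return strobe;
}

/* Time monitor: counts cycles between a read and a write on one address. */
typedef struct {
    uint32_t trigger_addr;   /* 0x808 in the poster */
    bool     running;
    uint64_t cycles;
} time_monitor_t;

/* Call once per bus clock cycle; pass NULL for a cycle with no access. */
void time_monitor_step(time_monitor_t *m, const bus_access_t *a)
{
    assert(m != NULL);
    if (a != NULL && a->addr == m->trigger_addr) {
        if (a->is_write)
            m->running = false;   /* models during_write: stop */
        else
            m->running = true;    /* models during_read: start */
    }
    if (m->running)
        m->cycles++;              /* the starting cycle is counted, the stopping one is not */
}
```

In this model an application would bracket a region of interest with a read and a write to the trigger address, so the measurement costs two bus accesses rather than any instrumentation code inside the region.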

LEON3 HW PROFILING SYSTEM

SYSTEM DESCRIPTION

The LEON3 processor is designed for embedded applications, combining high performance with low complexity and low power consumption. It is also highly configurable.

A distributed hardware profiling system has been developed for runtime analysis. It is composed of distributed monitoring elements (sniffers) that monitor the AHB bus and are initialized by means of an AXI bus. A global monitor unit, represented by Isle #1, initializes the sniffers and collects their results.

Isle #1 runs a customized SMP Linux distribution provided by Xilinx. Isle #2 runs a customized SMP Linux distribution provided by Gaisler.

The libraries required to execute shared-memory parallel applications, developed with OpenMP C/C++, have been cross-compiled and added to the adopted Leon3 Linux distribution.

LINUX OPENMP

PROPOSED ARCHITECTURE

THE CONCEPT

OVERVIEW

Embedded systems development is driven by basic functional specifications, enriched with a set of non-functional requirements (performance, power dissipation, etc.).

One of the techniques that can be exploited is to develop Isles of computational elements (Modules) with different characteristics, each one able to satisfy some non-functional specifications, in order to realize smart Systems on Module (SoM).

SoCs with FPGA can be viewed as platforms useful to prototype this kind of architecture.


The proposed platform is a SoM with 2 modules that share a region of external memory:

-> ISLE #1: a dual-core ARM Cortex A9 with SMP Linux OS, able to interface with the external world. It provides data to Isle #2 and collects results from it. It is also able to monitor the performance of Isle #2, without introducing software overhead, by means of a hardware profiling system.

-> ISLE #2: a quad-core Leon3 with SMP Linux OS, able to execute parallel applications based on the OpenMP library. In particular, in this demo, it executes a MANET localization algorithm.

PROPOSED PLATFORM

MOTIVATIONS

A-LOOP IS A SYSTEM ON MODULE (SoM) PROTOTYPE FOR AEROSPACE APPLICATIONS, DEVELOPED STARTING FROM ZYNQ7000, THAT FOCUSES ON THE INTERACTIONS BETWEEN 2 MODULES ("ISLES OF COMPUTATIONAL ELEMENTS") AND ON THE MONITORING ACTION WITHOUT OVERHEAD INSERTION


G. Valente, V. Muttillo, A. Bufalino, M. Santic, L. Pomante, M. Faccio, F. Federici


UNIVERSITA’ degli STUDI dell’AQUILA - CENTER of EXCELLENCE DEWS (ITALY)
http://dews.univaq.it