Embedded Computer Architecture Laboratory: A Hands-on Experience Programming Embedded Systems with Resource and Energy Constraints

Ashkan Beyranvand Nejad, Andrew Nelson, Anca Molnos, Davit Mirzoyan, Sorin Cotofana (Delft University of Technology, NL)
Kees Goossens (Eindhoven University of Technology, NL)

Challenge the future
Computer Engineering Group


Outline

• Introduction

• ECA-Lab Context & Setup

• Target Embedded MPSoC

• Target Application

• Assignment overview

• Conclusions

• Future Labs


Introduction

• Embedded Systems (ES)
  • widely used in various domains
  • moving towards Multi-Processor Systems-on-Chip (MPSoC) architectures
  • typically resource constrained, e.g.,
    • limited memory
      • deep SW stack not available
    • energy bound
  • resources are shared to reduce cost
  • to exploit this, the applications should be parallelized


Introduction

• ES Education
  • aims to prepare students with the skills required for undertaking embedded system projects in today's international companies
• One of the challenging projects is to parallelize applications and port them to embedded MPSoC platforms
• Parallelization makes the available trade-offs, for example between performance and energy, harder to understand
• In academia, we should enable students to gain hands-on experience of these challenges and allow them to experimentally explore the trade-offs in the design space for programming embedded MPSoCs


ECA-Lab: Goal & Context

• We set up a lab assignment that targets embedded MPSoCs
  • The lab is given to first-year master students concurrently with theoretical lectures on embedded computer architecture
  • So we call it ECA-Lab
• Five exercises are prepared to walk the students through some challenges of mapping, parallelizing and optimizing an application on an MPSoC platform
  • initially, the students are provided with the sequential C code of a fractal application
• Students are assigned to groups of four, with as much intra-group diversity in technical skills and nationality as possible


ECA-Lab: Setup

• The application runs on a real platform implemented on an FPGA board
• The board is accessible remotely on-line by all the groups
  • there is a round-robin arbitration scheme
  • locking the board by one group is prevented
  • access time is bounded to ~30 s
• Keeping board access time low
  • the HW bit-stream resides on the server
  • only the ELF files per core are transferred
  • output information is parsed locally
• Low cost of setup
  • Xilinx ML605 FPGA board ~ $1800
  • Simple Linux server ~ $0
  • Internet connection ~ $0


Multiprocessor Platform: CompSoC

• Research MPSoC platform developed at TU Delft and TU/e
  • Tile-based, Network-on-Chip (NoC) centric
  • Distributed memory architecture
• CompSoC aims to reduce system complexity through two techniques
  • composability: each application can be developed, verified, and executed in isolation
  • predictability: real-time applications can be implemented in CompSoC by well-defining the timing properties of the (shared) resources
• Platform consists of
  • the AEthereal / aelite / daelite NoC
  • multiple MicroBlaze tiles with local memories and DMAs
  • the MicroBlazes optionally run the CompOSe real-time operating system (RTOS)
  • the Predator real-time DRAM memory scheduler and controller


Multiprocessor Platform: ECA-CompSoC

• An instance of the CompSoC platform with heterogeneous processor tiles

[Figure: ECA-CompSoC block diagram. Tiles 0, 1 and 2 each contain a MicroBlaze, Imem, Dmem, a system timer, a local timer, SHsr, DMAmem and a VFCM. Tiles 0 and 1 also contain DMAsr and two RDMA modules (RDMA0, RDMA1); Tile 2 has RDMA0 only. The tiles connect through the NoC to the Frame Memory, the Global shared memory and a Monitor Tile.]


ECA-CompSoC: Processor Tile

[Figure: processor tile block diagram showing the MicroBlaze, Imem, Dmem, system timer, local timer, SHsr, DMAmem, DMAsr, VFCM, RDMA0 and RDMA1.]

• Each processor tile comprises
  • a MicroBlaze processing core
  • a Voltage-Frequency Control Module (VFCM)
  • a local instruction memory (Imem)
  • a local data memory (Dmem)
  • a set of Remote Direct Memory Access (RDMA) modules with an associated set of local memory blocks for inter-tile communication
• Independent clock domain per tile
  • allowing independent voltage and frequency scaling by the VFCM
• Dmem and Imem are small
  • this makes fitting program code into them challenging

Memory block    Size (Byte)
Imem            8 K
Dmem            8 K
DMAmem          4 K
DMAsr           32
SHsr            32


ECA-CompSoC: Memory Tile

• Two memory tiles accessible via the NoC
  • Frame Memory
    • frame buffer, where visual data can be written for display
    • an API call from the application signals that the frame is ready, initiating its retrieval to the student's computer
  • Global shared memory
    • relatively large
    • writing data to and reading data from this memory is slow
• A variety of memories of different capacities
  • not all accessible from every tile
  • not all with the same access bandwidth

Memory Tile           Size (Byte)
Global shared mem.    32 K
Frame mem.            256 K


ECA-CompSoC: NoC Interconnect

[Figure: ECA-CompSoC block diagram with Tiles 0, 1 and 2 connected through the NoC to the Frame Memory and the Global shared memory.]

Tile      Connection                    Bandwidth
Tile 0    RDMA0 -> Glbl. Shrd. Mem.     Slow
          RDMA1 -> SHsr                 Medium
Tile 1    RDMA0 -> Glbl. Shrd. Mem.     Slow
          RDMA1 -> SHsr                 Medium
Tile 2    RDMA0 -> Glbl. Shrd. Mem.     Fast
          RDMA0 -> frame mem.           Fast
          RDMA0 -> SHsr                 Medium


ECA-CompSoC: Application Programming Interface (API)

Remote memory access API
    void hw_declare_dmas (int num_dmas);
    DMA* hw_dma_add (int id, void * base_addr);
    void hw_dma_receive (void * dst, void * src, int block_size, DMA * dma);
    void hw_dma_send (void * dst, void * src, int block_size, DMA * dma);
    int hw_dma_busy (DMA * dma);

Timer API
    unsigned int get_system_time();
    unsigned int get_local_time();

VFCM API
    void hw_vfcm_clk_gate (unsigned int t);
    void hw_vfcm_set_freq (unsigned int freq_level);

Debug API
    void print_debug (int value);

Frame Output API
    void print_framebuffer();
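As an illustration of how the remote memory access calls might be combined, the sketch below sends one block and then polls until the DMA is idle. The slides give only the prototypes, so the semantics are assumptions: that `hw_dma_send` merely starts a transfer and that `hw_dma_busy` returns non-zero while it is in flight. So that the example compiles off the board, the `DMA` type and the three `hw_` functions are replaced by trivial host-side stand-ins; these are not the platform's implementation, and `send_block_blocking` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Host-side stand-ins (assumptions) so the sketch compiles off the board.
 * On the platform these come from the lab's API and may behave differently. */
typedef struct { int id; void *base_addr; int busy; } DMA;

static DMA dma_table[2];

static DMA *hw_dma_add(int id, void *base_addr)
{
    if (id < 0 || id >= 2) {
        return NULL;
    }
    dma_table[id].id = id;
    dma_table[id].base_addr = base_addr;
    dma_table[id].busy = 0;
    return &dma_table[id];
}

static void hw_dma_send(void *dst, void *src, int block_size, DMA *dma)
{
    (void)dma;
    memcpy(dst, src, (size_t)block_size);   /* stand-in: copies immediately */
}

static int hw_dma_busy(DMA *dma)
{
    return dma->busy;                       /* stand-in: never busy */
}

/* Hypothetical helper: start a transfer and wait until the DMA reports idle,
 * assuming hw_dma_send returns before the transfer has completed. */
static void send_block_blocking(void *dst, void *src, int block_size, DMA *dma)
{
    hw_dma_send(dst, src, block_size, dma);
    while (hw_dma_busy(dma)) {
        /* busy-wait; computation could overlap with the transfer here */
    }
}
```

If the send call really is non-blocking, the polling loop is the place where computation on the next block could be overlapped with communication.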


ECA-CompSoC: Output Information

• Frame buffer is dumped and written in bitmap (bmp) format
• The timing, energy, and debug information per tile is provided


Application

• Features of target application
  • simple to understand algorithm
  • easily parallelizable
  • visual output
• Fractal application provided with
  • sequential C code
  • shape of the output is controlled by real and imaginary parameters
  • five sets of parameters are given

    double x_min = -1.5;
    double x_max = 1.5;
    double x_step = (x_max-x_min)/x_size;
    double y_min = -1.5;
    double y_max = 1.5;
    double y_step = (y_max-y_min)/y_size;
    double x,y;
    double new_x,new_y;
    int m,n,num;
    unsigned char R,G,B;
    double real = 0.123;
    double imaginary = 0.745;
    for(n=0; n<y_size; n++){
        for(m=0; m<x_size; m++){
            x = x_min + x_step * m;
            y = y_min + y_step * n;
            for(num=0; ((pow(x,2) + pow(y,2)) <= 4) && (num < 0xFF); num++){
                new_x = pow(x,2) - pow(y,2) + real;
                new_y = 2*x*y + imaginary;
                x = new_x;
                y = new_y;
            }
            FrameMemory.R = num;
            FrameMemory.G = num;
            FrameMemory.B = num;
        }
    }
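The per-pixel loop of the sequential code can be isolated into a pure function, which makes it easier to test on a desktop and later to split across tiles. What follows is a minimal sketch, not the lab's reference code: the names `pixel_to_coord` and `julia_iterations` are hypothetical, and `pow(x,2)` is replaced by `x * x`, a substitution assumed here to avoid the math-library call.

```c
/* Hypothetical helper: map a pixel index in [0, size) to a coordinate
 * in [min, max), as x_min + x_step * m does in the sequential code. */
static double pixel_to_coord(int index, int size, double min, double max)
{
    return min + ((max - min) / size) * index;
}

/* Iterate z -> z^2 + c from z = (x, y) with c = (real, imaginary).
 * Returns the number of iterations (at most 0xFF) performed while
 * |z|^2 <= 4, using the same loop bounds as the sequential code. */
static unsigned char julia_iterations(double x, double y,
                                      double real, double imaginary)
{
    int num;
    for (num = 0; (x * x + y * y) <= 4.0 && num < 0xFF; num++) {
        double new_x = x * x - y * y + real;
        double new_y = 2.0 * x * y + imaginary;
        x = new_x;
        y = new_y;
    }
    return (unsigned char)num;
}
```

Because each pixel depends only on its own coordinates and the two parameters, any subset of pixels can be computed independently, which is what makes the application easy to parallelize.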


Assignment Overview


Assignment Overview: Exercise 1

• Each group has to parallelize the fractal application on a desktop computer utilizing the pthreads library
• Provide a Makefile
  • make run-fractal-pthreads THREAD=X
• Goal
  • get the students acquainted with the fractal application and parallel programming in a desktop environment
  • analyze and explain parallelization performance
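One possible shape for the desktop parallelization is to give each thread a contiguous band of rows. The sketch below is an assumption-laden illustration rather than the expected solution: the image size, the `image` array standing in for the frame memory, and the names `band`, `render_band` and `render_parallel` are all hypothetical, and only one of the parameter sets (real = 0.123, imaginary = 0.745) is used.

```c
#include <pthread.h>
#include <stdlib.h>

#define X_SIZE 64   /* assumed image width in pixels  */
#define Y_SIZE 64   /* assumed image height in pixels */

/* One iteration count per pixel; stands in for the frame memory. */
static unsigned char image[Y_SIZE][X_SIZE];

/* Rows [first_row, last_row) rendered by one thread. */
struct band { int first_row; int last_row; };

/* Same iteration as the sequential code, with pow(x,2) written as x * x. */
static unsigned char iterations(double x, double y)
{
    int num;
    for (num = 0; (x * x + y * y) <= 4.0 && num < 0xFF; num++) {
        double new_x = x * x - y * y + 0.123;
        double new_y = 2.0 * x * y + 0.745;
        x = new_x;
        y = new_y;
    }
    return (unsigned char)num;
}

/* Thread body: fill the rows of one band. Bands are disjoint, so no lock. */
static void *render_band(void *arg)
{
    const struct band *b = arg;
    const double x_step = 3.0 / X_SIZE;   /* (x_max - x_min) / x_size */
    const double y_step = 3.0 / Y_SIZE;   /* (y_max - y_min) / y_size */
    int m, n;
    for (n = b->first_row; n < b->last_row; n++) {
        for (m = 0; m < X_SIZE; m++) {
            image[n][m] = iterations(-1.5 + x_step * m, -1.5 + y_step * n);
        }
    }
    return NULL;
}

/* Render the image with num_threads threads. Returns 0 on success, -1 on error. */
static int render_parallel(int num_threads)
{
    pthread_t *threads;
    struct band *bands;
    int rows_per_thread, started = 0, status = 0, t;

    if (num_threads < 1) {
        return -1;
    }
    threads = malloc(sizeof *threads * (size_t)num_threads);
    bands = malloc(sizeof *bands * (size_t)num_threads);
    if (threads == NULL || bands == NULL) {
        free(threads);
        free(bands);
        return -1;
    }
    rows_per_thread = (Y_SIZE + num_threads - 1) / num_threads;  /* round up */
    for (t = 0; t < num_threads; t++) {
        int first = t * rows_per_thread;
        int last = first + rows_per_thread;
        bands[t].first_row = first < Y_SIZE ? first : Y_SIZE;
        bands[t].last_row = last < Y_SIZE ? last : Y_SIZE;
        if (pthread_create(&threads[t], NULL, render_band, &bands[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }
    free(threads);
    free(bands);
    return status;
}
```

Static row bands are the simplest split, but rows differ in how many iterations their pixels need, so the measured speed-up may fall short of the thread count; explaining that gap is part of the performance analysis.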


Assignment Overview: Exercise 2

• Each group has to map and execute the sequential fractal application on one core of the platform
• Goal
  • to familiarize the students with the ECA-CompSoC platform and the challenges of fitting their code in the small Imem
  • to get them to know about performance and energy consumption estimation


Assignment Overview: Exercise 3

• Each group has to parallelize the application on at least two processor tiles
• They have to evaluate the performance and the energy consumed by the solution
• Goal
  • to familiarize the students with the challenges of parallelizing an application on an embedded MPSoC, such as synchronization and memory consistency problems


Assignment Overview: Exercise 4

• Each group has to optimize the performance of the parallel application executing on the platform
• The focus should be on multi-core strategies, e.g., computation versus communication
• The quality of the solution is graded according to the provided performance list:
  1. Execution time > 35000000 cycles
  2. Execution time in [30000000, 35000000] cycles
  3. Execution time < 30000000 cycles
• Goal
  • to gain experience with performance optimization on an embedded platform and the trade-offs involved


Assignment Overview: Exercise 5

• Each group has to minimize the energy consumed by their mapped application such that the execution finishes before a deadline
• Each group receives a different value for the deadline
• Energy minimization could be achieved by:
  1. Clock gating the cores for a period of time
  2. Scaling down the (voltage and) frequency of the core for a period of time
• The groups may choose to combine both of these techniques
• Goal
  • to introduce the students to performance-constrained energy optimization on an embedded platform
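A first-order way to reason about frequency scaling under a deadline is to pick the slowest setting that still finishes in time. The sketch below rests on several assumptions that the slides do not confirm: the frequency table is invented for illustration, the cycle count is taken to be independent of the chosen frequency (which ignores accesses to memories in other clock domains), and a lower level is taken to cost less energy. `lowest_level_meeting_deadline` is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical frequency table in MHz, indexed by level; the real levels
 * accepted by hw_vfcm_set_freq are not given in the slides. */
static const unsigned int freq_mhz[] = { 10, 20, 40, 80 };
#define NUM_LEVELS (sizeof freq_mhz / sizeof freq_mhz[0])

/* Return the lowest level at which `cycles` cycles finish within
 * `deadline_us` microseconds, or -1 if even the fastest level is too slow.
 * Execution time in microseconds is cycles / MHz, so the deadline is met
 * when cycles <= deadline_us * MHz (compared without division). */
static int lowest_level_meeting_deadline(unsigned long long cycles,
                                         unsigned long long deadline_us)
{
    size_t level;
    for (level = 0; level < NUM_LEVELS; level++) {
        if (cycles <= deadline_us * freq_mhz[level]) {
            return (int)level;
        }
    }
    return -1;
}
```

With the invented table, 30000000 cycles and a 1 s deadline select level 2 (40 MHz), leaving 0.25 s of slack that could be spent clock gated, which is one way of combining the two techniques.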


Assignment Overview: Trade-off results


Conclusions

• Students who follow our laboratory have gained hands-on experience programming a multi-core embedded system

• They will have overcome the difficulties of programming for a resource-constrained platform with limited debug visibility

• They will have investigated their solution's design space for both performance and energy consumption, learning the trade-offs that exist on such a platform

• The students work in multi-cultural groups, with diverse backgrounds and experience, similar to what is found in international companies and academia

• The lab was given successfully in the 2011-2012 academic year at Delft University of Technology

• The students' feedback was in general positive. However, in practice the students spent more time than scheduled working on the laboratory project


Future ECA-Labs: This year

• Target a new application that also receives input data
  • ECA-Lab 2012-2013: Instagram-like image filtering application

[Figure: the filter application runs on ECA-CompSoC.]


Thank You!

For further information please visit www.compsoc.eu
