www.flextiles.eu FlexTiles. Workshop at AHS’2014 conference: FlexTiles FP7 project. Low-Power DSP Accelerator Embedded in a Heterogeneous Many-Core Architecture. Marc MORGAN, CSEM – Swiss Center for Electronics and Microtechnology

Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

DESCRIPTION

The FP7 FlexTiles project uses DSP accelerators. They are connected to each other and to the general-purpose processors (GPPs) through a Network-on-Chip (NoC). These slides describe the DSP accelerator in detail.

Page 1: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

www.flextiles.eu

FlexTiles

Workshop at AHS’2014 conference: FlexTiles FP7 project

Low-Power DSP Accelerator Embedded in a Heterogeneous Many-Core Architecture

Marc MORGAN

CSEM – Swiss Center for Electronics and Microtechnology

Page 2: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

The information contained in this document and any attachments are the property of FlexTiles consortium. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document must be done in accordance with the CA of the project (TRT/DJ/624412785.2011). Template version 1.0

CSEM overview on a single slide

• private, not-for-profit company, founded in the 1980s

• approx. 400 employees on 5 sites in Switzerland (HQ in Neuchâtel) and a site in Brazil

• 5 research programs:

1. ultra-low power integrated systems (SoC, Vision, Wireless)

2. systems engineering (med tech, instrumentation, automation)

3. MEMS

4. surface engineering (nano, bio, printable electronics)

5. photovoltaic

• approx. 70 MCHF annual budget

• over 20 start-ups and spin-offs since 1995

Page 3: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Many-core architecture: GPPs + accelerators

• An array of general purpose processors (GPP)

• Connected via a Network-on-Chip (NoC)

• Complemented with accelerators to optimize speed and power:

DSP processors or specialized logic implemented in embedded-FPGA

• Plus memory nodes and I/O

Page 4: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Many-core architecture: GPPs + accelerators (cont’d)

Several IPs are available for the building blocks, both in the consortium and on the market

Architectural choices attempt to retain the genericity of the platform

CSEM provides an ultra-low power DSP processor for the DSP accelerator

It plugs into a generic accelerator interface (AI)

Page 5: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Accelerator interface (AI)

Interfaces the NoC’s network interface (NI) to the accelerator by providing services:

programming, control/status, data in, data out, debug

DMA access, word FIFOs, notification

Page 6: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

DSP accelerator architecture

Choices for the DSP accelerator avoid DSP-specific features

the DSP will not run an OS or kernel

the DSP will not use (or at least not require) interrupts

Note: CSEM’s icyflex4 ULP DSP could support both of the above

Implement a FIFO manager to handle input and output tokens from/to the accelerator interface (AI)

Implement debug and tracing facilities

Debug: JTAG 1149.1 TAP

Tracing: programmable tracing unit
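
The FIFO manager and the interrupt-free operation listed above can be illustrated with a small software model. This is only a sketch, assuming bounded FIFOs between the AI and the DSP and a DSP that polls them; the names `TokenFifo` and `dsp_poll_step`, the FIFO depths and the doubling kernel are hypothetical and are not taken from the slides.

```python
from collections import deque


class TokenFifo:
    """Bounded FIFO of tokens (hypothetical model, not the real hardware)."""

    def __init__(self, depth):
        self.depth = depth
        self._items = deque()

    def can_push(self):
        return len(self._items) < self.depth

    def can_pop(self):
        return len(self._items) > 0

    def push(self, token):
        if not self.can_push():
            raise OverflowError("FIFO full")
        self._items.append(token)

    def pop(self):
        if not self.can_pop():
            raise IndexError("FIFO empty")
        return self._items.popleft()


def dsp_poll_step(fifo_in, fifo_out, kernel):
    """One polling step, with no interrupts involved.

    A token is consumed only if input is available and the output has room.
    Returns True if a token was processed.
    """
    if fifo_in.can_pop() and fifo_out.can_push():
        fifo_out.push(kernel(fifo_in.pop()))
        return True
    return False


fifo_in, fifo_out = TokenFifo(depth=4), TokenFifo(depth=2)
for sample in (1, 2, 3):
    fifo_in.push(sample)

processed = 0
while dsp_poll_step(fifo_in, fifo_out, kernel=lambda x: 2 * x):
    processed += 1

# The output FIFO has depth 2, so the third token stays in the input FIFO.
print(processed)
```

Checking both FIFOs before consuming a token gives simple back-pressure: the kernel never runs on data it has nowhere to put.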

Page 7: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

DSP accelerator architecture (cont’d)

Page 8: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Management of the DSP accelerator

Each accelerator is managed by software running on GPPs

virtualization manager: allocation of the accelerator

resource manager: control of the accelerator

These managers are in charge of:

transferring the application (ELF) to the accelerator

signaling the accelerator when to start and when to stop

recovering statistics on usage of the accelerator to optimize the execution of the application on the many-core platform

The tracing unit can be managed from the processor or from the JTAG interface
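
The manager duties on this slide (transfer the ELF, start, stop, recover usage statistics) can be sketched as a small state machine seen from the GPP side. Everything here is an assumption made for illustration: the class `AcceleratorModel`, its methods, and the busy/total cycle counters are hypothetical and do not describe the actual FlexTiles software. The only external fact used is that an ELF file begins with the bytes `0x7F 'E' 'L' 'F'`.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LOADED = "loaded"
    RUNNING = "running"


class AcceleratorModel:
    """Stand-in for one DSP accelerator as seen from a GPP (hypothetical)."""

    ELF_MAGIC = b"\x7fELF"  # first four bytes of any ELF file

    def __init__(self):
        self.state = State.IDLE
        self.program = b""
        self.busy_cycles = 0
        self.total_cycles = 0

    def load(self, elf_image):
        """Transfer the application image to the accelerator."""
        if self.state is State.RUNNING:
            raise RuntimeError("stop the accelerator before loading")
        if not elf_image.startswith(self.ELF_MAGIC):
            raise ValueError("not an ELF image")
        self.program = elf_image
        self.state = State.LOADED

    def start(self):
        if self.state is not State.LOADED:
            raise RuntimeError("no application loaded")
        self.state = State.RUNNING

    def tick(self, busy):
        """Account for one cycle while running (driven by the simulation)."""
        if self.state is State.RUNNING:
            self.total_cycles += 1
            self.busy_cycles += int(busy)

    def stop(self):
        """Stop the accelerator and return its usage statistics."""
        if self.state is not State.RUNNING:
            raise RuntimeError("accelerator is not running")
        self.state = State.LOADED
        return {"busy": self.busy_cycles, "total": self.total_cycles}


acc = AcceleratorModel()
acc.load(b"\x7fELF" + bytes(12))  # dummy image: magic number plus padding
acc.start()
for busy in (True, True, False, True):
    acc.tick(busy)
stats = acc.stop()
utilization = stats["busy"] / stats["total"]
print(stats, utilization)
```

A utilization figure of this kind is one example of a statistic a manager could use when deciding where to place a task on the platform.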

Page 9: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Ultra low-power (ULP) processors at CSEM

CSEM was founded in the 1980s to promote innovation

1980s, initially for the Swiss watch industry

ULP 4-bit processors: PUNCH, µPUS, Combo, ...

1990s, development of a general purpose ULP 8-bit processor

CoolRISC: licensed to Swatch group, TI, Semtech, ...

2000s, powerful new ULP processors with DSP features

2006: icyflex1, a flexible processor for DSP/control apps

2009: icyflex2, a smaller processor for control applications

2009: icyflex4, a scalable processor for DSP/control apps

icyflex is a registered trademark of CSEM

Page 10: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

icyflex family of ultra-low power processors

[Figure: icyflex processors positioned by application (Control to DSP) against computing power. Labels in the figure: icyflex2, icyflex1, icyflex4; 1 MUL, 2 MAC, 4 MAC … 36 MAC, 12 MAC; 6 µW/MHz, 25 µW/MHz, 10-150 µW/MHz.]

power indicated for TSMC 65 nm CMOS
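
A µW/MHz figure converts to active power by multiplying by the clock frequency. The short calculation below shows this for the figures quoted on the slide. The 50 MHz clock is a hypothetical value chosen only for the arithmetic, and the result covers active (frequency-proportional) power only, not leakage.

```python
def active_power_uw(uw_per_mhz, clock_mhz):
    """Active power in microwatts for a µW/MHz figure at a given clock."""
    return uw_per_mhz * clock_mhz


CLOCK_MHZ = 50  # hypothetical operating frequency, not from the slides

# Figures quoted on the slide (10 and 150 are the ends of the 10-150 range).
figures = {"6 µW/MHz": 6, "25 µW/MHz": 25, "10 µW/MHz": 10, "150 µW/MHz": 150}

power = {label: active_power_uw(value, CLOCK_MHZ) for label, value in figures.items()}
for label, p in power.items():
    print(f"{label} at {CLOCK_MHZ} MHz -> {p} µW ({p / 1000} mW)")
```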

Page 11: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

icyflex software development kit

GNU C compiler (gcc), v 4.6.3

icyflex instruction parallelism supported by latest releases of gcc

libc and libm from Red Hat’s Newlib

software implementation of IEEE floating-point standard

GNU assembler / linker (binutils), v 2.20

BFD / ELF32 object file format

Binary, SREC, IHEX memory image file formats

GNU debugger (gdb), v 6.7.1

Mode 1: instruction set simulator of the icyflex core

Mode 2: On-Chip Debug (OCD) through a JTAG interface

icyflex instruction set simulator (ISS), written in C++

Phase-accurate, pipelined

Wrappers to SystemC, VHDL (Modelsim), Matlab/Simulink

Eclipse integrated development environment, v Helios

CDT C/C++ IDE plug-in

icyflex plug-in

[Figure: tool flow with file types .c, .o, .exe, .log and tools gcc, ld, gdb]

Page 12: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

icyflex2 vs icyflex4

Feature | icyflex2 | icyflex4 (VPS=2)
Optimized for | Control | DSP (P, X, Y memory buses, ISA, HW loops, saturation, …)
Instruction word [bits] | 32 (1 or 2 sub) | 64 (1, 2 or 3 sub)
Memory access [bits] | 8, 16 or 32 | 2x (8, 16, 32, 64, 128)
Data processing [bits] | 16 or 32, trunc | 2x (16 or 32 or 64), full
Single Instr. Multiple Data (SIMD) | No | Yes, up to 8 MAC
Instruction set is reconfigurable on the fly | No | Yes

Software Development Kit (SDK): GNU-based tool suite (gcc, gdb) + cycle-accurate instruction set simulator (ISS)

Hardware Development Kit (HDK): FPGA-based, customizable

VPS = Vector Processing Slices in the Vector Processing Unit of the DSP

Page 13: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

icyflex: reconfigurable instructions and addressing modes

[Figure: instruction set (ADD, MUL, SHR, MAC, JUMP and two "configurable" entries) with instruction decoding and two groups of datapath blocks (SHIFT, MUX, ALU, ACC, ACC). Callouts: "blank instructions configured at run-time"; "cycle N: config MOP, cycle N+1: use MOP".]
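
The callouts in this figure suggest a two-step use of a configurable instruction: one cycle to configure it, the next cycle to use it. The toy model below illustrates that idea only. The slot names `CFG0` and `CFG1`, the `configure` and `execute` methods, and the representation of an operation as a Python function are all hypothetical; nothing here reflects the actual icyflex encoding or hardware.

```python
import operator


class ToyDecoder:
    """Toy decoder with fixed opcodes and blank slots filled in at run time."""

    def __init__(self):
        self.fixed = {"ADD": operator.add, "MUL": operator.mul}
        self.slots = {"CFG0": None, "CFG1": None}  # blank until configured
        self.cycle = 0

    def configure(self, slot, func):
        """Fill a blank slot; modeled as taking one cycle."""
        if slot not in self.slots:
            raise KeyError(slot)
        self.slots[slot] = func
        self.cycle += 1

    def execute(self, opcode, a, b):
        """Run a fixed or previously configured operation; takes one cycle."""
        func = self.fixed.get(opcode) or self.slots.get(opcode)
        if func is None:
            raise RuntimeError(f"{opcode} is not configured")
        self.cycle += 1
        return func(a, b)


dec = ToyDecoder()
dec.configure("CFG0", lambda a, b: abs(a - b))  # cycle N: configure the slot
result = dec.execute("CFG0", 3, 10)             # cycle N+1: use it
print(result, dec.cycle)
```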

Page 14: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

DSP in FlexTiles emulators

Emulator 1 (software):

Using Open Virtual Platforms (OVP)

Not cycle accurate

The icyflex4 DSP is emulated by a GPP running at a higher frequency

Emulator 2 (hardware):

Using an FPGA board with two Xilinx Virtex-6 FPGAs

Uses a DFF version of the DSP accelerator

Page 15: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Exploitation of FlexTiles results at CSEM

CSEM specializes in low power solutions

A well-balanced multi-processor design can optimize energy consumption by reducing voltage and frequency

For multi-core: we offer CSEM solutions

For many-core: CSEM collaborates with one or more of its partners, including, e.g., a follow-up project to produce FlexTiles chips
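
The claim that a multi-processor design can save energy by reducing voltage and frequency follows from the usual first-order model of CMOS dynamic power, P = C·V²·f. The sketch below compares one core at full speed with two cores at half the frequency and a lower supply voltage. All numbers (capacitance, voltages, frequencies) are hypothetical, and the model ignores leakage and any overhead of splitting the work across cores.

```python
def dynamic_power(c_eff, vdd, freq):
    """First-order CMOS dynamic power in watts: P = C_eff * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq


C_EFF = 1.0e-9  # effective switched capacitance in farads (hypothetical)

# One core at 100 MHz and 1.2 V.
single = dynamic_power(C_EFF, vdd=1.2, freq=100e6)

# Two cores at 50 MHz each: same total throughput if the work splits evenly.
# Assumes the lower frequency allows the supply to drop to 0.9 V.
dual = 2 * dynamic_power(C_EFF, vdd=0.9, freq=50e6)

ratio = dual / single
print(f"single: {single * 1e3:.1f} mW, dual: {dual * 1e3:.1f} mW, ratio: {ratio:.4f}")
```

In this model the frequency halving and the core doubling cancel out, so the whole saving comes from the squared voltage term: (0.9 / 1.2)² ≈ 0.56.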

Page 16: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

FlexTiles FP7 project

For more information regarding the FlexTiles project, visit:

http://www.flextiles.eu

Please take 5 minutes to fill out the survey on the project web site under the Contact menu

The FlexTiles project is funded in part by FP7, the seventh framework programme of the European Commission.

Page 17: Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles

Thank you for your attention!

For more information: http://www.csem.ch

Questions? mailto:[email protected]