Adaptive solutions for the emerging reliability and
multicore resource management challenges
APRES’09
Grenoble, France, October 2009
IMEC vzw.
Dr. Stylianos Mamagkakis
& IMEC MPSoC, TAD, Compiler, 3D, Wireless and Multimedia Teams
APRES’09 Adaptive solutions for the emerging reliability and multicore resource management challenges imec/restricted 2009 2
25 years of IMEC in Leuven, Belgium
• Founded in 1984
• Collaboration with 500 partners worldwide
• About 1500 employees
• More than 20 spin-offs
• Unique clean room laboratories with state-of-the-art equipment
• Research 3-10 years ahead of industrial needs
• Multidisciplinary programs
• 227 Million Euro budget
(191 Million Euro self generated)
• 1700 publications per year
For more info visit: http://www.imec.be
IMEC as a “Transformer”
[Figure: IMEC positioned between universities (long term, many options) and industry (short term, applications), plotted as time frame against R&D cost (low to high), with "Knowledge" labeled on both sides.]
Providing focus for universities, and basic insight and solutions for industrial partners
For more info on academic collaboration see:
ArtistDesign FP7 NoE http://www.artist-embedded.org
HiPEAC2 FP7 NoE http://www.hipeac.net
IMEC – SSET (Smart Systems and Energy Technology)
Design components for multimedia & wireless applications, instantiated in a wireless platform and in a multimedia platform:
• Design technology: composable and scalable mapping on multi-core platforms
• Compiler and processor extensions (SIMD, multi-threading)
• Step-wise extension from re-active SDR to cognitive radio, including Gbps stationary and 100’s Mbps mobile, 60 GHz
• Technology-aware digital and RF design: semiconductor scaling-proof
• Scaled RF and ADC components
• 3D graphics and video on same platform
• End-to-end cross-layer optimisation
[Figure: design-time and run-time (RT) flow. Labels: set of independent applications (video, 3D objects), Pareto surface, parallel object code, Pareto point selection, platform resource assignment, operating system RT, MPSoC platform with GPP (e.g. ARM) and ADRES cores, virtual platform simulator, HDL simulation, MPSoC chip.]
[Figure: Design-Time MPSoC Tool-Flow. Labels: application, platform independent transformations (methodology), platform specific optimizations with the ATOMIUM tool suite (single processor), sequential object code, parallelisation (SPRINT superset) & memory hierarchy, platform specific optimizations with the ATOMIUM MA & MC tool, parallel C, compiler, parallel object code, functional simulation (multi thread lib.), virtual platform simulator, HDL simulation, MPSoC chip.]
[Figure: content server connected over an IP backbone to a wireless MAN (e.g., 802.16e) and a wireless LAN (e.g., 802.11).]
IMEC-NES http://www2.imec.be/imec_com/nomadic-embedded-systems.php
Overview
• Introduction
• Multicore platforms today and in the near future
• Challenges in the multicore era
• Conclusions
Moore’s law and µP evolution
Evolution in Memory Hierarchies and ILP advances
• Caches: hide latency of DRAM and increase BW
– CPU-DRAM access gap has grown by a factor of 30-50!
– Trend 1: Increasingly large caches
• On-chip: from 128 bytes (1984) to 100K+ bytes
• Multilevel caches: add another level of caching
– Trend 2: Advances in caching techniques:
• Reduce or hide cache miss latencies
• Cache aware combos: computers, compilers, code writers
• ILP is the implicit parallelism among instructions exploited by pipelining, superscalar and VLIWs
• “Off-the-shelf” ILP techniques yielded 20 year path.
[Hennessy01] Directions and Challenges in High Performance Microprocessors
Moving to Multiple Cores
• Need for flexible platforms
– Reduce NRE cost
– Adapt to ever changing standards
– Decreasing time-to-market
• High performance at high power efficiency, achieved by
– Providing multiple PEs
– at a lower voltage and frequency
– Limiting context switches of a single PE
• Platform components
– General Purpose PE(s) for control tasks
– Specialized PEs for computation intensive tasks
– Mem hierarchy: scratchpad and caches
– Flexible, service-providing interconnect
[Figure: TI C64 energy under Vdd scaling (130nm): energy efficiency [MOPS/W] versus performance [MOPS], with a marked point at 0.9V, 410MHz] [Lecture Jan Rabaey]
[Figure: IBM CELL processor]
Multiple cores are here today and fast moving to many cores
[Figure: multicore chip examples from Texas Instruments, IBM (Cell), ARM, and IMEC; the IMEC floorplan shows ADRES1 to ADRES6, ARM, L2D1, L2D2, L2I1, L2I2, L1D, I$, and CM (April’08)]
[Jerraya04] Multiprocessor Systems-on-Chips
Enabling Ambient Intelligence (AmI) Dream
Secure, trustworthy computing and communication chips
embedded in every-thing and every-one.
A pervasive, context aware ambient,
sensitive and responsive to the presence of people
Ambient Intelligence: cheap, interoperable, low power embedded software platforms
[Figure: computing spectrum from explicit computing, through nomadic & private spaces, to ambient sensors and actuators. Power scale: 100 W, 1 W, 100 mW, 100 µW. Classes: “Watt” (mains), “Milliwatt” (battery and cheap consumer), “Microwatt” (ambient). Performance scale: 1 Tflops, 100 Gops, 10 Gops, 10 Mops. “More Moore” versus “More-than-Moore”.]
UPAD
[Figure, continued: the same power and performance spectrum, annotated with single core, many core, and general purpose regions and with GP SW and E SW labels.]
Growing Gaps between AmI-Dreams and Nano-Scale Realities…
[Figure: design stack from system design specs (“7th heaven of software”) down to CMOS scaling (“hell of nano-scale physics”), through system design, platform design, IP creation, and CMOS scaling. Platform architecture: 1B+ tr, 500 processors, 50MB memory, 1 Watt, built from xx nm Si IP blocks. Architectural gap: ESL, SDR, tools & resource mngmt. Physical gap: DFM, TAD*, uncertainty metrics!]
*TAD: Techn. Aware Design
Overview
• Introduction
• Multicore platforms today and in the near future
– About multicore platforms
– Multimedia platforms
– Wireless platforms
– Smart camera applications
• Challenges in the multicore era
• Conclusions
Components in a single processor
[Figure: µP, L1, L2, main memory]
How does this scale to multiprocessor?
Multiprocessor
Memory/communication = shared
[Figure: two µPs sharing L1, L2, and main memory]
Scale processors
also scale memory & communication
[Figure: two µPs, each with its own L1, sharing L2 and main memory]
Communication over central bus is a bottleneck
[Figure: µP 1, µP 2, µP 3, µP 4, …, µP n connected through a single interconnect]
NoCs are needed
Networks on Chip In a Nutshell:
Properties of NoCs
Distributed communication architecture
• Message-passing: native communication scheme between PEs
• Network interfaces: cut/reassemble messages
• Routers: relay data from source to destination
Switched medium
• Short point-to-point (PTP) links connected by switches (routers)
• Packet-switched or circuit-switched
Topology
• Spatial PTP link organization
• 2D: meshes, tori, fat-trees, …
PTP links are shared
• Resource management = QoS
• Different QoS levels, e.g. BE, GT
[Figure: grid of PEs]
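As a rough illustration of how routers relay a packet hop by hop across a 2D mesh, the sketch below uses dimension-ordered (XY) routing. XY routing is a common textbook scheme picked here as an assumption; the slides do not say which routing algorithm is used, and the function name `xy_route` and the (x, y) coordinate convention are hypothetical.

```python
def xy_route(src, dst):
    """Return the router coordinates a packet visits, travelling first
    along X and then along Y (dimension-ordered routing)."""
    x, y = src
    path = [(x, y)]
    # Move along the X dimension until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then move along the Y dimension until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Route from the PE at (0, 0) to the PE at (2, 1) in a 3x3 mesh.
    print(xy_route((0, 0), (2, 1)))
```

Because every packet resolves X before Y, the path between two PEs is fixed, which keeps the sketch simple but ignores congestion and the QoS levels mentioned on the slide.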
Memory hierarchy overview
[Figure: memory hierarchy below the datapath, with columns Size, Type, Delay, Power]
• Register file: few words, single cycle
• L1 instruction cache and data cache: kbit…Mbit, single cycle
• L2 cache (optional): 512 kbit…4 Mbit, 2-6 cycles
• Main memory (DRAM): MByte…GByte, >20 ns
• Storage (flash, HDD): MByte…TByte
• Power scale along the hierarchy: 1x, 10x, 100x
IMEC 3MF multicore platform: ADRES core
Multi-format:
• MPEG-2, MPEG-4, H.264, …
90 nm CMOS results after P&R:
• 3.6 mm2
• 300 MHz max. clock
• 30 fps H.264/AVC decoding:
- CIF: 20 mW @ 50MHz
- AVC: 58 mW @ 150 MHz
- D1: 80 mW @ 205 MHz
• competition on H.264/AVC CIF
– ARM Cortex-A8: D1: 350mW @ 350 MHz
– TI: D1: 300mW @ 200 MHz
[Figure: ADRES core layout with 32 KB data memories, two 13.5 KB Conf. memories, and a 32 KB instr. cache]
[Mei03] ADRES: an architecture with tightly coupled VLIW processor and coarse-grained reconfigurable Matrix
IMEC 3MF MPSoC architecture for multimedia
• 6 ADRES processors
– 4x4 array, 3-issue VLIW
– 32-bit datapath
– 16 video CODEC specific instructions
– 8 FUs with multipliers
– Performance: 300MHz
• 13 Communication assist
– Performance: 75/150MHz
• ARTERIS NoC
– Separate instr. and data NoC
– Bandwidth: 5Gbps@150MHz
• ARM926
– System control
– Performance: 75MHz
• L2 memory
– L2I: 2 banks of 512kB
– L2D: 4 banks of 256kB
• Voltage islands
– ADRES processors
– L2I and L2D banks
• Multiple clock domains
[Figure: 3MF block diagram. ADRES cores #1..#6, an ARM926EJ node with 32 kB RAM, APB bridge, GPIO, timer, IT controller, SMI, and EMIF, L2D banks #1a, #1b, #2a, #2b (256 kB each), and L2I banks #1, #2 (512 KB each), attached through AHB wrappers, communication assist (CA) blocks, and initiator/target NIUs to the ARTERIS NoC, plus clock and reset control, PLL, JTAG logic, and a production test controller.]
IMEC 3MF layout
• Technology
– TSMC 90nm GP
• Area
– 64mm2
– w/o pad cells, test logic, power rings, …
• Clock
– 300MHz
• Power
– To be confirmed by RT-level power estimation using MPEG-4 SP mapping activity
– Early estimates
• ADRES: 111mW (x6)
• NoC: <50mW
• L2I, L2D: <50mW
• ARM: 25mW
[Figure: floorplan with ADRES1 to ADRES6, ARM, L2D1, L2D2, L2I1, L2I2, L1D, I$, and CM]
http://www5.imec.be/ufc/file2/imec_sites/floosen/f6f4ddc469a921d52f730a17ccf6006b/pu/ADRES_3MF.pdf
Customer’s demand on multi-mode >100Mbps wireless communication requires multi-processor SDR solutions
[Figure: wireless standards by year (1995 to 2010) and data rate (10 kbps to 1 Gbps), for low speed/stationary, medium speed, and high speed use: 1G (analog), 2G (digital; GSM, CDMAone), GPRS, EDGE, 3G multimedia (UMTS, CDMA2000), 3G+, 3GPP-LTE+, 4G research target, Bluetooth, 2.4 GHz WLAN, 5 GHz WLAN, high rate WLAN, 802.16e, WIMAX, UWB WPAN, 60 GHz WPAN.]
IMEC’s SDR baseband platform:
Heterogeneous MPSoC enabling reactive radio
[Figure: platform block diagram with three reconf AFE signal paths and shared AFE components, three DFE tiles with SyncPro, two BB engines, an FEC engine, platform ctrl (ARM), L2, periph and HI, all attached to a BW optimized scalable interconnect]
IMEC digital front end:
• Solution for reactive radio
• Ultra low power in idle targeted flexibility
SDR-tuned ADRES:
• C compiler
• high performance/ power ratio
[Derudder09] A 200Mbps+ 2.14nJ/b digital baseband multi processor system-on-chip for SDRs
Baseband Engine
[Figure: 4x4 array of functional units FU_0 to FU_15, each with an LRF, plus a CDRF and configuration memory banks 1 and 2]
• 4x4 64-bit 4-way SIMD CGA
• VLIW and CGA mode of operations
• C-programmable
• 4-bank 64K L1 data scratchpad
• 32K I$
• AHB interface
• CGA configurable via DMA
• 25 (theoretical) GOPS
• 46MOPS/mW
Miniaturized, Low-power Smart Vision Systems
[Figure: smart camera with pixel sensors array, sensor control logic, and on-chip image processing]
[Heijligers/Kleihorst, Stanford, NES, …] [INTEGSYS/Massari ISSCC’08]
Processing levels:
• Low-level pixel processing: image improvement, edge enhancement, convolution kernels
• Intermediate object processing: shape analysis, segmentation
• High level processing: decision taking, networking
Application domains: automotive, surveillance, biomedical
Motivating the Smart in the Smart Camera:
SOTA energy efficient processors: 12 pJ/pixel operation
R. D. Jackson, P. C. Coffield, J. N. Wilson, “A new SIMD computer vision architecture with image algebra programming environment”,IEEE Aerospace Conference, 1997.
[Figure: normalized processors (90nm, 1V), energy efficiency in 32-bit MIPS/mW versus “peak” performance in 32-bit GIPS, on logarithmic axes, grouped as Stream/CGA based, DSP based, and vector based. Plotted: ADRES, ADRES Stream, Visconoti, onDSP, CF6, Macgic, PicoChip, EVP, CELL, VIRAM1, TI C64x, Xtensa, SiliconHive, Sandbridge, SC140, BF561, Xetal.]
2007 © Copyright Praveen Raghavan, NES, IMEC vzw
1. Assume: VGA at 30 fps, about 9 Mpixels/s
   Assume: 500 ops/pixel, ~ 5 GOPS
2. Processor: 70 MIPS/mW, about 14 pJ per 32-bit operation
   ~ 12 pJ/pixel operation
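The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the VGA resolution of 640x480 is the standard value rather than something stated on the slide, the variable names are illustrative, and the resulting power figure is derived here, not quoted from the deck.

```python
# Assumptions taken from the slide: VGA frames at 30 fps, 500 operations
# per pixel, and a processor delivering 70 MIPS/mW.
WIDTH, HEIGHT, FPS = 640, 480, 30   # standard VGA resolution (assumed)
OPS_PER_PIXEL = 500
MIPS_PER_MW = 70

pixels_per_second = WIDTH * HEIGHT * FPS             # roughly 9 Mpixels/s
ops_per_second = pixels_per_second * OPS_PER_PIXEL   # roughly 5 GOPS

# 1 MIPS/mW = 1e6 ops/s per 1e-3 W = 1e9 operations per joule.
ops_per_joule = MIPS_PER_MW * 1e9
energy_per_op_pj = 1e12 / ops_per_joule              # picojoules per operation

# Power needed to sustain the workload at that efficiency (derived value).
power_mw = ops_per_second / (MIPS_PER_MW * 1e6)

print(f"{pixels_per_second / 1e6:.1f} Mpixels/s")
print(f"{ops_per_second / 1e9:.2f} GOPS")
print(f"{energy_per_op_pj:.1f} pJ per 32-bit operation")
print(f"{power_mw:.1f} mW for the whole workload")
```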
Overview
• Introduction
• Multicore platforms today and in the near future
• Challenges in the multicore era
– Manycore mapping challenges
– Run Time Resource management challenges
– Technology aware variability and reliability challenges
– 3D integration challenges
• Conclusions
Multicore mapping
• Parallel mapping (MPA)
• Memory hierarchy (MH)
For more info see:
IMEC CleanC – Open Source project http://www.imec.be/cleanC
IMEC MPSoC Parallelization Assistant (MPA) tool
IMEC Memory Hierarchy (MH) tool
…also in MNEMEE FP7 project http://www.mnemee.org
…and in MOSART FP7 project http://www.mosart-project.org
IMEC Solution: MPA
Compiler assisted parallelization
• Parallelizes sequential C source code
– Correct-by-construction multi-threaded code
• Designer in charge
– More expressive directives than OpenMP
• Supports types of parallelism
– Functional split
– (Coarse) data-level split
– Combinations
• Dumps parallel code
• Sets up communication
– Communication by means of FIFOs
– DMA transfers
– FIFO sizes determined by tool (initial version)
[Figure: application code (*.c) and parallelization directives enter the parallelization assistant, which emits one *.c file per thread (thread 1, thread 2, thread 3)]
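To make the idea of a functional split with FIFO communication concrete, here is a minimal sketch in Python. It is not the output of the MPA tool and does not use its directives; the names `stage` and `run_pipeline` and the FIFO size are hypothetical, and Python threads are used only to show the structure (they do not give real parallel speedup for CPU-bound work).

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker; input items must not be None


def stage(func, fifo_in, fifo_out):
    """One pipeline stage: read from fifo_in, apply func, write to fifo_out."""
    while True:
        item = fifo_in.get()
        if item is SENTINEL:
            fifo_out.put(SENTINEL)  # forward the marker and stop
            break
        fifo_out.put(func(item))


def run_pipeline(items, funcs, fifo_size=4):
    """Functional split: one thread per function, FIFOs in between."""
    # Bounded FIFOs between stages; the final output FIFO is unbounded so
    # the last stage never blocks while the caller is still feeding input.
    fifos = [queue.Queue(maxsize=fifo_size) for _ in funcs]
    fifos.append(queue.Queue())
    threads = [
        threading.Thread(target=stage, args=(f, fifos[i], fifos[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        fifos[0].put(item)
    fifos[0].put(SENTINEL)
    for t in threads:
        t.join()
    results = []
    while True:
        item = fifos[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Each stage has a single producer and a single consumer, so item order is preserved from input to output.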
Parallelization of code - results
• Prototype tool used to explore different parallel software architectures for MPEG-4 part 2 SP encoder
– 10 parallelization alternatives explored in half a day
– 20 to 30 lines of parallelization directives
[Figure: encoder task graph with cycle counts: frameprocessing (339M), parsection (338M), me (97M), tc (92M), vlc (47M), putbits (40M), mc (26M), tu (17M), ec (6M), dp (2M), padding (1M), ps (0M)]
• 3 processors: ~123 Mcycles, speed x 2.75
• 5 processors: ~109 Mcycles, speed x 3.11
• 7 processors: ~62 Mcycles, speed x 5.45
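The reported speedups can be cross-checked against the cycle counts, and dividing by the processor count gives the parallel efficiency. The sketch below assumes the sequential baseline is the 338 Mcycle figure from the task graph; that choice is an inference from the numbers, not something the slide states, and rounding may differ slightly from the slide (339 Mcycles reproduces the x 3.11 value more closely).

```python
SEQUENTIAL_MCYCLES = 338  # assumed baseline, taken from the task graph

# (processors, measured Mcycles) as reported on the slide
RESULTS = [(3, 123), (5, 109), (7, 62)]


def speedup_and_efficiency(baseline, processors, cycles):
    """Return (speedup, efficiency) for one parallel configuration."""
    speedup = baseline / cycles
    return speedup, speedup / processors


if __name__ == "__main__":
    for procs, cycles in RESULTS:
        s, e = speedup_and_efficiency(SEQUENTIAL_MCYCLES, procs, cycles)
        print(f"{procs} processors: speedup {s:.2f}, efficiency {e:.0%}")
```

Efficiency well below 100% for the 5-processor alternative suggests that this particular split is limited by its slowest stage rather than by the number of cores.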
Cache vs. scratchpad
• Cache
– Hardware
• SRAM memory for data
• Additional hardware
– Use
• No programmer effort required
• Unpredictable performance
• Scratchpad
– Hardware
• SRAM memory for data
• DMA
– Use
• Programmer in control: tedious
• Predictable
• More efficient
Cache vs. scratchpad - results
• MPEG-4 p2 SP encoder (±8950 lines of C code), 30 frames @ CIF
[Figure: two platform variants on an AHB multi-layer bus: ADRES with I$ and D$ plus L2 (4 Mbit), and ADRES with I$ and scratchpad (SP) plus DMA and L2 (4 Mbit)]
[Figure: execution time [x10^6 cycles], split into active, stall, and halt, for SPM 16k, D$ 16k, and D$ 64k]
[Figure: power [mW], split into ADRES, L1, L2, and CA, for SPM 16k, D$ 16k, and D$ 64k]
• Annotated differences in the charts: -40%, -22%, -30%, -19%
• IMEC MH prototype tool: ±9250 lines of C code, correct by construction
[Baert08] An automatic scratch pad memory management tool and MPEG-4 encoder case study
Sharing is regulated by Run Time Manager (RTM)
[Figure: MPSoC with video engine, graphics engine, GPP, DSPs, a black box, and memories (MEM) sharing memory and the on-chip comm infrastructure (BW), with the RTM regulating access]
• Admission
– Is it possible to compose the requested components?
• Resource allocation
– How should resources be distributed among components?
• Switching
– How to implement the resource allocation decision?
[Figure: run-time management layers between the application(s) and the MPSoC platform hardware properties and services: system manager, system quality manager, resource manager (policy), resource manager (mechanism), run-time library, external metadata interface]
[Nollet08] A Safari Through the MPSoC Run-Time Management Jungle
Multicore resource management
• Shared resources/metadata
• Multiple types of components/resources
• Resource request dynamism
• Policies/mechanisms
For more info see:
IMEC ARES team http://www2.imec.be/imec_com/mpsoc_runtime.php
IMEC Run Time Library & Management (RTLib & RTM)
…also in MultiCube FP7 project http://www.multicube.eu
…and in OptiMMA IWT project http://www.imec.be/OptiMMA
RT Manager challenges overview I
1. Metadata extraction/build scenarios
2. Metadata monitoring/detect scenarios
[Figure: the RT management component sits between the software application (C/C++) and the selected MP-SoC platform. Design-time metadata: SW characteristics, HW characteristics, Pareto spaces. Run-time metadata: resources requested, resources available, RT monitors.]
[Bartzas08] Enabling run-time memory data transfer optimizations at the system level with automated extraction of embedded software metadata information
RT Manager challenges overview II
3. RTM configuration according to scenarios
4. Switching between pre-selected operating points
5. Calibrating operating points @ run-time: when metadata is updated at run-time
Each operating point corresponds to a specific RTM combination and configuration
[Figure: axes labeled power consumption, execution time, time, and metadata value (e.g. # objects processed/workload); operating points 1, 2, 3 correspond to RTM config. 1, 2, 3]
[Ma07] Systematic Method for Real-Time Cost-Effective Mapping of Dynamic Concurrent Task-Based Systems on Heterogeneous Platforms
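Switching between pre-selected operating points can be pictured as a lookup over a small Pareto set. The sketch below is an assumption-laden illustration, not the method of [Ma07]: the `OperatingPoint` fields, the example values, and the "lowest power that meets the deadline" policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    exec_time_ms: float
    power_mw: float


def pareto_front(points):
    """Keep only points not dominated in both execution time and power."""
    front = []
    for p in points:
        dominated = any(
            q.exec_time_ms <= p.exec_time_ms
            and q.power_mw <= p.power_mw
            and (q.exec_time_ms < p.exec_time_ms or q.power_mw < p.power_mw)
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def select_point(points, deadline_ms):
    """Pick the lowest-power Pareto point that still meets the deadline."""
    feasible = [p for p in pareto_front(points) if p.exec_time_ms <= deadline_ms]
    if not feasible:
        return None  # admission would fail: no configuration is fast enough
    return min(feasible, key=lambda p: p.power_mw)


# Illustrative values only; they do not come from the slides.
POINTS = [
    OperatingPoint("RTM config. 1", 10.0, 90.0),
    OperatingPoint("RTM config. 2", 20.0, 50.0),
    OperatingPoint("RTM config. 3", 40.0, 30.0),
    OperatingPoint("dominated", 25.0, 60.0),
]

if __name__ == "__main__":
    print(select_point(POINTS, deadline_ms=30.0))
```

The Pareto filtering corresponds to design-time work, while `select_point` is the cheap decision that would be left for run time.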
MNEMEE Example: Dynamic Memory Management
(DMM) for Wireless (WL) applications
[Figure: two software stacks for the WL application. Reference: DMM through Linux/libc malloc(), free() and Linux/kernel sbrk(), mmap(). IMEC: DMM through IMEC malloc(), free() and IMEC sbrk(), mmap() on the Linux kernel.]
[Figure: memory request (MB) versus time (sec)]
Allocate memory as requested
Not worst-case scenario!
Dynamic Memory Manager gets configured according to software application
[Figure: software (services: browser, plug-in 1, plug-in 2) executing on the hardware (chosen memory hierarchy: SP1 and SP2, each with DMA), with allocated blocks A to H]
• Metadata format: histogram of memory size requests
• Metadata interface: malloc/free API
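A histogram of memory size requests is simple to collect at the malloc/free interface. The following sketch only shows the data structure; the request trace is made up for illustration and `size_histogram` is a hypothetical helper, not part of any IMEC tool.

```python
from collections import Counter


def size_histogram(requests):
    """Count how often each block size (in bytes) is requested."""
    return Counter(requests)


# Hypothetical trace of allocation sizes observed at the malloc() interface.
trace = [40, 40, 1500, 40, 576, 1500, 40]
hist = size_histogram(trace)

if __name__ == "__main__":
    for size, count in hist.most_common():
        print(f"{size:5d} bytes: {count} requests")
```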
Dynamic Memory Manager switches configuration if the application changes
[Figure: the same software (services) and hardware (chosen memory hierarchy: SP1 and SP2, each with DMA) while the video plug-in executes; blocks A to H hold different contents]
• Metadata format is the same, but values have changed
• Different metadata is fed to the Dynamic Memory Manager
Also, the DM Manager configurations change
according to system scenarios
During run-time different scenarios are detected and the DM Manager is configured @ run-time
Why do we need DMM in 802.11b?
DM Manager
Up to 1000 packets can
be stored in memory
(Linux kernel 2.4)
Reference: Linux DM Manager
Step 1
Run Time Situations of WiFi (802.11b)
1460 different packet sizes have to be stored at run-time, so we have 1460 Run Time Situations (RTSs)
Step 2:
We cluster the RTSs in 3 main system scenarios
System Scenario 1: Packet = ACK size (40 Bytes)
System Scenario 2: Packet = MTU size (1500 Bytes)
System Scenario 3: ACK size < Packet < MTU size
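The clustering rule is small enough to write down directly. In this sketch the two size constants come from the slide, while the function name `system_scenario` and the choice to reject sizes outside the ACK-to-MTU range are assumptions made for illustration.

```python
ACK_SIZE = 40     # bytes, from the slide
MTU_SIZE = 1500   # bytes, from the slide


def system_scenario(packet_size):
    """Map a packet size (one run-time situation) to its system scenario."""
    if packet_size == ACK_SIZE:
        return 1
    if packet_size == MTU_SIZE:
        return 2
    if ACK_SIZE < packet_size < MTU_SIZE:
        return 3
    # Assumption: sizes outside [ACK_SIZE, MTU_SIZE] are not expected here.
    raise ValueError(f"unexpected packet size: {packet_size}")


if __name__ == "__main__":
    for size in (40, 1000, 1500):
        print(size, "-> system scenario", system_scenario(size))
```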
Step 3:
Exploiting the system scenarios (i)
• By combining modules, we can create a different ultra-customized DM allocator for each scenario
Step 3:
Exploiting the system scenarios (ii)
We simulate/profile thousands of module combinations to evaluate which DM allocator design is better for each scenario!
Step 4:
Switching between scenarios at run time
• System scenario 1 for ‘ACK system scenario’
• System scenario 2 for ‘MTU system scenario’
• System scenario 3 for ‘Variable system scenario’
Example: RTS detected, a 1000 Byte packet buffer request (variable system scenario detected). Request satisfied by switching to system scenario 3!
[Figure: allocator configurations built from pools with alloc./free operations: a 40 Byte pool, a 1500 byte pool, and a series of 64 Byte, 128 Byte, …, 1024 Byte pools. Labels: fixed sized blocks, no coalescing or splitting allowed, exact fit, first fit.]
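A sketch of how a request could be mapped to a pool after the scenario is detected is given below. It is a simplification under stated assumptions: the intermediate pool sizes (256 and 512 bytes) are guessed as powers of two, requests above 1024 bytes in the variable scenario fall back to the 1500 byte pool, and the function `pool_for` is hypothetical rather than the allocator evaluated in the talk.

```python
import bisect

ACK_POOL = 40      # bytes, block size of the ACK pool (from the slide)
MTU_POOL = 1500    # bytes, block size of the MTU pool (from the slide)
# 64, 128 and 1024 appear on the slide; 256 and 512 are assumed.
VARIABLE_POOLS = [64, 128, 256, 512, 1024]


def pool_for(request_size):
    """Return (system scenario, pool block size) for a packet buffer request."""
    if not ACK_POOL <= request_size <= MTU_POOL:
        raise ValueError(f"unexpected request size: {request_size}")
    if request_size == ACK_POOL:
        return 1, ACK_POOL
    if request_size == MTU_POOL:
        return 2, MTU_POOL
    # Variable scenario: smallest fixed-size pool whose blocks fit the request.
    i = bisect.bisect_left(VARIABLE_POOLS, request_size)
    if i < len(VARIABLE_POOLS):
        return 3, VARIABLE_POOLS[i]
    # Assumption: larger variable requests are served from the MTU pool.
    return 3, MTU_POOL


if __name__ == "__main__":
    print(pool_for(1000))  # the 1000 byte request from the slide
```

Fixed-size pools avoid coalescing and splitting entirely, at the cost of some internal fragmentation for requests that fall between pool sizes.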
Some results
Technology aware variability and reliability
challenges
• Variability and reliability characterization
• Reliability online countermeasures
• Standardized Knobs and Monitors
For more info see:
IMEC TAD team http://www.imec.be/tad
IMEC VAM and SKM
…also in Reality FP7 project http://www.fp7-reality.eu/
…and in Elixir IWT project
Next generation MOSFETs are atomic scale
devices, hence uncertain!
• 2008: physical gate length = 22nm (65nm node)
• 2016: physical gate length = 9nm = 30x30x30 atoms (22nm node)
• The simulation paradigm today: existing device estimation models break!
• Boundaries needed with very few atoms and physics walls generate uncertainty via quantum mechanical effects
Courtesy: A. Asenov – Reality FP7 project
Variability effect
[Figure: energy per cycle versus circuit delay, showing the nominal operating point, the nominal variability cloud, the spec on circuit delay, and the spec on power]
Variability on IMEC’s Digital Front-end processor for
Software Defined Radio (@32nm predictive model)
[Figure: variability plots with marked shifts of +25%, +38%, +40%, and +80%]
All chips are equal – but some will be more equal
Which figure do you use for your worst-case timing analysis?
Can this be ignored?
[Figure: 26K recent IBM 65nm CPUs, frequency (GHz) versus power (Watts); annotations: ~10x variations, ~50% variation]
Courtesy S. Reda and Sani R. Nassif (DATE ’09)
Is this relevant for Multicore?
• Extract out “common” pattern across dies and wafers.
• Take the mean of the data, and separate out the wafer and die components.
– Presented paper on algorithms used, in DATE 2009.
[Figure: average wafer = systematic within-wafer component + systematic within-die component]
Courtesy S. Reda and Sani R. Nassif (DATE ’09)
Temperature drift or degradation over time
(reliability issues)
[Figure: energy per cycle versus circuit delay, showing the nominal operating point, the nominal variability cloud, the spec on circuit delay, and the spec on power; the cloud shifts due to temperature and due to degradation]
Reliability issues
• Negative Bias Temperature Instability (NBTI) √
• Hot Carrier Degradation (HCD) √
• Soft Break Down (SBD in oxide) √
• Soft Error √
• Breakdown in interconnect (TBD)
• Electro Migration
• …
√ = today R&D in progress in IMEC’s MemVAM
Variability Aware Modeling (VAM)
[Figure: top-down design flow, with variability expressed differently at each level]
• “Technology”: variability = geometrical & chemical
• “Statistical” device compact model: variability = electrical (V, I, R, C)
• Standard cells, SRAMs, register files (.lib): variability = delay & energy
• Logic (data sheet-like data, timing & power reports): variability = crit. path & leak & dynamic
• SoC (TLM co-simulation): variability = yield
[Figure insets: yield against energy and 1/fCLK; delay against leakage with typical corner (TT) and slow corner (SS) marked]
Standardized Knobs & Monitors (SKM) for variability
[Figure: energy per cycle versus circuit delay, showing the spec on circuit delay, the spec on power, the nominal operating point, and a 2nd “high energy” operating point, each with its “variability cloud”]
• A particular circuit instance happens to have an operating point (X) outside the delay spec
• The “knob” selects the high-speed/high-energy configuration to regain the delay spec
Fine-grained Knobs and Monitors in SoC
• Add monitors on performance critical circuit parts
• Add operating point knobs on performance tunable circuit parts
• Put the control intelligence in [embedded] software
[Figure: SoC hierarchy with monitors and knobs. System: system processor, interconnect network, L1-L2 memory, I/O, PLL, IP, DMA, RF, system software. Functional units: core, ALU, cache, MEM, SRAM, NVM, FIFO, regfile. Sub circuits: logic gates.]
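Putting the control intelligence in software could look like the small decision routine below: a monitor reports the delay of a circuit instance, and software picks the cheapest knob setting that meets the spec. All numbers, the knob table, and the function `choose_knob` are illustrative assumptions, not measured values or an IMEC interface.

```python
DELAY_SPEC = 1.0  # normalized spec on circuit delay (assumed value)

# Knob settings as (name, delay scale, energy scale); illustrative numbers.
KNOBS = [
    ("nominal", 1.00, 1.00),
    ("high-speed", 0.80, 1.30),
]


def choose_knob(measured_delay_at_nominal, delay_spec=DELAY_SPEC):
    """Return the lowest-energy knob setting whose predicted delay meets
    the spec, or None if no setting can regain the spec."""
    for name, delay_scale, _energy_scale in sorted(KNOBS, key=lambda k: k[2]):
        if measured_delay_at_nominal * delay_scale <= delay_spec:
            return name
    return None


if __name__ == "__main__":
    # A fast instance stays nominal; a slow one needs the high-energy point.
    for delay in (0.9, 1.2, 1.3):
        print(delay, "->", choose_knob(delay))
```

Trying settings in order of increasing energy means an instance only pays the energy penalty when its monitor shows that the delay spec would otherwise be missed.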
Continued technology scaling…
• Scaling provides
– Improved area, speed, power, leakage, cost… BUT not all at the same time!
– NEW reference levels with new technologies, device concepts…
• Unique trade-offs to be made for different products!
INSITE: Integrated Solutions for Technology Exploration
• INSITE provides an integrated solution to
– answer questions relating to designing with emerging technologies
– enable interaction between partners at different stages of product and technology development
INSITE program
[Figure: INSITE flow linking emerging technologies; patterning technologies and lib corrections; models, layout and lib generation; circuit synthesis & analysis; and system-level performance. Annotations: 0.186 µm², 0.5 Ω, PCB-ground.]
Source: “IMEC targets design program at fabless/fab-lite, foundries, EDA vendors”, Electronic News, EDN.com, 10/5/2009
Multicore 3D integration related challenges
• Pushing system integration further
• More bandwidth for off-chip communication
• Thermal issues
For more info see: IMEC 3D integration: www2.imec.be/imec_com/3d-integration.php
3D Integration – Many System Opportunities
[Figure: 3D stack of memory, processor, RF chip, DNA chip, MEMS, battery and image sensor layers. Courtesy: Samsung]
Maintaining Moore’s momentum
• increase functionality with number of additional layers
Heterogeneous integration
• build an integrated system with dedicated logic, SRAM, DRAM, FLASH, RF technologies
• add new sensors, batteries, etc.
3D resolves the interconnect performance limitations
• the on-chip interconnect length and related repeater cost
More modular & scalable design
• add new standardized components, replace existing ones with better performing ones
Sleek form factor
• 1 mm³ corresponds to >100 Mbit SRAM cells
3D stacking – for Multicore
• Stacking chips – the traditional way [Source: STATS ChipPAC]
• and Through-Silicon-Via (TSV) [Source: IMEC]
3D stacking – for Multicore
40% of die area is SRAM
• Scales poorly below 45 nm
• High leakage
• Read/write energy
• Low area efficiency
Replace SRAM with stacked DRAM
More Bandwidth to Power-efficient Off-chip Memories
Source: K. Uchiyama., VLSI Circuit Digest of Technical Papers, p 6, 2008.
[Figure: bandwidth (BW) trend, with “today” and “2012-2015” marked]
• BW ~ α·GOPS / (CacheSize)^{1/3} to 10MB+ memories
• Existing solutions break down
– On-chip integration is too costly for 10MBs (even if embedded DRAM)
– More cache is not cost efficient either: ~8x more BW = 256x more cache [IBM]
⇒ Off-chip memories must become more energy efficient and must support higher IO bandwidth
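The cost argument can be checked with a few lines of arithmetic. The sketch below takes the cube-root relation on this slide at face value and ignores the constant α. The function names are made up for the illustration. Under that assumption, absorbing 8x more bandwidth demand purely with cache takes about 512x more cache, the same order of magnitude as the 256x figure quoted from IBM.

```python
def relative_bandwidth(gops_factor: float, cache_factor: float, root: int = 3) -> float:
    """Relative off-chip bandwidth demand under BW ~ GOPS / CacheSize**(1/root)."""
    return gops_factor / cache_factor ** (1.0 / root)


def cache_factor_to_absorb(bw_factor: float, root: int = 3) -> float:
    """Cache growth needed to cancel a `bw_factor` increase in bandwidth demand."""
    return bw_factor ** root


if __name__ == "__main__":
    growth = cache_factor_to_absorb(8)
    print(f"8x more bandwidth demand needs {growth}x more cache")
    print(f"remaining relative bandwidth: {relative_bandwidth(8, growth):.3f}")
```

Either way the conclusion on the slide stands: growing the cache is a very expensive way to buy bandwidth, which is why the off-chip memory interface itself has to improve.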
• Complex SoC, increase in power density
• Non-uniform hot-spots in 2D MPSoCs
• Partners: EPFL, IBM, Sun-UCSD, Philips-IMEC, Freescale-Verona
[Figures: Sun, 1.8 GHz Sparc v9 Microproc; Sun, Niagara Broadband Processor]
• In 3D chips, heat affects several layers! (even “cool” components)
• Higher chances of thermal wear-outs and very short lifetimes!
[Figure courtesy: IBM and Irvine Sens.]
Needed: Thermal Modeling and Adaptive Management for Multi-Processor SoC
Courtesy of David Atienza (ESL/EPFL)
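To make the 3D thermal point concrete, here is a toy two-layer thermal RC model with a simple threshold-based throttling policy. It is a sketch under loud assumptions, not the modeling approach of the groups named on this slide: all resistances, capacitances, power levels and thresholds are invented. A low-power memory layer is stacked on a logic layer, and only the logic layer has a path to the heat sink.

```python
T_AMBIENT = 45.0   # °C, assumed ambient temperature
R_SINK = 2.0       # K/W, logic layer to ambient (assumed)
R_INTER = 1.0      # K/W, between the stacked layers (assumed)
HEAT_CAP = 5.0     # J/K per layer (assumed)
DT = 0.1           # s, explicit Euler time step

LOGIC_POWER_LEVELS = [10.0, 20.0, 30.0]  # W, hypothetical DVFS settings
MEMORY_POWER = 2.0                       # W, the "cool" stacked layer
T_HOT, T_COOL = 85.0, 75.0               # °C, throttle / release thresholds


def step(t_logic, t_mem, p_logic):
    """Advance the two-layer thermal RC model by one Euler step."""
    flow_to_sink = (t_logic - T_AMBIENT) / R_SINK
    flow_logic_to_mem = (t_logic - t_mem) / R_INTER
    t_logic_next = t_logic + DT / HEAT_CAP * (p_logic - flow_to_sink - flow_logic_to_mem)
    t_mem_next = t_mem + DT / HEAT_CAP * (MEMORY_POWER + flow_logic_to_mem)
    return t_logic_next, t_mem_next


def simulate(managed, steps=2000):
    """Return (peak logic temperature, peak memory temperature) in °C."""
    t_logic = t_mem = T_AMBIENT
    top = len(LOGIC_POWER_LEVELS) - 1
    level = top  # start at full speed
    peak_logic = peak_mem = T_AMBIENT
    for _ in range(steps):
        if managed:
            hottest = max(t_logic, t_mem)
            if hottest > T_HOT and level > 0:
                level -= 1  # too hot: throttle the logic layer
            elif hottest < T_COOL and level < top:
                level += 1  # cooled down: restore performance
        t_logic, t_mem = step(t_logic, t_mem, LOGIC_POWER_LEVELS[level])
        peak_logic = max(peak_logic, t_logic)
        peak_mem = max(peak_mem, t_mem)
    return peak_logic, peak_mem
```

Comparing `simulate(managed=False)` with `simulate(managed=True)` shows two effects in this toy model. Without management the 2 W memory layer ends up hotter than the 30 W logic layer, because its heat has to leave through the logic layer. With the throttling policy both layers stay close to the 85 °C threshold, at the cost of running the logic at reduced power part of the time.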
Overview
• Introduction
• Multicore platforms today and in the near future
• Challenges in the multicore era
• Conclusions
Conclusions
• Multicore platforms are here to stay and evolving fast to manycore
– Multiple computation resources → sharing resources is an issue
– NoCs and complex memory hierarchies → data transfer and storage optimization challenges
– Application dynamism + shared/heterogeneous resources → component-based run-time resource management and standardized information sharing (metadata) challenges
• Technology scaling stops being ‘business as usual’ and system software will take some of the pain
– Variability and reliability → adaptive reliability countermeasure challenges
– 3D integration → thermal management adaptation challenges
And many thanks to: H. De Man, R. Lauwereins, D. Verkest, L. Van der Perre, M. Miranda, P. Christie, G. Groeseneken, F. Catthoor, P. Marchal, V. Nollet, P. Raghavan, A. Lambrechts, K. Tack, M. Jayapalla, B. Geelen