Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters
Abbas Rahimi, Luca Benini, and Rajesh Gupta
CSE, UC San Diego; DEIS, Università di Bologna
International Symposium on Low-Power Electronics and Design
http://variability.org http://mesl.ucsd.edu http://micrel.deis.unibo.it


Page 1: Title

Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters

Abbas Rahimi‡, Luca Benini†, and Rajesh Gupta‡

‡CSE, UC San Diego
†DEIS, Università di Bologna

International Symposium on Low-Power Electronics and Design

http://variability.org http://mesl.ucsd.edu http://micrel.deis.unibo.it

Page 2: Main Point

Procedure Hopping to Mitigate Variability

Page 3: Sources of Device Variation

[Figure: clock period divided into actual circuit delay and guardband; guardband contributors labeled Across-wafer Frequency, VCC Droop, Temperature, and Other uncertainty]

10% VCC, ~160˚C temperature, and 40% VTH variations.

Variations are more challenging in a many-core platform!

Page 4: Outline

• Sources of Variations
• Variation-tolerant Shared-L1 Processor Cluster
  1. Process Variation → Variation-aware VDD-hopping
  2. Dynamic Voltage Variation → Procedure hopping
• Methodology for PLV
  – Design time characterization
  – Compile time PLV metadata generation
  – Runtime preventive compensation
• Experimental Results

Page 5: Shared-L1 Processor Clusters *

Each cluster consists of:
• 16 LEON-3 cores
• An intra-cluster shared-L1 I$
• An on-chip multi-banked tightly coupled data memory (TCDM)
• Two single-cycle logarithmic interconnections for both instruction and data sides
• A hardware synchronization handler module (SHM) to coordinate and synchronize cores for accessing shared data on TCDM
• VDD-hopping per core

[Figure: Shared-L1 TCDM cluster template. 4x8 cluster: 4 PEs and an 8-bank TCDM]

* D. Melpignano, L. Benini, et al., “Platform 2012, a many-core computing accelerator for embedded SoCs: performance evaluation of visual analytics applications”, DAC’12

Page 6: VDD–hopping to Compensate Process Variation

Per-core frequency (MHz) in three supply configurations:

Core  VDD = 0.81V  VA-VDD-hopping (0.81V, 0.99V)  VDD = 0.99V
f0    862          862                            1408
f1    909          909                            1389
f2    870          870                            1408
f3    847          847                            1370
f4    826          1370                           1370
f5    855          855                            1408
f6    877          877                            1408
f7    893          893                            1408
f8    820          1370                           1370
f9    826          1370                           1370
f10   909          909                            1389
f11   847          847                            1370
f12   901          901                            1408
f13   917          917                            1408
f14   847          847                            1389
f15   901          901                            1389

At VDD = 0.81V, three cores (f4, f8, f9) cannot meet the target frequency of 830MHz.

With VA-VDD-hopping, all cores of the same cluster meet the target frequency of 830MHz.

VA-VDD-hopping can accordingly tune the cores' voltage based on their delay reported by CPMs.
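The selection rule behind the VA-VDD-hopping configuration can be sketched in a few lines of Python. This is a minimal sketch, not the authors' controller: it assumes only the two voltage levels tabulated on this slide, takes the per-core frequencies as fixed inputs (on the real cluster the delay is reported by CPMs), and `select_vdd` is a hypothetical helper name.

```python
# Per-core frequency in MHz, copied from the slide (index = core number).
FMAX_LOW = [862, 909, 870, 847, 826, 855, 877, 893,
            820, 826, 909, 847, 901, 917, 847, 901]      # VDD = 0.81V
FMAX_HIGH = [1408, 1389, 1408, 1370, 1370, 1408, 1408, 1408,
             1370, 1370, 1389, 1370, 1408, 1408, 1389, 1389]  # VDD = 0.99V
TARGET_MHZ = 830  # target frequency stated on the slide


def select_vdd(fmax_low, fmax_high, target):
    """Pick, per core, the lowest voltage whose frequency meets the target."""
    choice = []
    for low, high in zip(fmax_low, fmax_high):
        if low >= target:
            choice.append(0.81)
        elif high >= target:
            choice.append(0.99)
        else:
            # Assumption: no third level is modeled in this sketch.
            raise ValueError("core cannot meet the target at any modeled VDD")
    return choice


vdd = select_vdd(FMAX_LOW, FMAX_HIGH, TARGET_MHZ)
boosted = [f"f{i}" for i, v in enumerate(vdd) if v == 0.99]
print(boosted)  # ['f4', 'f8', 'f9']
```

Only the three slow cores are raised to 0.99V, which matches the VA-VDD-hopping configuration shown on the slide.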

Page 7: VDD–hopping to Compensate Process Variation

[Figure: cluster block diagram. Core0 to Core15, each with a VA-VDD-hopping switch (Low VDD, Typical VDD, High VDD), a CPM, a PSS, and level shifters; DFS providing clocks f and f+180°; SHM; logarithmic interconnects to I$ banks B0 to Bi-1 and TCDM banks B0 to Bj-1]

• Every core has its own voltage domain.
• All cores work at the same frequency.
• VDD-hopping tunes the voltage of each core based on its CPM.

Each core increases its voltage if its delay is high.

The process variation is compensated, but the cluster will have various voltage/temperature islands!


Page 8: Fast Dynamic IR-drop within Cluster

• IR-drop during execution of FIR on cores at various operating corners:

(Vol., Temp.)  0.99V, 125°C  0.90V, 25°C  0.81V, 125°C  0.81V, -40°C
Power density  0.66 μW/μm2   0.21 μW/μm2  0.18 μW/μm2   0.16 μW/μm2
Max IR-drop    44 mV         < 35 mV      < 31 mV       < 31 mV

• FIR does not face any voltage emergency (IR-drop < 4%) at the corners with voltages of 0.81V-0.9V due to their lower power densities.
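As a worked check of the 4% criterion, the IR-drop values in millivolts can be expressed as a percentage of the nominal supply. The sketch below is illustrative: `ir_drop_percent` is a hypothetical helper, and the "< 35 mV" and "< 31 mV" entries are treated as upper bounds.

```python
EMERGENCY_PCT = 4.0  # voltage emergency threshold stated on the slide


def ir_drop_percent(drop_mv, vdd_v):
    """IR-drop as a percentage of the nominal supply voltage."""
    return 100.0 * (drop_mv / 1000.0) / vdd_v


# (VDD in volts, max IR-drop in mV); the last two are upper bounds from the slide.
for vdd_v, drop_mv in [(0.99, 44.0), (0.90, 35.0), (0.81, 31.0)]:
    pct = ir_drop_percent(drop_mv, vdd_v)
    print(f"{vdd_v:.2f}V: {pct:.2f}% emergency={pct > EMERGENCY_PCT}")
```

Under these assumptions only the 0.99V corner exceeds 4% (44 mV is about 4.44% of 0.99V), while the bounds at 0.90V and 0.81V stay below it.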

Page 9: Procedure hopping to Compensate Voltage Variation

[Figure: cluster block diagram. Core0 to Core15, each with a VA-VDD-hopping switch (Low VDD, Typical VDD, High VDD), a CPM, a PSS, and level shifters; DFS providing clocks f and f+180°; SHM; logarithmic interconnects to I$ banks B0 to Bi-1 and TCDM banks B0 to Bj-1]

Procedure hopping facilitates fast and proactive migration of procedures within a cluster to prevent voltage variation thanks to shared I$ and TCDM resources.

Each procedure hops from one core to another if it causes voltage variation.


Page 11: Procedure-level Vulnerability (PLV)

• The notion of PLV to fast dynamic voltage variation is defined.
• The design time stage analyzes the dynamic voltage droops/rises for every ProcX under full operating conditions, generating PLVx metadata.

[Figure: int ProcX (…) { … } executes on Corei at (Vi,Tj); IR-drops are observed to fill the PLVX table]

(V,T)   PLVX
V1,T1   0.75
V2,T2   0.35
V3,T3   0.01
…       …

Page 12: Characterization of PLV to IR-drop: Compile time + Runtime

[Figure: characterization flow in three stages]

Design time:
• Tools: Design Compiler, IC Compiler, ModelSim Vsim, PrimeTime PX, PrimeRail
• Inputs: open-source Leon3 (VHDL), timing constraints, TSMC 45nm LIBs, ProcX object code
• Intermediate data: Verilog net-list, parasitics, SDF, switching activity
• Corners (Vi,Tj): (0.81V,-40˚C), (0.81V,125˚C), (0.90V,25˚C), (0.99V,125˚C)
• Outputs for ProcX: power @(Vi,Tj), dynamic voltage droop/rise @(Vi,Tj), PLV characterized metadata

Compile time:
• Source code → VA-Proc generation: ProcX / ProcX@Caller / ProcX@Callee
• Generating metadata
• VA-Procedures’ source code → BCC Compiler → executables

Runtime (Leon-3 Corei, operating condition (V,T) monitor), for ProcX@Caller:
1. Read current (V,T) sensors of Corei
2. Read characterized metadata for ProcX
3. If PLVX > PLV_threshold, invoke procedure hopping (ProcX@Callee)

• At compile time, the PLVx metadata of ProcX is attached to the procedure.
• During runtime, the discretized (V,T) point selects the corresponding characterized PLV metadata to assess the vulnerability of ProcX at the current (V,T).
• If PLVx > PLV_threshold, ProcX is hopped from the caller core to a favorable callee core.
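The runtime decision described above can be sketched as a table lookup. This is only an illustration under stated assumptions: the PLV values, their assignment to corners, the threshold, and the names `discretize` and `should_hop` are all hypothetical, and treating an uncharacterized (V,T) point as vulnerable is a conservative choice made for this sketch, not something the slides specify.

```python
# Characterized corner levels (taken from the corners named in the flow).
V_LEVELS = [0.81, 0.90, 0.99]   # volts
T_LEVELS = [-40, 25, 125]       # degrees Celsius

# Hypothetical PLV metadata for one procedure, keyed by discretized (V, T).
PLV_PROCX = {
    (0.99, 125): 0.75,
    (0.90, 25): 0.35,
    (0.81, 125): 0.01,
    (0.81, -40): 0.01,
}
PLV_THRESHOLD = 0.5  # hypothetical threshold


def discretize(value, levels):
    """Snap a sensor reading to the nearest characterized level."""
    return min(levels, key=lambda level: abs(level - value))


def should_hop(plv_table, v_read, t_read, threshold=PLV_THRESHOLD):
    """Return True if the procedure should hop away from this core."""
    key = (discretize(v_read, V_LEVELS), discretize(t_read, T_LEVELS))
    plv = plv_table.get(key)
    if plv is None:
        return True  # assumption: unknown corner is treated as vulnerable
    return plv > threshold


print(should_hop(PLV_PROCX, 0.98, 120))   # True
print(should_hop(PLV_PROCX, 0.82, -35))   # False
```

Because the metadata is a small per-procedure table, the check costs one sensor read and one lookup before the call.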


Page 14: Max Voltage Variation Across Corners and Procedures

Max voltage droop (%)

(Vol., Temp.)  a2tim  FIR   IFFT  bitmnp  cacheb  IDCT  matrix  pntrch  PWM   sspeed  tblook  ttsprk
0.99V, 125°C   5.39   4.46  6.34  5.03    4.62    6.26  5.89    5.36    5.23  5.05    3.84    5.41
0.90V, 25°C    3.65   2.98  4.63  3.47    3.11    4.41  4.09    3.63    3.65  3.48    2.44    4.99
0.81V, 125°C   3.45   2.8   3.7   3.43    2.92    3.77  3.63    3.39    3.27  3.33    2.29    3.63
0.81V, -40°C   3.34   2.72  3.66  3.34    2.84    3.7   3.53    3.26    3.24  3.24    2.22    3.53

[Figure: bar chart of max voltage rise (%), axis 0 to 8; legend: (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), (0.99V, 125°C)]

• Most procedures running on cores at 0.99V have voltage emergencies.
• At 0.9V, only four procedures (IFFT, IDCT, matrix, ttsprk) face voltage emergencies.
• No voltage emergency at 0.81V.
• Procedure hopping avoids the voltage emergency for all procedures by hopping them from a high-voltage core to a low-voltage core.
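The bullets above can be checked directly against the droop table. The sketch below assumes that a voltage emergency means a droop above 4%, the threshold quoted on the IR-drop slide; the `emergencies` helper and the corner labels used as dictionary keys are illustrative.

```python
PROCS = ["a2tim", "FIR", "IFFT", "bitmnp", "cacheb", "IDCT",
         "matrix", "pntrch", "PWM", "sspeed", "tblook", "ttsprk"]

# Max voltage droop (%) per corner, copied from the table.
DROOP = {
    "0.99V,125C": [5.39, 4.46, 6.34, 5.03, 4.62, 6.26, 5.89, 5.36, 5.23, 5.05, 3.84, 5.41],
    "0.90V,25C":  [3.65, 2.98, 4.63, 3.47, 3.11, 4.41, 4.09, 3.63, 3.65, 3.48, 2.44, 4.99],
    "0.81V,125C": [3.45, 2.80, 3.70, 3.43, 2.92, 3.77, 3.63, 3.39, 3.27, 3.33, 2.29, 3.63],
    "0.81V,-40C": [3.34, 2.72, 3.66, 3.34, 2.84, 3.70, 3.53, 3.26, 3.24, 3.24, 2.22, 3.53],
}
EMERGENCY_PCT = 4.0  # assumption: same 4% threshold as for IR-drop


def emergencies(corner):
    """Procedures whose max droop exceeds the emergency threshold."""
    return [p for p, d in zip(PROCS, DROOP[corner]) if d > EMERGENCY_PCT]


for corner in DROOP:
    print(corner, emergencies(corner))
```

With a 4% threshold, every procedure except tblook is flagged at 0.99V, exactly IFFT, IDCT, matrix, and ttsprk are flagged at 0.90V, and none at 0.81V, which agrees with the bullets.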

Page 15: Cost of Procedure Hopping

• The total roundtrip overhead of hopping a procedure from the caller core and returning the results from the callee core is less than 800 cycles.
• This overhead is less than 1% of the total cycles needed to execute any of the characterized procedures in the EEMBC benchmark.
• During procedure hopping, no voltage emergency can occur even at (0.99V,125˚C), in either the caller or the callee core.

               Caller hopping  Caller not hopping  Callee service  Callee no service
Latency        218 cycles      88 cycles           575 cycles      342 cycles
Voltage droop  1.3%            0.6%                2.9%            1.8%
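A short calculation relates the latency table to the two overhead claims. The slide does not say how the "less than 800 cycles" figure is composed; the sketch assumes it is the caller-hopping latency plus the callee-service latency, and `overhead_percent` is a hypothetical helper.

```python
# Latencies in cycles, copied from the table.
LATENCY = {
    "caller_hopping": 218,
    "caller_not_hopping": 88,
    "callee_service": 575,
    "callee_no_service": 342,
}

# Assumption: roundtrip = caller side of a hop + callee side of serving it.
roundtrip = LATENCY["caller_hopping"] + LATENCY["callee_service"]


def overhead_percent(procedure_cycles, overhead_cycles=roundtrip):
    """Hopping overhead relative to the procedure's own execution time."""
    return 100.0 * overhead_cycles / procedure_cycles


# Smallest procedure length for which the overhead stays at or below 1%.
min_cycles_for_1pct = roundtrip * 100

print(roundtrip)            # 793
print(min_cycles_for_1pct)  # 79300
```

Under this assumption the roundtrip is 793 cycles, and the 1% bound holds for any procedure that runs for at least 79,300 cycles.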

Page 16: Conclusion

• The notion of procedure-level vulnerability to fast dynamic voltage variation is defined.
• Based on PLV metadata, a fully-software, low-cost procedure hopping technique is proposed, which guarantees voltage emergency-free migration of all procedures, fast and proactively enough, within a shared-L1 processor cluster.
• Full post-P&R results in 45nm TSMC technology confirm that procedure hopping avoids voltage emergencies across a variability-affected cluster, while imposing only an amortized cost of less than 1% latency for any of the characterized embedded procedures.

Page 17: Thank you!

http://variability.org

Acknowledgment
• NSF Variability Expedition
• ERC Multitherman Project

Page 18: HW/SW Collaborative Architecture to Support Intra-cluster Procedure Hopping

• The code is easily accessible via the shared-L1 I$.
• The data and parameters are passed through the shared stack in TCDM.
• A procedure hopping information table (PHIT) keeps the status for a migrated procedure.

Callee Corek:

    …
    ProcX@Callee:
        if (calculate_PLV ≤ PLV_threshold)
            set_statusX_PHIT = running
            load_contex&param_from_SSPX
            set_all_param&pointers
            call ProcX
            store_contex_to_SSPX
            set_statusX_PHIT = done
            send_broadcast_ack
        else
            resume_normal_execution
    …
    Broadcast_req_ISR:
        ProcX@Callee = search_in_PHIT
        call ProcX@Callee

Caller Corei:

    …
    call ProcX           //conventional compile
    call ProcX@Caller    //VA-compile
    …
    ProcX@Caller:
        if (calculate_PLV ≤ PLV_threshold)
            call ProcX
        else
            create_shared_stack_layout
            set_PHIT_for_ProcX
            send_broadcast_req
            set_timer
            wait_on_ack_or_timer
    …
    Broadcast_ack_ISR:
        if (statusX_PHIT == done)
            load_context&return_from_SSPX

[Figure: caller Corei and callee Corek, each with an operating condition monitor and an interrupt controller; shared L1-I$ holding ProcX, ProcX@Caller, and ProcX@Callee; TCDM holding the shared local heap, shared stack, PHIT, and stacks]
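The caller/callee handshake can be modeled as a small sequential simulation. This is a toy sketch, not the authors' implementation: the real mechanism uses broadcast interrupts between concurrently running cores, whereas here the broadcast is a simple loop. The `Cluster` class, the "pending" status, the threshold value, and raising `TimeoutError` when no core serves the request are all assumptions made for illustration (the slide does not say what happens when the timer expires).

```python
PLV_THRESHOLD = 0.5  # hypothetical threshold, not given in the slides


class Cluster:
    """Sequential toy model of procedure hopping in one cluster."""

    def __init__(self, core_plv):
        self.core_plv = core_plv   # core id -> PLV of the procedure at that core's (V,T)
        self.phit = {}             # procedure name -> status (stands in for the PHIT)
        self.shared_stack = {}     # procedure name -> args/result (stands in for TCDM)

    def call(self, caller, name, proc, *args):
        """ProcX@Caller: run locally when safe, otherwise hop."""
        if self.core_plv[caller] <= PLV_THRESHOLD:
            return caller, proc(*args)
        self.shared_stack[name] = {"args": args}   # create_shared_stack_layout
        self.phit[name] = "pending"                # set_PHIT_for_ProcX
        for core in self.core_plv:                 # stands in for send_broadcast_req
            if core != caller and self._callee(core, name, proc):
                # Broadcast_ack_ISR: status is done, load the returned result.
                return core, self.shared_stack[name]["result"]
        raise TimeoutError("no callee acknowledged " + name)

    def _callee(self, core, name, proc):
        """ProcX@Callee: serve the request only if this core is not vulnerable."""
        if self.core_plv[core] > PLV_THRESHOLD:
            return False                           # resume_normal_execution
        self.phit[name] = "running"
        frame = self.shared_stack[name]
        frame["result"] = proc(*frame["args"])     # call ProcX
        self.phit[name] = "done"
        return True                                # send_broadcast_ack


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


cluster = Cluster({0: 0.75, 1: 0.75, 2: 0.01})
served_by, result = cluster.call(0, "dot", dot, [1, 2, 3], [4, 5, 6])
print(served_by, result)  # 2 32
```

In this run core 0 is too vulnerable to execute the procedure, core 1 declines for the same reason, and core 2 serves the request through the shared stack.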

Page 19: Intra-procedure Peak Power Variation

• Maximum of 1.28× intra-corner peak power variation occurs between IFFT and tblook procedures at (0.81V,125C).
• Maximum inter-corner peak power variation is 3.5× for FIR.
• Maximum of 4.1× peak power variation across corners and procedures occurs between a2time at (0.81V,-40C) and IFFT at (0.99V,125C).

[Figure: bar chart of peak power (W), axis 0.0 to 4.0; legend: (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), (0.99V, 125°C)]