Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
SCoRPiO: Significance based Computingfor Reliability and Power Optimization
http://www.scorpio-project.eu/
This poster has been prepared by the CERTH research group with input from RWTH-Aachen, EPFL and QUB research groups.
1. VisionJust like parallelism, we want to elevate compu-tational significance as a first class concern in thedesign of algorithms and systems.Exploit computational significance to build energyefficient hardware and software platforms thatscale gracefully in case of errors induced by scaledvoltages and/or parametric variations.
2. Motivation• Power/Energy dissipation remains #1 constraint
of future systems.
• Several application domains offer the opportunityto trade-off quality of service for significant im-provements in energy comsumption (e.g. JPEG).
• Preliminary fault injection trials on GEM5 simu-lator suggest that tolerance to injected faults washighly dependent on the spatial location of thefaults as well as the specific portion of the affectedcode.
- PC and IFetch modules are very vulnerable evento single errors.
- Arithmetic operations are often error tolerant, es-pecially if they are not used for address calcula-tions
3. Objectives• Introduce computational significance as an algo-
rithmic property and expose it to the level of theprogrammer.
• Devise techniques that facilitate automatic char-acterization of code and data significance usingcompile-time or runtime analysis.
• Lay the necessary foundations at all levels of sys-tem stack allowing controlled quality executionon unreliable hardware substrates.
• Enable drastic power dissipation reduction by op-portunistically and aggressively powering partsof the platform below nominal values in a tar-geted way.
• Produce a vertically integrated system prototypeto prove the efficacy of significance-based com-puting.
4. Semi-Automatic significance analysis• Introduce a rigorous mathematical definition of
computational significance.
• Apply a set of criteria to (semi)automatically char-acterize significance.
• Partition source code to slices of varying signifi-cance.
• Example of interval arithmetic and algorithmicdifferentiation:
- Original code:v1 = log( x1 );v2 = v1 + x2;y = v2;
- Input range for x1 : [1, 2]
- Input range for x2 : [1, 20]
- Significance criterion : w([v] · ∇[v][y]) > ε
- Significance bound : ε = 1
[v] ∇[v][y] [v] · ∇[v][y][x1] [1, 2] [0.5, 1] [0.5, 2][x2] [1, 20] [1, 1] [1, 20][v1] [0, 0.693] [1, 1] [0, 0.693][y] [1, 20.7] [1, 1] [1, 20.7]
Interpretation: Intermediate v1 turns out to be in-significant over the complete input ranges of x1 andx2, since w([v1] · ∇[v1][y]) < ε. Using the midpointm[v1] = 0.3465 of v1 as constant initializer for v1, thecode can be simplified for the complete input rangeof x1 and x2 to:
v1 = 0.3465;y = v1 + x2;
Interval evaluation of the modified code for ranges[x1] = [1, 2], [x2] = [1, 20] gives [y] = [1.35, 20.3],which is a sub-interval of the output range of theoriginal code.
6. Hardware modeling and design• Develop instruction level power and behavior
models under various degrees of voltage scalingand variations.
• Exploit the dynamic timing profile and modifythe circuits in conjunction with the core micro-architecture to enable graceful performance degra-dation.
• Cores are enhanced with low cost error detec-tion/correction mechanisms to help in adjustingtheir reliability and energy efficiency.
• Extended ISA to support both approximate andaccurate instructions.
• A simulator that supports the overall significancedriven vertical stack and allows its evaluation atvarious operating modes will be developed.
This work has been supported by the EC within the 7th Framework Programme under the FET-Opengrant agreement SCoRPiO, grant number 323872.
5. Task-based programming model andruntime systemProgramming Model:• Code is tagged with significance information at
task or even subtask level.
• Significance determines the (un)reliability of thecore used for task execution.
• Offer recovery mechanisms coupled with sanitychecking functions.
• Develop and deploy smart fault-handling mecha-nisms for graceful performance degradation suchas statistical error correction.
• Pragmas for instantiation and synchronizationof tasks (ompss extensions).
Runtime System:• Component-level control of power, performance
and reliability• Significance-aware scheduling and memory man-
agement• Dynamic optimization under reliability con-
straints• Significance-aware runtime system monitors the
dynamic behavior of the cores:- Did the core crash due to a fault?- If not, how many faults could the hardware de-
tect?• Based on the information conveyed by the hard-
ware, the runtime system calls the sanity functionand takes corrective actions.