
Language-Based Expression of Reliability and Parallelism for Low-Power Computing

Alcides Fonseca, Frederico Cerveira, Bruno Cabral, and Raul Barbosa

Abstract—Improving the energy-efficiency of computing systems while ensuring reliability is a challenge in all domains, ranging from low-power embedded devices to large-scale servers. In this context, a key issue is that many techniques aiming to reduce power consumption negatively affect reliability, while fault tolerance techniques require computation or state redundancy that increases power consumption, thereby leading to systematic tradeoffs. Managing these tradeoffs requires a combination of techniques involving both the hardware and the software, as it is impractical to focus on a single component or level of the system to reach adequate power consumption and reliability. In this paper, we adopt a language-based approach to express reliability and parallelism, in which programs remain adaptable after compilation and may be executed with different strategies concerning reliability and energy consumption. We implement the proposed programming model, which is named MISO, and perform an experimental analysis aiming to improve the reliability of programs, through fault injection experiments conducted at compile-time, as well as an experimental measurement of power consumption. The results obtained indicate that it is feasible to write programs that remain adaptable after compilation in order to improve the ability to balance reliability, power, and performance.

Index Terms—Programming languages, dependability, low-power computing, parallelism


1 INTRODUCTION

COMPUTING systems are continually being improved toward lower operating voltages and smaller scales, leading to significant performance improvements while reducing energy consumption. However, hardware faults also become more frequent. Consequently, designers must carefully balance reliability, power, and performance.

Designing reliable systems is usually achieved through fault tolerance techniques aiming at detecting, isolating, and recovering from errors. Fault tolerance involves some form of redundancy, at some level in the system, and therefore imposes some energy overhead. Thus, achieving low-power computing while guaranteeing the necessary dependability is a challenging task for most applications.

Reducing the energy needed for an integrated circuit to execute a given task can be achieved through a vast number of advanced techniques, such as energy-efficient circuit design, scheduling algorithms, and approximate computing. Some of these approaches may improve reliability, while others may reduce it; for example, reducing power consumption may reduce operating temperature, thereby increasing reliability, while lowering operating voltages may increase the soft error rate and therefore reduce reliability.

In this paper we argue that software should be written in a way such that it remains adaptable, or malleable, after compilation, in order to improve the balance between fault tolerance, power, and performance. We propose to achieve this by means of a language-based technique that expresses programs as sets of cells, each containing a part of the program state and a state-transition function. Reliability is achieved through software-implemented fault tolerance techniques that employ redundancy in a program's state and in the execution of state-transition functions. Reducing energy consumption while attending to performance requirements is achieved through parallel execution of different cells. With this approach, a single program can be executed with or without redundancy, and it can use parallelism or execute sequentially, without modifying the original program.

The main contributions of the paper are:

• A programming model that simplifies how parallel programs are expressed through the usage of cells that hold state-related information and state-transition functions. The model is designed to facilitate analysis toward reducing power consumption, extracting parallelism, and increasing reliability. The programming model is called MISO, which stands for Multiple Input Single Output.

• A method for parallelism extraction from MISO programs, built upon the ability of the model to express dependencies among cells. Each cell is local-write-only but globally able to read from all other cells. This feature of the model allows parallelism to be extracted from MISO programs, reducing execution time and power consumption.

• A compile-time approach to ensure a specified reliability objective, which takes as input the expected hardware-fault rate and the intended reliability of a program, and generates an appropriate runtime to execute that program. The approach is based on conducting fault injection experiments automatically at compile-time to measure a program's vulnerability to hardware faults and select appropriate fault-handling mechanisms. This approach considers both sequential and parallel programs, taking advantage of parallel processors regardless of the parallelism in the initial program.

A. Fonseca is with LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa P-1749 016, Portugal. E-mail: [email protected].
F. Cerveira, B. Cabral, and R. Barbosa are with CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra P-3030 290, Portugal. E-mail: {fmduarte, bcabral, rbarbosa}@dei.uc.pt.
Manuscript received 4 Feb. 2017; revised 26 Oct. 2017; accepted 31 Oct. 2017. Date of publication 8 Nov. 2017; date of current version 6 Sept. 2018. (Corresponding author: Alcides Fonseca.) Recommended for acceptance by D. Zhu, M. Shafique, M. Lin, and S. Pasricha. Digital Object Identifier no. 10.1109/TSUSC.2017.2771376

The proposed MISO model adopts principles of the actor model [1] combined with a read-global, write-local approach, and is specifically designed to lower power consumption and to allow software-implemented fault tolerance techniques. This model is inspired by cellular automata and can directly represent programs in the fields of biology, chemistry [2], medicine [3], physics [4], [5], astronomy [6], economics [7], and urban planning [8].

Extracting parallelism from MISO programs is performed automatically and provides the means to execute a program on multiple processing units, saving energy by turning off the chip earlier than would be possible with the equivalent sequential program.

Improving reliability is achieved through a dynamic analysis technique that we propose, consisting of running fault injection experiments at compile-time in order to choose an appropriate runtime environment for running a MISO program. A single MISO program can be adapted by applying some form of redundancy, such as duplicate execution with comparison [9] to detect errors, or dividing a program into multiple processes [10], [11] to isolate and prevent error propagation. A program is targeted using fault injection to determine the probability distribution of the distinct failure modes. These probabilities are compared with a target reliability value and the expected hardware-fault rate, and the result is an appropriate runtime for executing the MISO program.

The experimental results of targeting distinct programs show that it is feasible to write programs that remain adaptable after compilation, thereby improving the ability to balance fault tolerance, power consumption, and performance.

The remainder of the paper is organized as follows. Section 2 identifies related work and compares the proposed approach with existing solutions. Section 3 introduces the MISO programming model and the MISORUST language. Analyzing programs to extract parallelism is discussed in Section 4, and Section 5 provides the means to apply fault tolerance techniques to MISO programs. Section 6 presents the experimental results evaluating the power consumption and reliability of MISO programs. Section 7 discusses the main observations and limitations of the proposed approach. Section 8 closes the paper with the main conclusions.

2 RELATED WORK

There have been several approaches to using programming languages to express the parallel patterns that can occur in code. Cilk [12], OpenMP [13], and MPI [14] are examples of language extensions that give sequential languages (C, for instance) parallel semantics. The approach taken by these extensions is to translate from sequential C with as little effort as possible. Thus, the suggested workflow is to write a sequential program first, and then parallelize it step by step.

Communicating Sequential Processes [15] (CSP) is another approach for writing parallel programs that avoids concurrency issues. Processes are executed independently, except for reading and writing from shared channels. The Go programming language [16] follows this model using goroutines. The actor model [1] uses the same philosophy as CSP, in which actors perform their work individually and can only communicate with others using synchronous mailboxes. The actor model has been used to automatically extract task parallelism [17]. Charm++ [18] implements the actor model with extensions, one of them for hierarchical read-only information sharing across actors, in order to minimize communication delays from synchronization.

Over the last decade, several new languages have been developed that do not follow a sequential execution model. These languages allow the programmer to describe the program without having to explicitly create threads and manage concurrency. X10 [19], Fortress [20], Chapel [21], and AEMINIUM [22] are examples of such languages. Fortress is an evolution of Fortran in which several constructs (such as loops, tuples, and arguments) have parallel semantics instead of being sequential. X10 is a language that is based on the concept of places, memory regions that cannot be accessed concurrently. If two functions do not access any common place, they can be executed in parallel. Chapel uses a related concept, a locale, which represents a unit of computation that has uniform access to memory, assigned as domains. Operations over domains can be performed in parallel, as long as there is no concurrent access to the same subset of a domain. In AEMINIUM, all operations occur in parallel, as long as the annotated data permissions are not violated. AEMINIUM uses a dataflow approach, in which the program is converted into a Directed Acyclic Graph (DAG) of tasks that are scheduled by a runtime system.

On the hardware side, processors are being designed with parallelism in mind, and the underlying hardware progresses toward smaller manufacturing processes and lower energy consumption [23]. Considering these two trends in processor design, the hardware fault rate is expected to increase significantly [24], [25]. Thus, programs that target highly parallel processors should also consider fault tolerance features.

Hardware faults have multiple sources [26] and different techniques can be used to mitigate their impact. Fault-handling techniques have been proposed at multiple levels, including the software level. Examples include running two or more replicas of a given piece of software in order to detect errors, and recovering through voting or re-execution of processes [10], [11], [27], [28], [29]. Service migration and lightweight checkpointing mechanisms can also be used, among other techniques [30], [31]. A compiler-based approach like SWIFT [9] replicates instructions in the binary file and adds state comparison routines at certain points to detect and recover from transient hardware faults. Control-flow checking is also added to ensure that faults do not change the control flow in a way that bypasses the fault tolerance mechanisms. RAFT [11] and PLR [10] provide fault tolerance at run-time by executing two or more replicas of the same process and using system call comparison to detect faults. However, relying on system calls as the trigger for output comparison may not be an adequate approach when applied to applications that make little use of system calls (e.g., long-running CPU-heavy computations).

Fault tolerance at the programming-language level has been tackled in TALFT [32], a typed assembly language that is able to prove that programs detect register-related faults through duplication of registers and program counters. This approach works at the machine instruction level, being agnostic to the high-level programming language, but relies on the program being sequential, which does not apply to new parallel and concurrent programming models. This duplication has also been presented in EDDI [33], where instructions are duplicated in different registers.

A concern when implementing fault-handling techniques is their energy impact. An evaluation of checkpointing techniques [34] concluded that checkpoints and RAM logging were more efficient than hard-drive logging, given that they are faster. Checkpointing, logging, and parallel recovery have also been compared with regard to energy efficiency [35]; for executions with failures, parallel recovery is the approach with the lowest energy overhead. ECOFIT [36] is a framework designed to estimate the performance of fault-tolerant protocols in High Performance Computing. The trade-off between energy consumption and the error coverage provided by fault-handling mechanisms has been studied, with several Dynamic Voltage and Frequency Scaling (DVFS) configurations presented as possible solutions for real-time hardware [37]. At the cluster level, an adaptive unified load manager has been proposed [38], using checkpoint/restart mechanisms for fault tolerance and granularity management for energy awareness. Low-power devices may improve energy efficiency, but may introduce lower precision or a higher fault rate. Programming languages have been extended to consider reliable and unreliable memory regions, in order to allow programmers to express programs that can tolerate faults in selected parts [39], [40].

Evaluating computer systems in the presence of errors, and the effectiveness of their fault-handling mechanisms, has been performed extensively in the past through fault injection. This technique consists of emulating faults and observing the behavior of programs. In this paper we use a software-implemented fault injection (SWIFI) technique that draws from existing approaches in the literature, namely Xception [41] and Goofi [42].

3 THE MISO PROGRAMMING MODEL AND THE MISORUST LANGUAGE

MISORUST is a programming language that follows the MISO cell-based programming model, abbreviated to cell model. The cell model is a generic programming approach, which can be implemented as an extension to any programming language, from high-level languages like Python to more bare-metal languages such as C. While a previous version [43] was built by extending the Scala programming language, the current implementation is built on top of the Rust programming language, for performance and safety reasons. This implementation is available online at https://github.com/alcides/miso-rust.

To illustrate the cell model, let us consider the example in Listing 1. The FibonacciCell is a cell template that calculates the nth Fibonacci number iteratively. A cell is defined by its memory and its transition function. The memory of a cell can hold any number of objects of different types, similarly to a C struct or a Java class. In this example, the cell holds its n value, the respective Fibonacci number fib(n) (cfib), and the previous value fib(n-1) (pfib).

Listing 1. The Fibonacci cell template

cell FibonacciCell {
    n : int,
    cfib : int,
    pfib : int,
} => self, previous_state, world {
    self.n = previous_state.n + 1;
    self.pfib = previous_state.cfib;
    self.cfib = previous_state.cfib + previous_state.pfib;
}

The transition function is defined after the => operator. self represents the current cell state and holds the memory items defined in the cell (such as n, cfib and pfib). previous_state holds the last cell state, which is either the initial state or the result of the last application of the transition function. world provides access to the memory of other cells.

The most important rule of MISO is that cells can only write to their own memory. In practice, only items inside self can be written to, while previous_state and world can only be read. This rule has several implications for parallelization and dependability that will be discussed in later sections.

The world is a singleton that holds all the cell instances. Similarly to class-based object-oriented programming, cell templates are types that can be instantiated. In Listing 2, a world is defined with just one instance of the FibonacciCell cell. Thus, inside the previous transition function, world.fc would be equivalent to previous_state. However, it is possible to access other cell instances by name, through the world singleton. Each variable defined inside the world can have default values, and that is how different cells of the same type can execute with different values.

Listing 2. The world singleton for the Fibonacci example

world {
    fc : FibonacciCell
}

Memory is shared across cells using the world, allowing different parts of the program to interact with each other. Since there are only reads of memory that is not owned by the current cell, there are no race conditions in MISO programs. MISO programs have synchronized semantics across cells, i.e., all cells have their transition functions executed simultaneously. To make this simultaneous behavior evident to the programmer, the new and old states are kept separate in the self and previous_state cell states.
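As an illustration of the read-global, write-local rule, the following cell template (written in the same style as Listings 1 and 2; MaxCell and its fields are illustrative and not part of the paper's examples) reads the Fibonacci cell's previous state through the world and records the largest value observed, while writing only to its own memory:

cell MaxCell {
    max_seen : int,
} => self, previous_state, world {
    // world.fc is read-only here: this cell may observe the
    // FibonacciCell instance, but writes only to its own fields.
    if world.fc.cfib > previous_state.max_seen {
        self.max_seen = world.fc.cfib;
    } else {
        self.max_seen = previous_state.max_seen;
    }
}

world {
    fc : FibonacciCell;
    mc : MaxCell
}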

The MISORUST language can be defined by the BNF grammar below. The language has an additional semantic restriction: the previous_state and world variables inside the transition function are read-only.

⟨program⟩ ::= ⟨cell-def-list⟩ ⟨world-def⟩
⟨cell-def-list⟩ ::= ⟨cell-def⟩ | ⟨cell-def⟩ ⟨cell-def-list⟩
⟨cell-def⟩ ::= cell ⟨type⟩ { ⟨cell-structure-list⟩ } => self, previous_state, world { ⟨stmt-list⟩ }
⟨world-def⟩ ::= world { ⟨named-cell-list⟩ }
⟨named-cell-list⟩ ::= ⟨named-cell⟩ | ⟨named-cell⟩ ; ⟨named-cell-list⟩
⟨named-cell⟩ ::= ⟨identifier⟩ : ⟨type⟩
⟨identifier⟩ ::= an identifier of the host language, representing the name of a cell
⟨type⟩ ::= a valid type identifier, representing the type of a cell
⟨stmt-list⟩ ::= a list of statements allowed in the host language

Finally, the last component of MISORUST is the runtime. MISO separates the program description (cell templates and world) from the execution itself. A runtime function advances the world a certain number of steps, or until a certain condition is true, and then accesses the necessary information from any cell. This separation of structure and runtime information allows for interchangeable runtimes that have different properties with regard to parallelization and dependability.
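As a sketch of how such a runtime might be driven, the snippet below shows a hypothetical invocation; the World type and the run_steps and run_until names are illustrative only, since the paper does not fix the runtime API.

// Hypothetical driver for a MISORUST program.
let mut world = World::new();
world.run_steps(10);                        // advance all cells ten synchronized steps
world.run_until(|w| w.fc.cfib > 1_000_000); // or run until a condition over the world holds
println!("fib = {}", world.fc.cfib);        // afterwards, read any cell's memory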

Writing programs in MISO is different from writing programs using an object-oriented approach, CSP [15], or the actor model [1]. Current and previous states are distinguished, and cells can read values from other cells but not modify them. This requires a different mindset when programming, focused on fetching relevant data where it is needed, instead of sending data to another process, either by a function call or by message passing.

In summary, the MISORUST programming language implements the cell and world constructs. The world is a singleton collection of cell instances, created from cell templates. Cell templates define the memory structure of a cell as well as the transition function that iteratively moves the cell from one state to the next. Inside the transition function, writes are only allowed to the current state of the cell, but memory can be read from the previous states of other cells.

4 PARALLEL PROGRAMMING

Because of its semantics, MISO is a parallel programming model by default. The semantics of synchronous, simultaneous cell transitions exposes the potential parallelism in any MISO program with more than one cell. MISO supports both task and data parallelism, through the usage of different cell templates, or different cell instances that follow the same template. Depending on program-specific characteristics, different parallelization approaches can be applied.

Listing 3. The matrix multiplication cell program. This program contains X cells that follow the MatMulCell template, where X is the number of available processors. Each cell handles a submatrix, thus parallelizing the operation.

cell MatMulCell {
    x_start: u64,
    x_end: u64,
    M1: Matrix[],
    M2: Matrix[],
    M3: Matrix[],
} => self, previous_state, world {
    for i in self.x_start..self.x_end {
        for j in 0..ROW_SIZE {
            let mut a = 0.0;
            for k in 0..ROW_SIZE {
                a += previous_state.M1[i * ROW_SIZE + k] *
                     previous_state.M2[k * ROW_SIZE + j];
            }
            self.M3[i * ROW_SIZE + j] = a;
        }
    }
}

world {
    cs : CellArray<MatMulCell>
}

The default parallel mode of MISORUST does not make assumptions regarding the program or the machine; it parallelizes according to the MISO semantics, thus supporting all programs with more than one cell instance. Listing 3 provides an example of the Matrix Multiplication program written in MISORUST, showing how different cells execute the same code, but on different matrix indices. Parallelization is achieved by defining an array of cells, instead of a single one. Fig. 1 illustrates the parallelization process. Threads are spawned for each cell instance inside the world singleton, and may execute their transition functions at the same time as long as they synchronize after each transition, using a barrier. This is required to prevent cell instance #1 from accessing state 0 of cell instance #2 while cell instance #2 is already in step 3. This is the approach implemented in MISORUST, creating a number of threads equal to the number of processors.

Fig. 1. Parallelization example with four different cell instances transitioning three times.
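To make the default scheme concrete, the following self-contained Rust sketch mimics it with a placeholder state type and transition function; it illustrates the barrier-based, read-global/write-local loop and is not the actual MISORUST runtime. One thread per cell reads a snapshot of all previous states, all threads meet at a barrier, each writes only its own slot, and a second barrier prevents any thread from racing ahead into the next step.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

// Placeholder transition: a cell's next value is its previous value plus
// the sum of every cell's previous value (read-global, write-local).
fn transition(own_prev: u64, all_prev: &[u64]) -> u64 {
    own_prev.wrapping_add(all_prev.iter().sum::<u64>())
}

// One thread per cell, synchronized with a barrier after each transition,
// mirroring the 1:1 parallel mode described above.
fn run(initial: Vec<u64>, steps: usize) -> Vec<u64> {
    let n = initial.len();
    let states: Arc<Vec<AtomicU64>> =
        Arc::new(initial.into_iter().map(AtomicU64::new).collect());
    let barrier = Arc::new(Barrier::new(n));

    let handles: Vec<_> = (0..n)
        .map(|i| {
            let states = Arc::clone(&states);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                for _ in 0..steps {
                    // Read the previous state of every cell.
                    let prev: Vec<u64> =
                        states.iter().map(|c| c.load(Ordering::SeqCst)).collect();
                    let next = transition(prev[i], &prev);
                    barrier.wait(); // everyone has finished reading
                    states[i].store(next, Ordering::SeqCst); // write own slot only
                    barrier.wait(); // everyone has written; the next step may start
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(states)
        .unwrap()
        .into_iter()
        .map(AtomicU64::into_inner)
        .collect()
}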

A second approach is based on the default one, but threads do not need to synchronize. Instead, copies of the cell state are created to allow cells to access past states of other cells. This approach reduces the contention on the barrier, but increases memory usage and execution time because of memory allocation.

A third approach also extends the default mode, with the goal of removing unnecessary barriers. A transition function may pattern match on the step count, thus performing different operations at different step iterations. One example would be a parallel genetic algorithm following the Island model, shown in Fig. 2, in which after every x iterations of the genetic algorithm, individuals from different populations migrate to another population. In these cases, the barrier would only be necessary after the steps that include migration. Using static analysis of the source code, it is possible to introduce the barrier only in those transitions, reducing the overall contention of the program.

Regardless of which of the three approaches is used, the granularity of parallelization should be optimized. The parallelization model described above is a 1:1 model between threads and cell instances. However, it has been shown that an M:N model is more efficient: user-level threads [44] are faster than native OS-level threads for a large number of threads [45], [46], because context-switching between user-level threads is faster than the kernel calls required for OS-level context switching. Taking into account the program structure enforced by MISO, the decision of how to group the different cell instances can be made at compile-time based on the workload of each cell transition (using the cost-model granularity approach [47]) and based on the data dependencies (grouping tasks that access adjacent data [48]). Alternatively, this decision can be delayed to runtime, through the usage of work stealing [49] and data packing [50].

Finally, data-parallel programs can be efficiently parallelized for GPUs. GPUs have natural synchronization at the workgroup level, requiring barriers to exist only at the global level, reducing the contention at every step inside each workgroup. GPUs perform better when there is no divergent branching in the code, i.e., all threads execute the same code at the same time, which is frequent in the case of data parallelism, with all cells following the same cell template. If a program has several cells that follow the same template, and has no external dependency on the OS (filesystem, sockets, etc.), it can be directly compiled to a GPU executable.

5 RELIABLE PROGRAMS

Reliability is achieved by applying fault tolerance mechanisms to MISO programs. There is a wide range of techniques designed to improve the reliability of programs that may be applied to MISO cells in order to protect the state information or the state-transition function.

Examples of techniques that may be used to protect the state of each cell include checksums, cyclic redundancy codes, or duplication; techniques that increase the robustness of state-transition functions include control-flow checks, assertions, or redundant execution with voting. Redundancy is selectively added to a MISO program, based on the predicted failure modes of that program.

We use fault injection to identify the failure modes and their relative frequency of occurrence. Fault injection is a general technique that consists of deliberately inserting faults during the execution of a program, in order to characterize its behavior in the presence of faults. Each MISO program is targeted using fault injection in order to determine how it fails (the distinct failure modes) and the distribution of such failures. Using these results, an appropriate runtime is tailored to guarantee the intended reliability for the program.

5.1 Fault Tolerance Techniques

A first set of fault injection experiments was conducted to identify the failure modes of MISORUST programs. After one injection, a program can produce an incorrect result, it can crash, it can reach a specified timeout, or, in many cases, the fault has no effect on the program. The failure modes are listed below and represent a subset of the failure modes proposed by Avizienis et al. [51] that is sufficient to reflect our experimental results.

• Program crash. The program does not produce the expected output and exits through a segmentation fault or some other exception. Program timeouts are also counted as crashes, since a timeout is detected and has no other consequence in the system. In any case, a program crash is always detectable.

• Incorrect output. The program produces an incorrect value. This failure mode is often called silent data corruption (SDC) in the literature, as the result of such an event could propagate to other components of the system. Incorrect output is difficult to handle at the system level, although it can be detected through redundant computation.

• No effect. The program displays the same performance and correctness as the fault-free version, and therefore the injected error does not change the program's behavior in any way that could be considered a failure. Examples include faults injected in unused processor registers and so-called unintentional redundancy naturally occurring in most programs.

Fig. 2. Parallelization example of an Island model parallel genetic algorithm, with migrations occurring every three steps.

Timing failures were not taken into account due to the difficulty inherent in understanding the low-level impact of register bit flips on the control flow of the application without introducing major overheads from profiling. For the same reason, timing failures are often disregarded in other fault injection studies [9], [41], [52], [53], and the studies that include timing failures show that they have a low probability of occurring [54].

Handling the two identified failure modes (i.e., incorrect output and program crash) requires specific fault tolerance mechanisms capable of performing the three well-known steps of detection, isolation, and recovery. The goal is to protect MISO programs using redundancy that allows errors to be detected, ensures that errors do not propagate among distinct parts of the system, and enables recovery by using an existing value that is correct or by re-executing a program.

The overall goal is to attain a reliability value specified by the program designer. We adopt a definition of reliability related to the probability of failure on demand: given a specified hardware-fault rate and program-specific failure modes, reliability is defined as the probability that the program produces correct output after one execution. A MISO program can be executed without any redundancy, and we consider two distinct fault tolerance strategies:

• Duplication and comparison. In this strategy, depicted in Fig. 3, all cells in a program are replicated into two copies, the state-transition functions execute redundantly (in parallel whenever possible [55]), and the outputs are voted on. This strategy allows nearly all errors to be detected, and recovery is achieved by re-executing the state transition if the voting process determines that a fault occurred. The strategy of duplication and comparison is suitable for detection of and recovery from the incorrect output failure mode. In its current state, MISORUST assumes all operations to be pure; however, a speculative approach, such as RAFT [11], can be applied to handle side effects from system calls. The Sphere of Replication [55] of this technique is the world, composed of cells, and the MISORUST runtime is excluded from replication. This runtime was designed to be as minimal as possible, since it is also a target for fault injection. (A code sketch of this strategy is given after this list.)

• Process-level isolation. This strategy consists of executing a MISO program within a dedicated process with memory isolation, as depicted in Fig. 4. If the program ends abruptly by crashing, it is imperative to prevent error propagation through appropriate mechanisms such as process-level isolation. A single copy of the program is executed in the normal case, and any crash (a detectable failure) is recovered from by restarting the same program in another process. (A sketch of this strategy appears at the end of this subsection.)
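The following Rust sketch illustrates the essence of duplication and comparison for a pure transition function. It is a simplified, sequential illustration (the paper executes the copies in parallel when possible), with a placeholder Fib state rather than actual MISORUST cells.

// Runs a pure transition twice, compares the two results, and re-executes
// when they disagree (i.e., when an error has been detected).
fn duplicated_transition<S: PartialEq, F: Fn(&S) -> S>(prev: &S, transition: F) -> S {
    loop {
        let a = transition(prev); // primary copy
        let b = transition(prev); // redundant copy
        if a == b {
            return a; // the copies agree: accept the new state
        }
        // Disagreement means a transient fault corrupted one copy: retry.
    }
}

#[derive(PartialEq)]
struct Fib { n: u64, cfib: u64, pfib: u64 }

fn main() {
    let s0 = Fib { n: 1, cfib: 1, pfib: 0 };
    let s1 = duplicated_transition(&s0, |p| Fib {
        n: p.n + 1,
        pfib: p.cfib,
        cfib: p.cfib + p.pfib,
    });
    assert_eq!(s1.n, 2);
}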

The first strategy duplicates the execution cost and is suitable for recovering from nearly all incorrect output failures. The second strategy has the cost of wrapping a MISO program in a dedicated process, but any re-execution occurs only if a crash occurs. Hence, duplication and comparison is used to handle incorrect output failures, and process-level isolation is used to handle crash failures.

The two strategies are complementary in the sense that different failure modes are addressed. Consider, for example, a program that only fails by crashing (it is the only failure mode observed through fault injection). Such a program requires only process-level isolation as the necessary strategy for assuring reliability. Another example could be a program that, conversely, only fails by producing incorrect output. Such a program only requires duplication and comparison. A third example could be a program that manifests both crashes and incorrect output; such a case could be dealt with by combining the two strategies and duplicating the execution of a program while placing each copy in its own process (rather than running two separate threads).
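A minimal Rust sketch of process-level isolation and re-execution is shown below. The binary name and the retry limit are illustrative; a real deployment would also distinguish timeouts and preserve any needed inputs between attempts.

use std::process::Command;

// Runs `binary` in its own process and restarts it after a crash,
// up to `max_retries` additional attempts. Returns true on success.
fn run_isolated(binary: &str, max_retries: u32) -> bool {
    for attempt in 0..=max_retries {
        // The child has its own address space, so a crash cannot corrupt
        // the state of the supervising process.
        match Command::new(binary).status() {
            Ok(status) if status.success() => return true,
            Ok(status) => eprintln!("attempt {}: crashed ({})", attempt, status),
            Err(e) => eprintln!("attempt {}: failed to start: {}", attempt, e),
        }
    }
    false
}

fn main() {
    // Illustrative: "./miso_program" stands in for the compiled MISO binary.
    if !run_isolated("./miso_program", 3) {
        eprintln!("giving up after repeated crashes");
    }
}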

5.2 Fault-Tolerant Runtime Selection

Considering the different strategies available for detection, isolation, and recovery, the problem may now be formulated as determining which strategy (or strategies) should be applied to reach a specified reliability value intended by the program designer. To this end, each program is first targeted for fault injection without any fault tolerance strategy. The result of this experiment is a distribution of probabilities across the three possible outcomes (no effect, incorrect output, or crash).

Fig. 3. Cell duplication and output comparison for error detection.

Fig. 4. Process-level isolation and re-execution.


Taking λ as the base hardware fault rate and p_ne as the conditional probability that an error has no effect given that an error occurred, and assuming the fault rate to be constant, we have the reliability function

    R(t) = e^{-\lambda (1 - p_{ne}) t},

where t is the execution time of the sequential program. Through fault injection, at compile-time, we estimate p_ne for each program. The values of λ and R (the intended reliability) are design parameters for each program.

If the estimated reliability is below the intended reliability R, then a fault tolerance strategy must be used. To this end, the values of p_i (the conditional probability that incorrect output is produced given that a fault occurred) and p_c (the conditional probability that the program crashes given that a fault occurred) are used to determine whether the program requires duplication and comparison, process isolation, or both techniques combined.

The values of p_ne, p_i, and p_c are estimated through fault injection, λ is the expected hardware fault rate, R is a design parameter that the program must meet, and t is the execution time of the program. A suitable fault injection technique is used to emulate the effects of faults and estimate the conditional probabilities.

Several empirical studies have observed that hardware failures may follow the Weibull distribution [56], [57] rather than the exponential distribution. Therefore, although it is common to assume constant failure rates [58], one should also consider the Weibull distribution as the basis for reliability analysis. According to this distribution, we have the reliability function

    R_w(t) = e^{-(1 - p_{ne}) (t / \eta)^{\beta}},

where t is the execution time of the sequential program. The value of p_ne is estimated through fault injection at compile-time, and represents the conditional probability that an error has no effect given that an error occurred (as with the exponential distribution). The value of η is the Weibull scale parameter and β is the Weibull shape parameter; both are properties of the hardware.

Taking the same design parameter R as the intended probability for a program to terminate correctly at time t, if R_w(t) is less than R, then at least one of the fault tolerance strategies must be adopted. Depending on the estimated values of p_ne, p_i, and p_c, a program may require duplication and comparison, process isolation, or both techniques combined. It should be noted that some fault tolerance techniques may modify the program's execution time and, therefore, the value of t should be adjusted accordingly.

The intended reliability target R is a design parameter that depends on the application. Achieving R = 1 is, in practice, unattainable once we assume a failure rate greater than zero. Nevertheless, for many applications (e.g., numerical simulations) R should be very close to 1. In order to achieve such high reliability, one must combine effective fault tolerance mechanisms. To this end, the proposed approach allows developers to write a single program and select the most efficient runtime, through reliability modeling and fault injection configured by compilation parameters (λ and R).
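Putting the two models together, the selection step can be sketched in Rust as follows. The numeric values and the threshold logic are illustrative simplifications of the procedure described above, with the probabilities coming from a compile-time fault injection campaign.

// Failure-mode probabilities estimated by compile-time fault injection,
// conditional on a fault having occurred (p_ne + p_i + p_c = 1).
struct FaultProfile {
    p_ne: f64,
    p_i: f64,
    p_c: f64,
}

#[derive(Debug)]
struct Strategies {
    duplication: bool,
    isolation: bool,
}

// Exponential model: R(t) = exp(-lambda * (1 - p_ne) * t).
fn reliability_exp(lambda: f64, p_ne: f64, t: f64) -> f64 {
    (-lambda * (1.0 - p_ne) * t).exp()
}

// Weibull model: R_w(t) = exp(-(1 - p_ne) * (t / eta)^beta).
fn reliability_weibull(p_ne: f64, t: f64, eta: f64, beta: f64) -> f64 {
    (-(1.0 - p_ne) * (t / eta).powf(beta)).exp()
}

// Enables a mechanism for each failure mode actually observed whenever the
// unprotected program falls short of the target reliability.
fn select_runtime(p: &FaultProfile, lambda: f64, t: f64, target: f64) -> Strategies {
    if reliability_exp(lambda, p.p_ne, t) >= target {
        return Strategies { duplication: false, isolation: false };
    }
    Strategies {
        duplication: p.p_i > 0.0, // incorrect output was observed
        isolation: p.p_c > 0.0,   // crashes were observed
    }
}

fn main() {
    // Illustrative numbers: matmul on server (Table 1) with made-up fault
    // rate, execution time (seconds), and target reliability.
    let profile = FaultProfile { p_ne: 0.3232, p_i: 0.018, p_c: 0.6588 };
    println!("{:?}", select_runtime(&profile, 1e-4, 2.5, 0.9999));
    println!("R_w = {:.6}", reliability_weibull(profile.p_ne, 2.5, 1e4, 0.7));
}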

5.3 Fault Injection Technique

Transient hardware faults are emulated by applying the well-known single bit flip error model. The injector runs the workload (i.e., the program) and introduces one bit flip randomly in each execution. The outcome of each experiment is determined by classifying the output of the program and whether the program finished its execution abruptly. Errors are injected in microprocessor registers to emulate faults that affect the processor, including faults directly affecting the registers and faults that cause errors in other components (e.g., floating point unit, arithmetic logic unit, internal caches) but which only produce consequences when they reach the registers. Memory injections are also possible, although such injections are only interesting when memory is unprotected.

Fault injection is implemented using a software-based technique. The injector is a user-space application that takes advantage of the ptrace capabilities available in the Linux kernel to quickly suspend and resume a thread or process. During this brief downtime, the injector requests the register values and performs a bit flip in one of them, according to the parameters passed by the user. The ptrace functionalities are also commonly used by debuggers and have even been used by other fault injectors [52], [59], [60]. One of the advantages of using this approach is the portability and low intrusion inherited from using a feature that already exists in the kernel. However, since the fault injector executes from user space, it must request certain functionalities from the kernel through system calls, which imply context and processor ring switches. For bit flips in CPU registers this disadvantage is negligible, but such is not the case when injecting faults in memory, where a system call is required for each requested memory page. Another disadvantage is that ptrace, and hence our fault injector, cannot inject faults into kernel-space processes.
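The heavily simplified sketch below shows the core ptrace sequence for a single register bit flip, assuming x86_64 Linux and the libc crate; the actual injector is more elaborate (random register and bit selection, timing within the warmup/cooldown window, outcome classification) and includes error handling, which is omitted here.

use std::{mem, ptr};

// Stop the target, read its registers, flip one bit of one register,
// write the registers back, and let the target continue.
// Assumes x86_64 Linux; error handling omitted for brevity.
unsafe fn flip_bit_in_rax(pid: libc::pid_t, bit: u32) {
    let null = ptr::null_mut::<libc::c_void>();

    libc::ptrace(libc::PTRACE_ATTACH, pid, null, null);
    let mut status = 0;
    libc::waitpid(pid, &mut status, 0); // wait until the target is stopped

    let mut regs: libc::user_regs_struct = mem::zeroed();
    libc::ptrace(libc::PTRACE_GETREGS, pid, null,
                 &mut regs as *mut _ as *mut libc::c_void);

    regs.rax ^= 1u64 << (bit % 64); // the single bit flip error model

    libc::ptrace(libc::PTRACE_SETREGS, pid, null,
                 &regs as *const _ as *mut libc::c_void);
    libc::ptrace(libc::PTRACE_DETACH, pid, null, null); // resume the target
}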

6 RESULTS

The MISO programming model was built in order to implicitly embed parallelism and fault tolerance in programs. As such, both the parallelization and fault-tolerance implications of MISO programs are evaluated on four benchmark programs, taken from the Aeminium Benchmark suite [61].

6.1 Evaluation Methodology

MISO was evaluated in two different areas: parallelization and fault tolerance. For both areas we have used a set of four small benchmarks as examples of programs that can be written in the MISORUST language. These programs have different workloads, but they take seconds to execute, allowing the MISORUST infrastructure, and not the programs themselves, to be the center of the evaluation.

Factorial consists of the multiplication of natural numbers up to 320000, divided across 8 MISO cells (one per processor), all following the same cell template. Each cell computes the product of all values between a lower and an upper bound, with all cells combined covering the range up to the desired value.

Matmul performs the matrix multiplication of two 800 × 800 matrices. Again, 8 cells of the same type were used, with each cell being responsible for calculating the values of 100 columns of the final matrix.


Integral obtains the integral of f(x) = e^{sin(x)} for x between 0 and 800 with a resolution step of 20000^{-1}. 8 cells of the same template are used. This benchmark program represents data-parallel applications, with a more complex workload than Matmul.

Heat is a tiling computation that simulates heat diffusion across a mesh of size 800 by 800, based on the algorithm of Cilk 5.4.6 [12].

Our evaluation used two machines: server and mobile. server features an Intel Core i7-4770 CPU with 4 cores at 3.40 GHz with hyperthreading, and has 8 GB of RAM; the installed OS was CentOS 7.0. mobile is an Odroid XU development board whose Exynos 5 Octa processor features two sets of CPU cores: four A15 1.6 GHz cores and four A7 1.2 GHz cores. The board has 2 GB of RAM and runs Ubuntu 14.04. On both machines, MISO programs were compiled using Rust 1.16 nightly (83c2d9523 2017-01-24).

All programs are executed with only the operating system running at the same time and take at least 2-3 seconds to execute. When measuring time and energy consumed, no output is generated (to avoid measuring the time of I/O to the console). In the case of server, energy is also measured, using the MSR RAPL interface of the Intel processor [62].
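The paper reads the RAPL counters through the MSR interface [62]; as a point of reference, the Rust sketch below shows a commonly available alternative on Linux, the powercap sysfs view of the same counters. The package-0 path is an assumption about the machine, and the counter is reported in microjoules.

use std::fs;

// Reads the RAPL package-0 energy counter (microjoules) exposed by the
// Linux powercap interface. Path and domain are assumptions; the paper's
// setup read the counters through MSRs instead.
fn read_energy_uj() -> std::io::Result<u64> {
    let raw = fs::read_to_string("/sys/class/powercap/intel-rapl:0/energy_uj")?;
    Ok(raw.trim().parse().unwrap_or(0)) // a real tool would propagate parse errors
}

fn main() -> std::io::Result<()> {
    let before = read_energy_uj()?;
    // ... run the workload under measurement here ...
    let after = read_energy_uj()?;
    // Note: the counter wraps at max_energy_range_uj; a robust tool would
    // detect after < before and correct for the wrap-around.
    println!("energy: {:.3} J", after.saturating_sub(before) as f64 / 1e6);
    Ok(())
}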

6.2 Parallelization Results

In order to evaluate the parallelization potential of MISO, the four programs were executed in sequential (Seq) and parallel (Par) modes. The sequential mode transitions all cells one by one, while the parallel mode creates one thread per cell transition. For each version, 100 runs were executed.

Figs. 5 and 6 show the execution time of the programs in both versions, on server and on mobile. The factorial program had a speedup of 3.7 on mobile and 3.4 on server, showing that a simple program can easily achieve speedups without any performance tuning. The matrix multiplication program had a speedup of 3.0 on mobile and 2.4 on server. The integral program had the best speedup, with 3.4 on mobile and 5.2 on server. The heat program obtained 1.9 on mobile and 2.2 on server.

Overall, the speedups on mobile are higher because of its lower CPU frequency. Having a slower clock means a longer sequential time, which dwarfs the context-switch overhead necessary to spawn threads. The fibonacci program is used to understand the parallelization overhead on a sequential program; the overhead of creating tasks is on the order of 10^{-3} to 10^{-2} seconds.

Fig. 7 represents the scalability of the parallelization of MISO programs on the server machine, showing the speedup of the parallel version using different numbers of cores. It is possible to see that after 4 threads the performance only increases in integral. The reason for the decrease is that the other programs are more memory-intensive and hyper-threading degrades the cache usage.

The impact of parallelization on integral is higher because this program has the largest workload. Programs with more expensive workloads have the potential for higher speedups, while the other programs spend most of the time loading and storing memory, and synchronizing cell transitions.

Fig. 8 shows the energy consumption on the server machine. The energy consumption increases in two programs and decreases in two, but not significantly. The highest decrease is in integral, where the workload is heavier and the speedup is the highest. High speedups reduce the execution time, which tends to decrease the energy consumption.

Fig. 5. Execution time of all programs on server in sequential and parallel modes.

Fig. 6. Execution time of all programs on mobile in sequential and parallel modes.

Fig. 7. Speedup of parallel programs on server using different numbers of cores.

Fig. 8. Energy consumption of all programs on server in sequential and parallel modes.

Fig. 9 shows the scalability of the energy consumption with different numbers of cores configured in the parallel program. It is possible to see that the behaviour is similar to the speedup: energy consumption improves up to 4 cores, and after that only the integral program, which does not make intensive use of memory, improves.

6.3 Fault-Tolerance Evaluation

MISO provides mechanisms that can be used to mitigate soft errors. In order to evaluate the impact of soft errors, one of the fault injectors of the ucXception project is used to simulate this type of error. The tools belonging to this project have already been used to inject faults in other areas (e.g., [53], [63]). More information is available at https://ucxception.dei.uc.pt.

As previously stated, the fault injector is a user-space application that takes advantage of ptrace, which is available in the Linux kernel, to control the execution flow and inject bit flips in the registers. While this tool could be used to inject faults in both memory and CPU registers, this study focused on the latter, because memories already have a wide range of error correction mechanisms available, which make it less likely for a fault to occur there.

To ensure that injected faults do not occur while the application is starting or finishing, warmup and cooldown periods in which faults cannot be injected are defined. The cooldown period also serves to increase the fidelity of the results by not terminating the experiment run too soon after a fault has been injected. These periods are dynamically calculated according to the baseline duration, with the warmup and cooldown accounting for 5 and 15 percent of the total execution time, respectively. The warmup period is relatively small due to the simple nature of the workload, whereas the cooldown period is three times as large, to provide sufficient room for the manifestation of the fault. Faults are injected following a uniform distribution between the warmup and the cooldown phases. In each campaign, a total of 10,000 faults per program is injected on server and 5,000 faults per program on mobile. Fig. 10 shows a normal experiment flow.
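As an illustration of how each run in this flow can be classified, the sketch below assumes the program's standard output is captured and compared against a fault-free golden run; the real campaign also tracks timeouts and the detailed crash reasons reported in Tables 5 and 6.

use std::process::Output;

#[derive(Debug, PartialEq)]
enum Outcome {
    Crash,
    IncorrectOutput,
    NoEffect,
}

// Classifies one injection run against a fault-free golden run:
// abnormal termination counts as a crash, diverging output as
// incorrect output, and identical output as no effect.
fn classify(golden: &Output, injected: &Output) -> Outcome {
    if !injected.status.success() {
        Outcome::Crash
    } else if injected.stdout != golden.stdout {
        Outcome::IncorrectOutput
    } else {
        Outcome::NoEffect
    }
}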

During compilation, the choice of fault tolerance mechanism is made through analysis of the fault injection outcomes. Tables 1 and 2 show the frequency of the possible outcomes for server and mobile, respectively, for the four programs.

The results show that the majority of the executions on server lead to a crash failure, whereas incorrect output rarely occurs, except in the factorial benchmark, which has a higher probability of exhibiting this failure mode. Most executions on mobile show no failure, followed by crash failures and incorrect output in terms of probability. Once again the factorial benchmark has a higher than average incorrect output probability, but not as high as matmul, which showed this tendency to produce incorrect output only on mobile.

The variation in probabilities between programs and systems is normal and can be attributed to the nature of each program, namely the type of instructions used (i.e., arithmetic, logic, branch), which registers are used more often or not at all, optimizations (e.g., loop unrolling), and other aspects. This variation also justifies the need for executing a fault injection campaign at compile-time for every program.

Taking these results into account, on server the compiler will enable process-level isolation for all programs, to be able to recover from the predominant failure mode, crashes. On mobile, the compiler will prioritize either process-level isolation or duplication and comparison according to whether the program shows more propensity to crash or to produce incorrect content. If the user has chosen to obtain the maximum reliability possible, the compiler will use both process-level isolation and duplication and comparison. Given the higher probability of incorrect content in the factorial and matmul programs, duplication and comparison will be required to obtain an acceptable failure rate. However, duplication and comparison can be avoided in any program if the failure rate is deemed acceptable when compared with the energy spent.

After applying process-level isolation and duplication and comparison to the programs under study, we repeated the same fault injection campaign to assess the effect of these measures. The results are presented in Tables 3 and 4.

Fig. 9. Energy consumption of parallel programs on server using different numbers of cores.

Fig. 10. Experiment flow for evaluating the fault tolerance of a MISO binary.

TABLE 1
Fault Injection Outcomes—Server—Before Fault Tolerance Applied

Program            Factorial   Matmul    Integral   Heat
Crash              64.16%      65.88%    55.48%     67.07%
Incorrect Output   10.43%      1.8%      1.74%      0.39%
No effect          25.41%      32.32%    42.78%     32.54%

TABLE 2
Fault Injection Outcomes—Mobile—Before Fault Tolerance Applied

Program            Factorial   Matmul    Integral   Heat
Crash              14.38%      14.42%    8.28%      13.48%
Incorrect Output   9.52%       20.06%    1.52%      0.04%
No effect          76.1%       65.52%    90.2%      86.48%


The results show undeniable improvements in both sys-tems, with reduced crash probabilities and zero probabilityof incorrect output. The effect in crash probabilities is farmore noticeable in server, where a significant reduction isseen across every program, than in mobile, which by natureappears to be more resilient to this failure mode and whereeven in one occasion this value slightly increased after add-ing fault tolerance.

An analysis of the reasons behind crash failures is shown in Tables 5 and 6. Of all crash failures, the majority are due to segmentation faults. A segmentation fault implies that the application attempted to access a memory location outside of its permissions and that the operating system detected and stopped that operation. In second place are language (Rust) exceptions. Rust performs several runtime checks, such as integer overflow and array bound checks; when one fails, a panic is raised and the execution stops, in order to avoid silent data corruption or unintended behavior in general. This behavior would not have been observed if another language, such as C/C++, had been used. In that case, we can hypothesize that the faults would have propagated and would either have been handled by MISO (e.g., MISO detected incorrect output and recovered) or would have caused the application to crash.
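For illustration, the snippet below (our example, not code from the benchmarks) shows the kind of runtime check that converts a corrupted value into a Rust panic, and therefore into a crash, instead of silent data corruption.

    // A bounds-checked access: if a fault flips bits in `index`, the check
    // panics here instead of reading arbitrary memory.
    fn read_cell(cells: &[f64], index: usize) -> f64 {
        cells[index]
    }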

The above two reasons account for the majority of all crash failures; however, other reasons occasionally occur. For example, a bus error might occur, which means that the application is either trying to access memory that does not exist in the system or trying to perform an unaligned access. A trace/breakpoint trap might also occur when a wrong memory area containing a specific value (the trap instruction) is mistakenly treated as code.

As a future improvement to MISO, we plan to implement an approach to handle crashes due to segmentation faults, which consists in adding a custom signal handler that listens for the segmentation fault signal and triggers a process restart.
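A minimal sketch of this planned mechanism is shown below, assuming the libc crate; because very little work is safe inside a signal handler, the sketch converts the segmentation fault into a conventional exit code that a supervising process (such as the one sketched later for process-level isolation) can treat as a restart request. All names are illustrative, not part of MISO.

    // Sketch of the planned segmentation-fault handling (not yet in MISO).
    // RESTART_EXIT_CODE is an illustrative convention for the supervisor.
    const RESTART_EXIT_CODE: i32 = 75;

    extern "C" fn on_segv(_signum: libc::c_int) {
        // Only async-signal-safe calls are allowed here; _exit terminates the
        // crashed process immediately so that the supervisor can respawn it.
        unsafe { libc::_exit(RESTART_EXIT_CODE) }
    }

    fn install_segv_handler() {
        let handler: extern "C" fn(libc::c_int) = on_segv;
        unsafe {
            libc::signal(libc::SIGSEGV, handler as libc::sighandler_t);
        }
    }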

Despite their effectiveness, these fault tolerance mechanisms carry an overhead. Therefore, it is important to evaluate the time and energy consumed by each method. First, let us consider the duplication and comparison method. Programs executing in this mode have their cells duplicated, including memory and transition functions. At every step, the memory of a cell is compared with that of its duplicate, and if the two differ, the transition function is repeated.
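The Rust sketch below illustrates this duplication-and-comparison step for a single cell; the DuplicatedCell type is an illustrative simplification, not the actual runtime implementation.

    // Illustrative sketch of duplication and comparison for one cell.
    struct DuplicatedCell<S: Clone + PartialEq> {
        primary: S, // the cell's memory
        dupe: S,    // the duplicated copy of that memory
    }

    impl<S: Clone + PartialEq> DuplicatedCell<S> {
        fn new(initial: S) -> Self {
            Self { primary: initial.clone(), dupe: initial }
        }

        // Apply the transition function to both copies; if the resulting
        // memories differ, a transient fault is assumed and the step repeats.
        fn step(&mut self, transition: impl Fn(&S) -> S) {
            loop {
                let p = transition(&self.primary);
                let d = transition(&self.dupe);
                if p == d {
                    self.primary = p;
                    self.dupe = d;
                    return;
                }
                // Mismatch detected: repeat the transition function.
            }
        }
    }

Because the two copies can execute on different cores, the time overhead stays low for parallel programs, while memory usage, and with it energy, roughly doubles.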

Figs. 11 and 12 show the execution time impact of the duplication technique on both machines. The highest overhead on factorial or matrix multiplication was a slowdown of 14 percent. On simple sequential programs, like fibonacci, the overhead is several times higher.

Fig. 13 shows the energy consumption of the machine in both modes. In both parallel programs, the energy consumption doubled when the duplication mechanism was used. The main reason is that memory usage increases, and memory accounts for a large part of the energy consumption.

When comparing the parallel mode with and without duplication, the results are similar: the duplication strategy doubles energy consumption and also increases execution time.

Crash handling in MISORUST is done by executing the program in one process and, if it is found to have crashed, spawning another process to execute the same program. The time and energy impact of this approach is only incurred in case of crashes, where both energy consumption and execution time can increase up to double those of a regular execution.
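A minimal sketch of this supervision loop, using only the standard library (and not the actual MISORUST runtime code), could look as follows.

    // Sketch of process-level isolation: run the program in a child process
    // and spawn a new process to execute the same program if it crashes.
    use std::process::{Command, ExitStatus};

    fn run_with_isolation(program: &str, args: &[&str]) -> std::io::Result<ExitStatus> {
        loop {
            let status = Command::new(program).args(args).status()?;
            if status.success() {
                return Ok(status); // normal termination: nothing to do
            }
            // Abnormal termination (e.g., killed by SIGSEGV): respawn and
            // execute the same program again.
            eprintln!("child terminated abnormally ({}); restarting", status);
        }
    }

Since the loop only iterates again when a crash occurs, the common-case overhead of this strategy is negligible.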

TABLE 3
Fault Injection Outcomes—Server—After Fault Tolerance Applied

Program           Factorial          Matmul             Integral           Heat
Crash             46.51% (-17.65%)   41.36% (-24.52%)   43.74% (-11.74%)   32.52% (-34.55%)
Incorrect Output  0.0% (-10.43%)     0.0% (-1.8%)       0.0% (-1.74%)      0.0% (-0.39%)
No effect         53.49% (+28.08%)   58.64% (+26.32%)   56.26% (+13.48%)   67.48% (+34.94%)

TABLE 4
Fault Injection Outcomes—Mobile—After Fault Tolerance Applied

Program           Factorial          Matmul             Integral           Heat
Crash             13.72% (-0.66%)    13.24% (-1.18%)    10.78% (+2.5%)     10.11% (-3.37%)
Incorrect Output  0.0% (-9.52%)      0.0% (-20.06%)     0.0% (-1.52%)      0.00% (-0.04%)
No effect         86.28% (+10.18%)   86.76% (+21.24%)   89.22% (-0.98%)    89.89% (+3.41%)

TABLE 5
Breakdown of Reasons Behind Crash—Server—Before Fault Tolerance Applied

Program              Factorial   Matmul    Integral   Heat
Segmentation Fault   90.42%      89.87%    99.01%     87.37%
Rust Exception       8.74%       9.74%     0.00%      11.94%
Timeout              0.01%       0.02%     0.09%      0.00%
Illegal Instruction  0.44%       0.21%     0.40%      0.42%
Bus error            0.23%       0.08%     0.32%      0.09%
Breakpoint trap      0.16%       0.08%     0.18%      0.18%

TABLE 6
Breakdown of Reasons Behind Crash—Mobile—Before Fault Tolerance Applied

Program              Factorial   Matmul    Integral   Heat
Segmentation Fault   54.24%      47.43%    81.16%     44.36%
Rust Exception       42.42%      47.57%    0.24%      51.04%
Timeout              1.95%       0.00%     4.59%      0.00%
Illegal Instruction  1.39%       3.75%     11.11%     1.78%
Bus error            0.00%       1.25%     2.90%      2.82%
Breakpoint trap      0.00%       0.00%     0.00%      0.00%


7 DISCUSSION

MISO can be evaluated in different aspects: the expression of concurrency, parallelism, and reliability.

Considering the expression of parallel programs, MISO has a more natural approach than OpenMP and similar languages. In those models, programs are written sequentially and then manually annotated and tuned until performance improves. This requires a deep understanding of parallelism and OpenMP, with several choices frequently being made through empirical tests. MISO abstracts those decisions by allowing the programmer to express different threads of execution as small transition functions that occur sequentially.

In this aspect, MISO is similar to CSP and, even more so, to the actor model. In all of these approaches, the programmer writes several independent paths of execution. The main difference is that, while the other approaches require synchronous channels and mailboxes that introduce implicit synchronization that may not be clear to the programmer, in MISO any cell can read from all other cells without requiring blocking operations or hidden dependencies. In MISO, all cells advance at the same pace, making the data dependencies among cells clear and explicit.
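The Rust sketch below (our illustration, not the MISORUST front-end syntax) captures these semantics: each transition may read the previous state of every cell but writes only its own cell, and all cells advance in lockstep over a shared snapshot.

    // Illustrative encoding of MISO-style cells: global-read, local-write-only,
    // lockstep advancement. One f64 of state per cell, for simplicity.
    #[derive(Clone)]
    struct World {
        cells: Vec<f64>,
    }

    // A transition reads the previous snapshot of all cells and returns the
    // next state of the single cell it belongs to.
    type Transition = fn(cell_index: usize, previous: &World) -> f64;

    fn step(previous: &World, transitions: &[Transition]) -> World {
        World {
            cells: transitions
                .iter()
                .enumerate()
                .map(|(i, t)| t(i, previous)) // every cell sees the same snapshot
                .collect(),
        }
    }

Because each iteration of the map depends only on the previous snapshot, the per-cell updates are independent and can be distributed across threads, with a barrier between steps keeping all cells at the same pace; this is the property that the parallelism extraction exploits.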

In terms of parallelization, there are several trade-off choices that have to be further explored, such as the mapping between cells and threads and how to obtain the minimum number of barrier synchronization points for a given program. However, the results obtained without optimization are equivalent to those obtained with other parallelization frameworks, such as Æminium [61]. The MISO model is agnostic to the scheduling runtime, but provides more useful information about the program structure than other programming models without annotations. Further experiments with hardware offering more parallelism (GPUs, Xeon Phi, etc.) are required to better understand the scalability of MISO.

Additionally, this work focuses on parallelism inside the same machine and does not address distributed programming in MISO, which is also possible but requires the introduction of communication mechanisms to ensure memory consistency. Distributed programming also has implications for the fault tolerance evaluation, since additional failure modes, such as Byzantine or timing failures, can occur.

Finally, this work used two fault-handling strategies: duplication and comparison, and process-level isolation. These techniques were selected because they successfully handled the types of failures observed in our system. However, different techniques could be used and plugged into MISO. Being a generic model, MISO allows different fault tolerance techniques to be applied without the need to modify the original programs. The two techniques used are examples of what can be done on this platform.

The process isolation technique has a negligible overhead when there is no crash, as it only checks whether the program was successfully executed. In the presence of a crash, the overhead is equal to the amount of time consumed by the crashed process. This option was selected because of its low overhead in the most common case, whereas checkpoint/resume approaches would impose a constant overhead on all executions in order to improve performance in the rare cases of crashes.

Thread-level duplication doubles the memory usage and functions as the main mechanism for preventing silent data corruption (SDC). Depending on the underlying memory architecture, it might increase energy consumption, although by less than 100 percent in reasonably large workloads (Fig. 13). The overhead in computation depends on the workload and the available parallelism in the processors. Figs. 11 and 12 show that Factorial, a sequential program with a low workload, has a large overhead when using duplication. The other programs have a low overhead because the duplicated execution occurs simultaneously on another processor core.

Fig. 11. Execution time of all programs on mobile in sequential and duplicated modes.

Fig. 12. Execution time of all programs on server in sequential and duplicated modes.

Fig. 13. Energy consumption of all programs on server in sequential and duplicated modes.

While being substantially different in nature, MISO can be compared with approaches such as PLR [10] or RAFT [11]. These techniques modify the binary to introduce instructions that verify the execution of opcodes through duplication and checking. MISO works at the programming language level and, as such, performs duplication at the task level rather than at the instruction level. While the other approaches can be applied to any existing binary, the MISO model duplicates at a coarser level, allowing low-level compiler optimizations, such as vectorization, to occur after the fault tolerance techniques are applied. This increases the potential for an efficient execution of the program. The overhead of using these techniques is similar to that of MISO (their energy impact has not been studied), and the window of vulnerability is the same: unprotected runtime code takes only a small percentage of the execution time. In our results, no incorrect output was detected when using MISO. Finally, MISO allows distributed fault tolerance techniques (such as duplication on different machines), while none of the other approaches considers that scenario.

8 CONCLUSION

This paper introduces a language-based approach to express both reliability and parallelism. The proposed model, MISO, is agnostic to the programming language and to the concepts of threads and synchronization, relying instead on two simple semantics that must be followed: isolation of state and state transitions. This model has the advantage of supporting different execution modes that may have different degrees of parallelism and fault tolerance.

Parallelism can be automatically extracted from MISO programs. In our experiments, programs obtained speedups between 1.9 and 5.2 on machines with 4 available cores, similar to state-of-the-art parallel languages. In processors that support dynamic voltage and frequency scaling, reducing the execution time also reduces energy consumption.

The MISO compilation strategy includes automatic selection of fault tolerance techniques. During compilation, programs are subjected to fault injection and the outcomes of those faults are analyzed. Different mechanisms are then selected to handle the observed failure modes, resulting in different reliability guarantees. Two distinct types of fault tolerance mechanisms are considered: those designed to provide high error detection and those designed to provide high error isolation. Programs that are found, through fault injection, to produce incorrect output benefit from duplication and comparison of output, while programs that tend to crash benefit from process-level isolation.

The proposed approach was implemented and experimentally evaluated. The MISORUST programming language is capable of parallelizing programs automatically, obtaining speedups similar to other parallel programming frameworks. The impact on time and energy of two different fault tolerance techniques was evaluated, concluding that duplication and comparison increased execution time by up to 14 percent and roughly doubled energy consumption, due to memory usage. Process-level isolation had negligible overheads.

Overall, the usage of the MISO programming model leads to programs that can be executed in parallel, without any explicit parallel constructs, and that can automatically be executed with reliability guarantees through fault injection analysis and error handling techniques.

ACKNOWLEDGMENTS

The first author was supported by the LASIGE Research Unit (UID/CEC/00408/2013). This research was also partially supported by the Centro de Informática e Sistemas da Universidade de Coimbra (CISUC) and by the project EUBra-BIGSEA (http://www.eubra-bigsea.eu), funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no. 690116.

REFERENCES

[1] G. Agha, "Actors: A model of concurrent computation in distributed systems," DTIC Document, 1986.
[2] R. L. Mendes, A. A. Santos, M. Martins, and M. Vilela, "Cluster size distribution of cell aggregates in culture," Physica A: Statistical Mech. Appl., vol. 298, no. 3, pp. 471–487, 2001.
[3] S. H. White, A. M. Del Rey, and G. R. Sánchez, "Modeling epidemics using cellular automata," Appl. Math. Comput., vol. 186, no. 1, pp. 193–202, 2007.
[4] Y. Zhao, S. A. Billings, and D. Coca, "Cellular automata modelling of dendritic crystal growth based on Moore and von Neumann neighbourhoods," Int. J. Modelling Identification Control, vol. 6, no. 2, pp. 119–125, 2009.
[5] T. Gomes, R. Silva, and L. Ferracioli, "A model of velocity distribution of a river based on a qualitative computer modelling environment," in Proc. Int. Workshop Appl. Modelling Simul., 2006, pp. 133–137.
[6] H. Isliker, A. Anastasiadis, D. Vassiliadis, and L. Vlahos, "Solar flare cellular automata interpreted as discretized MHD equations," Astronomy Astrophysics, vol. 335, pp. 1085–1092, 1998.
[7] G. Qiu, D. Kandhai, and P. Sloot, "Understanding the complex dynamics of stock markets through cellular automata," Phys. Rev. E, vol. 75, no. 4, 2007, Art. no. 046116.
[8] R. M. Itami, "Simulating spatial dynamics: Cellular automata theory," Landscape Urban Planning, vol. 30, no. 1/2, pp. 27–47, 1994.
[9] G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, and D. I. August, "SWIFT: Software implemented fault tolerance," in Proc. Int. Symp. Code Generation Optimization, 2005, pp. 243–254. [Online]. Available: http://dx.doi.org/10.1109/CGO.2005.34
[10] A. Shye, T. Moseley, V. J. Reddi, J. Blomstedt, and D. A. Connors, "Using process-level redundancy to exploit multiple cores for transient fault tolerance," in Proc. 37th Annu. IEEE/IFIP Int. Conf. Depend. Syst. Netw., Jun. 2007, pp. 297–306.
[11] Y. Zhang, S. Ghosh, J. Huang, J. W. Lee, S. A. Mahlke, and D. I. August, "Runtime asynchronous fault tolerance via speculation," in Proc. 10th Int. Symp. Code Generation Optimization, 2012, pp. 145–154.
[12] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou, "Cilk: An efficient multithreaded runtime system," ACM SIGPLAN Notices, vol. 30, no. 8, pp. 207–216, 1995.
[13] L. Dagum and R. Menon, "OpenMP: An industry standard API for shared-memory programming," IEEE Comput. Sci. Eng., vol. 5, no. 1, pp. 46–55, Jan.–Mar. 1998.
[14] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "A high-performance, portable implementation of the MPI message passing interface standard," Parallel Comput., vol. 22, no. 6, pp. 789–828, 1996.
[15] C. A. R. Hoare, "Communicating sequential processes," in The Origin of Concurrent Programming. Berlin, Germany: Springer, 1978, pp. 413–443.
[16] R. Pike, "The Go programming language," Talk given at Google's Tech Talks, 2009.
[17] S. M. Imam and V. Sarkar, "Integrating task parallelism with actors," ACM SIGPLAN Notices, vol. 47, no. 10, pp. 753–772, 2012.
[18] L. V. Kale and S. Krishnan, "CHARM++: A portable concurrent object oriented system based on C++," ACM SIGPLAN Notices, vol. 28, no. 10, pp. 91–108, 1993.
[19] P. Charles, et al., "X10: An object-oriented approach to non-uniform cluster computing," ACM SIGPLAN Notices, vol. 40, no. 10, pp. 519–538, 2005.
[20] E. Allen, et al., "The Fortress language specification," Sun Microsyst., vol. 139, 2005, Art. no. 140.
[21] B. L. Chamberlain, D. Callahan, and H. P. Zima, "Parallel programmability and the Chapel language," Int. J. High Perform. Comput. Appl., vol. 21, no. 3, pp. 291–312, 2007.
[22] S. Stork, et al., "AEMINIUM: A permission-based concurrent-by-default programming language approach," ACM Trans. Program. Languages Syst., vol. 36, no. 1, 2014, Art. no. 2.
[23] ITRS, International Technology Roadmap for Semiconductors, 2013.
[24] P. Hazucha, et al., "Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-μm to 90-nm generation," in Proc. IEEE Int. Electron Devices Meet., Dec. 2003, pp. 21.5.1–21.5.4. [Online]. Available: http://dx.doi.org/10.1109/IEDM.2003.1269336


[25] S. Borkar, "Designing reliable systems from unreliable components: The challenges of transistor variability and degradation," IEEE Micro, vol. 25, no. 6, pp. 10–16, Nov. 2005. [Online]. Available: http://dx.doi.org/10.1109/MM.2005.110
[26] S. S. Mukherjee, J. Emer, and S. K. Reinhardt, "The soft error problem: An architectural perspective," in Proc. IEEE 20th Int. Symp. High Perform. Comput. Archit., 2005, pp. 243–247.
[27] D. J. Scales, M. Nelson, and G. Venkitachalam, "The design of a practical system for fault-tolerant virtual machines," SIGOPS Oper. Syst. Rev., vol. 44, no. 4, pp. 30–39, Dec. 2010. [Online]. Available: http://doi.acm.org/10.1145/1899928.1899932
[28] A. Shye, V. Janapa, R. Joseph, B. Daniel, and A. Connors, "Using process-level redundancy to exploit multiple cores for transient fault tolerance," in Proc. 37th Int. Conf. Depend. Syst. Netw., 2007, pp. 297–306.
[29] Y. Zhang, J. Lee, N. Johnson, and D. August, "DAFT: Decoupled acyclic fault tolerance," Int. J. Parallel Program., vol. 40, no. 1, pp. 118–140, 2012.
[30] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield, "Remus: High availability via asynchronous virtual machine replication," in Proc. Netw. Syst. Des. Implementation, 2008, pp. 161–174.
[31] L. Wang, Z. Kalbarczyk, R. K. Iyer, and A. Iyengar, "Checkpointing virtual machines against transient errors," in Proc. 11th IEEE Int. On-Line Testing Symp., 2010, pp. 97–102.
[32] F. Perry, L. Mackey, G. A. Reis, J. Ligatti, D. I. August, and D. Walker, "Fault-tolerant typed assembly language," ACM SIGPLAN Notices, vol. 42, no. 6, pp. 42–53, 2007.
[33] N. Oh, P. P. Shirvani, and E. J. McCluskey, "Error detection by duplicated instructions in super-scalar processors," IEEE Trans. Rel., vol. 51, no. 1, pp. 63–75, Mar. 2002.
[34] M. el Mehdi Diouri, O. Glück, L. Lefèvre, and F. Cappello, "Energy considerations in checkpointing and fault tolerance protocols," in Proc. IEEE/IFIP 42nd Int. Conf. Depend. Syst. Netw. Workshops, 2012, pp. 1–6.
[35] E. Meneses, O. Sarood, and L. V. Kalé, "Assessing energy efficiency of fault tolerance protocols for HPC systems," in Proc. IEEE 24th Int. Symp. Comput. Archit. High Perform. Comput., 2012, pp. 35–42.
[36] M. E. M. Diouri, O. Glück, L. Lefèvre, and F. Cappello, "ECOFIT: A framework to estimate energy consumption of fault tolerance protocols during HPC executions," in Proc. 13th IEEE/ACM Int. Symp. Cluster Cloud Grid Comput., 2013, pp. 522–529.
[37] S. Đošić and M. Jevtić, "Energy efficiency and fault tolerance analysis of hard real-time systems," in Proc. Small Syst. Simulation Symp., Niš, Serbia, 12th–14th Feb. 2012.
[38] B. Acun, et al., "Power, reliability, and performance: One system to rule them all," Comput., vol. 49, no. 10, pp. 30–37, 2016.
[39] A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman, "EnerJ: Approximate data types for safe and general low-power computation," ACM SIGPLAN Notices, vol. 46, no. 6, pp. 164–174, 2011.
[40] M. Engel, F. Schmoll, A. Heinig, and P. Marwedel, "Unreliable yet useful–reliability annotations for data in cyber-physical systems," in Proc. Workshop Softw. Language Eng. Cyber-Phys. Syst., 2011.
[41] J. A. Carreira, H. Madeira, and J. A. G. Silva, "Xception: A technique for the experimental evaluation of dependability in modern computers," IEEE Trans. Softw. Eng., vol. 24, no. 2, pp. 125–136, Feb. 1998. [Online]. Available: http://dx.doi.org/10.1109/32.666826
[42] D. Skarin, R. Barbosa, and J. Karlsson, "GOOFI-2: A tool for experimental dependability assessment," in Proc. 44th Annu. IEEE/IFIP Int. Conf. Depend. Syst. Netw., 2010, pp. 557–562.
[43] A. Fonseca and R. Barbosa, "MISO: An intermediate language to express parallel and dependable programs," in Proc. 12th Eur. Depend. Comput. Conf., 2016.
[44] M. Weiser, A. Demers, and C. Hauser, "The portable common runtime approach to interoperability," ACM SIGOPS Operating Syst. Rev., vol. 23, no. 5, pp. 114–122, 1989.
[45] T. E. Anderson, E. D. Lazowska, and H. M. Levy, "The performance implications of thread management alternatives for shared-memory multiprocessors," IEEE Trans. Comput., vol. 38, no. 12, pp. 1631–1644, Dec. 1989.
[46] Y. Gu, B.-S. Lee, and W. Cai, "Evaluation of Java thread performance on two different multithreaded kernels," ACM SIGOPS Operating Syst. Rev., vol. 33, no. 1, pp. 34–46, 1999.
[47] A. Fonseca and B. Cabral, "Controlling the granularity of automatic parallel programs," J. Comput. Sci., vol. 17, pp. 620–629, 2016.
[48] M. L. Seidl and B. G. Zorn, "Segregating heap objects by reference behavior and lifetime," ACM SIGPLAN Notices, vol. 33, no. 11, pp. 12–23, 1998.
[49] R. D. Blumofe and C. E. Leiserson, "Scheduling multithreaded computations by work stealing," J. ACM, vol. 46, no. 5, pp. 720–748, 1999.
[50] C. Ding and K. Kennedy, "Improving cache performance in dynamic applications through data and computation reorganization at run time," ACM SIGPLAN Notices, vol. 34, no. 5, pp. 229–241, 1999.
[51] A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Trans. Depend. Secure Comput., vol. 1, no. 1, pp. 11–33, Jan.–Mar. 2004.
[52] H.-J. Höxer, K. Buchacker, and V. Sieh, "UMLinux-A tool for testing a Linux system's fault tolerance," LinuxTag, Jun. 2002.
[53] F. Cerveira, R. Barbosa, H. Madeira, and F. Araujo, "Recovery for virtualized environments," in Proc. 11th Eur. Depend. Comput. Conf., 2015, pp. 25–36.
[54] H. Madeira, M. Rela, F. Moreira, and J. G. Silva, RIFLE: A General Purpose Pin-Level Fault Injector. Berlin, Germany: Springer, 1994, pp. 197–216. [Online]. Available: http://dx.doi.org/10.1007/3-540-58426-9_132
[55] S. K. Reinhardt and S. S. Mukherjee, "Transient fault detection via simultaneous multithreading," ACM SIGARCH Comput. Archit. News, vol. 28, no. 2, pp. 25–36, 2000.
[56] B. Schroeder and G. A. Gibson, "A large-scale study of failures in high-performance computing systems," IEEE Trans. Depend. Secure Comput., vol. 7, no. 4, pp. 337–350, Oct.–Dec. 2010. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/TDSC.2009.4
[57] T.-T. Lin and D. Siewiorek, "Error log analysis: Statistical modeling and heuristic trend analysis," IEEE Trans. Rel., vol. 39, no. 4, pp. 419–432, Oct. 1990.
[58] N. Storey, Safety Critical Computer Systems. Reading, MA, USA: Addison-Wesley, 1996.
[59] G. A. Kanawati, N. A. Kanawati, and J. A. Abraham, "FERRARI: A flexible software-based fault and error injection system," IEEE Trans. Comput., vol. 44, no. 2, pp. 248–260, Feb. 1995.
[60] J. Xu, Z. Kalbarczyk, and R. Iyer, "HiPerFI: A high-performance fault injector," in Fast Abstract Proc. IEEE Int. Conf. Depend. Syst. Netw., 2002.
[61] A. Fonseca and B. Cabral, "Evaluation of runtime cut-off approaches for parallel programs," in Proc. 12th Int. Meet. High Perform. Comput. Comput. Sci., 2016, pp. 121–134.
[62] D. Hackenberg, R. Schöne, T. Ilsche, D. Molka, J. Schuchart, and R. Geyer, "An energy efficiency feature survey of the Intel Haswell processor," in Proc. IEEE Int. Parallel Distrib. Process. Symp. Workshop, 2015, pp. 896–904.
[63] J. M. Franco, F. Cerveira, R. Barbosa, and M. Zenha-Rela, "Modeling the failure pathology of software components," in Proc. 12th Int. ACM SIGSOFT Conf. Quality Softw. Archit., 2016, pp. 41–49.

Alcides Fonseca received the PhD degree in computer science from the University of Coimbra, where he was previously a lecturer and worked on the Aeminium Project, designing the multicore and GPU runtimes for Aeminium, a concurrent-by-default programming language. He is an assistant professor with the University of Lisbon. His research interests include programming languages, compilers, parallelization, and the optimization of multicore and GPU programs.

Frederico Cerveira is working toward the PhD degree and is a researcher at the University of Coimbra, Portugal. His PhD topic deals with the evaluation of current cloud computing systems in the presence of hardware and software faults, in order to propose mechanisms capable of increasing the dependability of these systems. His main research interests include dependability, fault injection, and fault tolerance, mainly in the context of virtualized and cloud computing systems.


Bruno Cabral is an assistant professor with the University of Coimbra. He has been an adjunct associate teaching professor with Carnegie Mellon University, and was faculty of the dual-degree Masters in Software Engineering (MSE). His main research interests include concurrent programming and programming languages, exception handling models, and code instrumentation. He has participated in many research and software projects in cooperation with institutions such as the European Space Agency, Carnegie Mellon University, and the Portuguese Government.

Raul Barbosa received the PhD degree in computer engineering from the Chalmers University of Technology. He is an assistant professor with the University of Coimbra. At Carnegie Mellon University, he was an adjunct associate teaching professor in the Institute for Software Research. He collaborated in diverse research projects, including AMBER, Affidavit, DFEA-2020, and TRONE, and was the principal investigator in project DECAF. His main research interests focus on dependable and secure distributed systems, through experimental approaches such as fault injection and formal approaches such as model checking.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

166 IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, VOL. 3, NO. 3, JULY-SEPTEMBER 2018