1
A Comprehensive Compiler-Assisted Thread Abstraction for Resource-Constrained Systems
Alexander Bernauer, ETH Zürich; Kay Römer, University of Lübeck
Meng-Lin, 2013/05/06
4
Multi-Tasking Paradigms
• Memory is scarce on embedded systems
• This makes multi-tasking challenging
• Event-based is most common now
• Thread-based is easier to manage
[Diagram: event-based programming offers efficient execution; cooperative threads offer comfortable development; ??? promises both]
8
Motivation
• Event-based
  • Hard to manage a task's control flow
  • Tasks' contexts must be manually managed and preserved between events
• Thread-based
  • Sequential control flow
  • Tasks' contexts stored in automatic (local) variables
• But existing thread libraries
  • Provide incomplete thread semantics
  • Introduce a significant context overhead
• So…?
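The difference can be sketched in C. In the event-based style, a task's context must live in an explicit struct that survives between events, and the control flow is inverted into a handler. This is a hypothetical blink example, not code from the paper:

```c
/* Event-based style: context kept manually in a struct, because the
   handler returns to the event loop after every event. */
typedef struct {
    int led_on;      /* state that a thread would keep on its stack */
    int blink_count; /* preserved manually between timer events */
} blink_ctx_t;

/* Called once per timer event; the "loop" exists only implicitly. */
void on_timer_event(blink_ctx_t *ctx) {
    ctx->led_on = !ctx->led_on;
    if (ctx->led_on)
        ctx->blink_count++;
}

/* Thread-based equivalent (context in locals, sequential flow):
 *   int led_on = 0, blink_count = 0;
 *   while (1) {
 *       wait_for_timer();        // blocking call, stack preserved
 *       led_on = !led_on;
 *       if (led_on) blink_count++;
 *   }
 */
```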
9
[Diagram: event-based gives efficient execution; runtime cooperative threads give comfortable development; compiler-assisted cooperative threads aim to combine both]
11
Overview
• Propose a dedicated compiler, Ocram
  • Translates a thread-based application (T-code) into an equivalent event-based application (E-code)
  • https://github.com/copton/ocram
• Leverage platform abstraction layers (PAL) to bind Ocram to Contiki and TinyOS
• Show the feasibility of compiler-assisted threads for three different WSN applications
• Verify the correctness of the transformation
• Measure the resource costs of this abstraction compared to both native event-based implementations and thread libraries
12
System Overview
• Compiler-assisted threads
  • Translate thread-based programs (T-code) into equivalent event-based ones (E-code)
13
Equivalence
E-code is equivalent to T-code
if and only if
every possible observable behavior of the E-code
corresponds to
one possible observable behavior of the T-code
15
Translation Scheme
• Transform valid T-code into equivalent E-code
  • Data flow and control flow
• Non-interruptible functions are passed through unchanged
• Interruptible functions are translated into an intermediate representation (IR), the E-code
  • Uniqueness of identifiers
  • while and for loops replaced by if and goto
  • Two normal forms for interruptible calls
    • interruptable_call(parameters);
    • expression = interruptable_call(parameters);
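The loop-lowering step can be sketched as follows. This is an assumed shape of the transformation, not Ocram's exact IR; the summation function is illustrative:

```c
/* T-code style: an ordinary while loop. */
int sum_tcode(int n) {
    int s = 0;
    int i = 0;
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}

/* E-code style after lowering: the while loop is rewritten with if
   and goto, so that an interruptible call placed inside the body
   could later become a yield point with an explicit continuation. */
int sum_ecode(int n) {
    int s = 0;
    int i = 0;
loop_head:
    if (!(i < n))
        goto loop_exit;
    s += i;
    i++;
    goto loop_head;
loop_exit:
    return s;
}
```

Both versions compute the same result; the lowered form merely makes every resumption point addressable by a label.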
16
Data Flow
[Diagram: in T-code, each task's state consists of the global state, an instruction pointer, and a stack (for context switches); in E-code, it consists of the global state and a callback function]
17
Example of Data Flow
• For each interruptible function:
  1. Find the critical variables
  2. Generate the T-stack frame
    • Continuation (line 2)
    • Return value (line 3)
    • Parameters (line 4)
    • Critical variables (line 5)
    • Callees (lines 6-9)
  3. Generate the E-stack frame
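A hypothetical C sketch of such a T-stack frame, for a blink function that calls a wait function. All names and field types are illustrative; the layout Ocram actually generates may differ:

```c
/* Per-function frame holding exactly the state that must survive a
   yield point, replacing the runtime stack of a thread library. */
typedef struct {
    void *continuation;   /* where to resume after an interruptible call */
    int   result;         /* return value slot */
    int   param_ms;       /* parameter, e.g. a blink period */
    int   crit_counter;   /* critical variable: read after a yield point */
    union {
        /* Frames of callees; a union suffices because at most one
           callee of this function is active at any time. */
        struct {
            void *continuation;
            int   param_ms;
        } wait_frame;
    } callees;
} blink_frame_t;
```

Because only critical variables (those read after a yield point) are stored, the frame is typically much smaller than a conservatively sized thread stack.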
18
Control Flow
[Diagram: in T-code, control flow passes through functions f1-f4 via function calls and returns, ending in an API call; in E-code, f1-f4 are inlined into a single thread_n() function whose internal control flow uses gotos and ends in the same API call]
19
Example of Control Flow
• Inline the bodies of all interruptible functions of each thread into one common thread execution function, which serves as a single event handler
[Diagram: the Blinky thread with inlined Wait calls; interruptible calls of interruptible functions and interruptible calls of T-code API functions are marked]
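The inlining and resumption scheme can be sketched as a single C event handler that dispatches on a stored continuation. This is a portable switch-based variant with illustrative names and logic, not the paper's generated code:

```c
/* Continuation values: where to resume inside the inlined body. */
enum { START = 0, AFTER_WAIT = 1 };

typedef struct {
    int cont;    /* continuation: which yield point to resume at */
    int toggles; /* context preserved across events */
} thread_frame_t;

/* Single thread execution function; returns 1 while the thread still
   waits for further events, 0 once it has finished. */
int blink_thread(thread_frame_t *f) {
    switch (f->cont) {
    case START:
        f->toggles = 0;
        /* fall through into the loop body */
    case AFTER_WAIT:
        if (f->cont == AFTER_WAIT)
            f->toggles++;       /* work done after the wait "returns" */
        if (f->toggles >= 3)
            return 0;           /* thread finished */
        f->cont = AFTER_WAIT;   /* interruptible call: arm the timer, */
        return 1;               /* then yield back to the event loop  */
    }
    return 0;
}
```

Each return hands control back to the scheduler; the next event re-enters the same function at the saved continuation.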
20
Correctness
• Define observable behavior (OB) as the order of all API calls, including the values of all input parameters
• Data flow transformation
  • Keeps OB unchanged
  • Preserves the effects of each statement
• Control flow transformation
  • Preserves the sequence of statement execution
22
Evaluation
• Executed via COOJA/MSPSim, logged by an extra COOJA plugin
• Applications (case studies)
  • Data Collection Application (DCA)
  • CoAP implementation (COAP)
  • Remote Procedure Calls (RPC)
• Variants
  • Event-based (Nat): native Contiki using protothreads, with only 2 bytes of overhead per thread
  • Run-time threads (TL): TinyThreads ported to Contiki
  • Compiler-assisted threads (Gen)
23
RAM
• RAM = size of the data section (initialized) + bss section (uninitialized) + maximum stack size
• Nearly identical data sections and a common bss
  • All variants use Contiki
• Equal maximum stack size
  • No runtime stack
• Larger bss for TL
  • Thread libraries are expensive
• Gen
  • Only 1% more RAM than Nat
24
CPU cycles
• Number of CPU cycles
• TL's scheduler adds up to 12% more CPU cycles compared to Nat
• Gen needs only 2% more CPU cycles than Nat
  • Even fewer CPU cycles than Nat if the PAL is excluded
25
Text
• Size of the code (text) section
• Larger text for TL
  • Overhead of the thread scheduler
• Larger text for Gen
  • Overhead of the PAL
  • 3% more than Nat
26
Text Per Thread
• Varying number of worker threads for RPC
• Lowest slope for TL
  • A general thread library only needs code to start each new thread
• Slope for Nat
  • Originates from protothreads
• Steepest slope for Gen
  • The PAL also builds on protothreads
  • The generated E-code pays extra for reentrant functions (no state may be held in global variables)
27
RAM Per Thread
• RAM = size of the data section (initialized) + bss section (uninitialized) + maximum stack size
• Varying number of worker threads for RPC
• Slope for Gen is the same as for Nat
• Steep slope for TL
  • Safety margin for each thread's stack
  • Must be general enough for all possible cases
28
Limitations
• Taking the address of an interruptible function is not allowed
  • Use case differentiation instead
• Interruptible functions may not be recursive, as recursion would make stack consumption undecidable
  • Uncommon in embedded systems
• No support for dynamic thread creation; threads must be statically assigned to thread start functions
  • WSN applications tend to have a fixed set of tasks
29
Conclusion
• Comprehensive thread abstraction
  • Only +1% RAM, +2% CPU, +3% ROM
• Strengths
  • Hard work
• Weaknesses
  • Might be susceptible to application dependency
  • Cannot provide timing guarantees
31
Reference
1. Mishra, S. and Yang, R., "Thread-based vs Event-based Implementation of a Group Communication Service," Proc. Parallel Processing Symposium, 1998.
33
Processes and Threads
• Both processes and threads are independent sequences of execution.
• Threads (of the same process) run in a shared memory space, while processes run in separate memory spaces.
34
Process state
http://en.wikipedia.org/wiki/Process_state
35
Cooperative vs. Preemptive Threads
• Cooperative style
  • Context switching only occurs at yield points: once a thread is given control, it continues to run until it explicitly yields control or blocks
  • The saved context consists only of the variables that are read after the yield point
  • Unable to guarantee timings or priorities
• Preemptive style
  • Context switching may step in and hand control from one thread to another at any time, usually based on priorities
  • The saved context consists of the thread's complete stack and all CPU registers
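Contiki's protothreads realize this cooperative style with a 2-byte continuation. A minimal sketch of the underlying switch/`__LINE__` trick follows; the macros are simplified stand-ins for Contiki's real `PT_*` API, and `worker` is illustrative:

```c
/* 2-byte local continuation: the entire saved context is the line
   number of the last yield point. Locals are NOT preserved across
   yields, which is exactly what compiler-assisted threads fix. */
typedef unsigned short lc_t;

#define PT_BEGIN(lc)  switch (*(lc)) { case 0:
#define PT_YIELD(lc)  do { *(lc) = __LINE__; return 1; case __LINE__:; } while (0)
#define PT_END(lc)    } *(lc) = 0; return 0

int step_count;               /* context must live in globals/statics */

/* Returns 1 while the protothread is still running, 0 when done. */
int worker(lc_t *lc) {
    PT_BEGIN(lc);
    step_count = 1;
    PT_YIELD(lc);             /* give control back to the scheduler */
    step_count = 2;
    PT_YIELD(lc);
    step_count = 3;
    PT_END(lc);
}
```

Each call resumes at the `case` label matching the stored line number, so a scheduler can interleave many such functions with only 2 bytes of state per thread.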
36
Event-based vs. Thread-based
• Event-based [1] is most common now
  • A single-threaded event loop performs event demultiplexing and event handler dispatching in response to the occurrence of multiple events
  • No need to allocate memory for per-thread stacks
• Thread-based [1] is easier to manage
  • A separate thread is spawned for each event type, say etype, in the program. This thread waits for an event of type etype to occur and takes appropriate action on its occurrence
  • Requires contexts to be saved during context switches
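The event demultiplexing and dispatching described above can be sketched as a table-driven dispatcher in C. Event types, handler names, and payloads are hypothetical:

```c
/* One handler per event type; the single-threaded loop (not shown)
   would pull (type, payload) pairs from a queue and call dispatch. */
typedef void (*handler_t)(int payload);

enum { EV_TIMER, EV_RADIO, EV_MAX };

int timer_hits, radio_hits;   /* handler state lives in globals */

static void on_timer(int payload) { timer_hits += payload; }
static void on_radio(int payload) { radio_hits += payload; }

static handler_t handlers[EV_MAX] = { on_timer, on_radio };

/* Demultiplex: route an event to the handler registered for its type. */
void dispatch(int type, int payload) {
    if (type >= 0 && type < EV_MAX)
        handlers[type](payload);
}
```

No per-thread stacks exist here; the cost is that every handler must finish quickly and park its state in globals before returning.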
37
Trade-off between Programming Paradigms
• The event-based paradigm is efficient since no memory needs to be allocated for per-thread stacks, but it does not scale well with growing complexity
  • Hard to manage the complex control flow of a task, which is formed by a causal chain of events and event handler functions
  • Contexts of tasks must be manually managed and preserved between events
• The thread-based paradigm remedies this situation with
  • Sequential control flow
  • Contexts of tasks stored in automatic (local) variables
• But existing thread libraries
  • Provide incomplete thread semantics
  • Introduce a significant resource overhead by storing contexts in automatic (local) variables
  • Make it hard to estimate the maximum size of a stack supporting multiple threads
38
Control Flow
• The order in which the individual statements, instructions, or function calls of an imperative or declarative program are executed or evaluated
• http://en.wikipedia.org/wiki/Control_flow
• Labels
• Goto
• Subroutines/Functions
39
Reentrancy
• A function is called reentrant if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocations complete execution.
• http://en.wikipedia.org/wiki/Reentrancy_(computing)
• Rules for reentrancy
  • Reentrant code may not hold any static (or global) non-constant data.
  • Reentrant code may not modify its own code.
  • Reentrant code may not call non-reentrant programs or routines.
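The first rule is the one that matters for the generated E-code. A hypothetical ID-counter example contrasts a non-reentrant function (state in a static) with a reentrant one (state supplied by the caller):

```c
/* Non-reentrant: keeps state in static data, so overlapping or
   interleaved invocations share and corrupt the same counter. */
int next_id_nonreentrant(void) {
    static int counter = 0;   /* static data: violates rule 1 */
    return ++counter;
}

/* Reentrant: all state is passed in by the caller, so independent
   callers (e.g. different threads) cannot interfere. */
int next_id_reentrant(int *counter) {
    return ++*counter;
}
```

This is why multiple instances of the same generated thread function need separate frames rather than shared globals.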
40
Terminology
• Context (of a process, thread, ...)
  • The minimal set of data used by a task that must be saved to allow the task to be interrupted at a given time and continued from the point of interruption at an arbitrary future time
  • Values of the CPU registers, process state, program counter, ...
• Automatic variable
  • A variable that is allocated and deallocated automatically when program flow enters and leaves the variable's context
• Instantiation
  • The creation of a real instance or particular realization of an abstraction or template, such as a class of objects or a computer process
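The automatic-variable definition can be illustrated with a small C function (the function itself is a made-up example):

```c
/* Automatic variables live only within their enclosing scope. */
int sum_scopes(void) {
    int total = 0;            /* automatic: lives for the whole call */
    for (int i = 0; i < 3; i++) {
        int square = i * i;   /* automatic: re-created each iteration */
        total += square;
    }
    return total;             /* 0 + 1 + 4 */
}
```

In thread-based code, such variables carry a task's context for free, since they live on the thread's stack across blocking calls.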