Presentation for my master's thesis on wait-free data structures for embedded multi-core systems.
• Master's thesis presentation
• Supervisor: Prof. Dr. Dieter Kranzlmüller
• Advisors: Dr. Karl Fürlinger (LMU)
Dr. Tobias Schüle (Siemens CT)
• Date of presentation: 5 November 2014
Evaluation of Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems
Tobias Fuchs
Structure of this talk
1. Introduction
   1. Motivation
   2. Problem Statement and Objectives
2. Wait-free data structures
   1. Foundations
   2. Pools
   3. Queues
   4. Stacks
3. Task Scheduling
   1. Work stealing
   2. Prioritized work stealing in EMBB
4. Conclusion
2Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems
Wait-freedom:
Motivation
Wait-free algorithms
• Strongest possible fault tolerance
• Guarantee progress of every thread within a bounded number of steps (upper bound for execution time)
Gains:
+ Progress guarantees can potentially serve as formal constraints in
real-time computing
+ Wait-freedom eliminates the classic concurrency problems:
Deadlocks, Priority Inversion, Convoying, Kill-Intolerance
Problem statement
State of the art
No suitable wait-free data structures for embedded systems:
• Rely on mechanisms such as garbage collection
• Not designed for restricted resources
• Not evaluated for latency
Challenges:
- Transforming data structures into wait-free equivalents is
non-trivial and usually requires a redesign from scratch
- Implementations depend on platform architecture
Objectives
1. Review and evaluation of state-of-the-art approaches regarding
their suitability for embedded systems
2. Real-time compliant implementations of wait-free data
structures
3. Definition, implementation and evaluation of suitable
benchmark scenarios for wait-free data structures and
task scheduling algorithms
+ Automated verification derived from semantic definitions
Foundations
Progress conditions
Classification of progress
On the Nature of Progress (Herlihy, Shavit, 2011)
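The gap between the two strongest progress conditions can be illustrated with a shared counter. The following C++ sketch is only illustrative (the function names are hypothetical): the CAS retry loop is lock-free, because a failed CAS implies that another thread succeeded, but a single thread has no upper bound on its retries. The fetch_add variant needs one atomic instruction per call and is therefore wait-free, assuming the hardware provides a native atomic add such as a locked XADD on x86; on LL/SC-based architectures the compiler may emit its own retry loop.

```cpp
#include <atomic>

// Lock-free increment: a failed CAS means another thread made progress,
// but this thread has no upper bound on its number of retries.
int IncrementLockFree(std::atomic<int>& counter) {
  int expected = counter.load(std::memory_order_relaxed);
  while (!counter.compare_exchange_weak(expected, expected + 1,
                                        std::memory_order_relaxed)) {
    // expected now holds the current value; retry with it.
  }
  return expected + 1;
}

// Single atomic read-modify-write: a bounded number of steps per call,
// provided the hardware implements fetch_add without an internal retry loop.
int IncrementWaitFree(std::atomic<int>& counter) {
  return counter.fetch_add(1, std::memory_order_relaxed) + 1;
}
```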
Real-time requirements
Performance priorities on real-time systems
Guarantees on worst-case runtime behavior
Aim for latency and jitter reduction, neglecting throughput
Avoid non-deterministic operations such as malloc / new (see: MISRA)
Evaluation methodology
Real-time applications are designed to optimize latency
Related work does not evaluate latency, only mean or median
throughput
Evaluation of worst-case latency is difficult:
• In related work, measurements outside the 97.5% confidence
interval are considered outliers and ignored
• These outliers are our data
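Why discarding outliers hides exactly what matters can be shown with a small summary function. This C++ sketch (the struct and function names are hypothetical) sorts all samples and reports mean, median, 99th percentile and maximum using the nearest-rank method, without removing any measurement. For the samples 1 to 99 plus a single outlier of 1000, the median stays at 50 and the 99th percentile at 99; only the maximum reveals the worst case of 1000.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencySummary {
  double mean;
  double median;  // 50th percentile (nearest rank)
  double p99;     // 99th percentile (nearest rank)
  double worst;   // maximum: the figure a real-time analysis needs
};

// Summarizes latency samples WITHOUT discarding outliers.
// Precondition: samples is non-empty.
LatencySummary Summarize(std::vector<double> samples) {
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  // Nearest-rank percentile: rank = ceil(percent * n / 100), 1-based.
  auto percentile = [&](std::size_t percent) {
    const std::size_t rank = (percent * n + 99) / 100;
    return samples[std::max<std::size_t>(rank, 1) - 1];
  };
  const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
  return {sum / static_cast<double>(n), percentile(50), percentile(99),
          samples.back()};
}
```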
Pools
Wait-free data structures:
Pools
Pools
… realize dynamic memory allocation
… while eliminating heap fragmentation
• Fundamental data structure of any concurrent container
• Fixed number of objects in static or automatic memory
• Pools manage concurrent removal and reclamation of
objects
RemoveAny(pool, e_r): remove and return an element e_r
Add(pool, e): add element e back to the pool
Pools:
Related work
Close to none:
• Several lock-free pools, e.g. tree-based
• Wait-free pools: array-based, simple yet inefficient
Why are wait-free pools hard to design?
Common wait-free paradigms require dynamic memory
allocation …
Array-based pools
Array-based wait-free pools
• Consists of an array of atomic reservation flags
• Threads traverse the reservation array from the beginning
and try to reserve a flag atomically (CAS)
• The index of the successfully toggled flag is the index of the acquired element
• Worst-case complexity: O(n) for a pool of n elements
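A minimal C++ sketch of such an array-based pool, assuming a fixed capacity N and element indices handed out instead of object pointers (ArrayPool and its methods are hypothetical names modeled on RemoveAny/Add). Each slot is tried with exactly one CAS, so RemoveAny finishes after at most N steps. One consequence of the single pass: a concurrent Add may free a slot the scan has already passed, so RemoveAny can report an empty pool although an element became free in the meantime.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Array-based pool of N elements, identified by index.
// reserved_[i] == true means element i is currently acquired.
template <std::size_t N>
class ArrayPool {
 public:
  ArrayPool() {
    for (auto& flag : reserved_) flag.store(false, std::memory_order_relaxed);
  }

  // RemoveAny: scan from the beginning and try each flag exactly once.
  // At most N CAS attempts, hence O(N) in the worst case.
  std::optional<std::size_t> RemoveAny() {
    for (std::size_t i = 0; i < N; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(expected, true,
                                               std::memory_order_acquire)) {
        return i;  // index of the successfully toggled flag
      }
    }
    return std::nullopt;  // no free element found during this scan
  }

  // Add: return element `index` to the pool.
  void Add(std::size_t index) {
    reserved_[index].store(false, std::memory_order_release);
  }

 private:
  std::array<std::atomic<bool>, N> reserved_;
};
```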
Compartment pool
Wait-free pool with thread-specific compartments
• Array-based pool with an additional range of elements per
thread that can only be acquired by that thread
• Threads acquire elements from their private compartment
first
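The compartment idea can be sketched as an extension of a flag-array pool. In this C++ sketch the layout is an assumption (one compartment of PerThread elements per thread at the front of the array, followed by Shared elements any thread may take), and CompartmentPool, RemoveAny and TryRange are hypothetical names. Acquisitions in a thread's own compartment never compete with other acquiring threads; only the fallback to the shared range does.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Assumed layout: Threads compartments of PerThread elements each,
// followed by Shared elements that any thread may acquire.
template <std::size_t Threads, std::size_t PerThread, std::size_t Shared>
class CompartmentPool {
  static constexpr std::size_t kSize = Threads * PerThread + Shared;

 public:
  CompartmentPool() {
    for (auto& flag : reserved_) flag.store(false, std::memory_order_relaxed);
  }

  // Precondition: thread_id < Threads.
  std::optional<std::size_t> RemoveAny(std::size_t thread_id) {
    // 1) Private compartment: only this thread acquires from it.
    const std::size_t begin = thread_id * PerThread;
    if (auto index = TryRange(begin, begin + PerThread)) return index;
    // 2) Fall back to the shared, array-based part of the pool.
    return TryRange(Threads * PerThread, kSize);
  }

  // Any thread may return an element, including into another compartment.
  void Add(std::size_t index) {
    reserved_[index].store(false, std::memory_order_release);
  }

 private:
  // One CAS attempt per slot in [first, last): bounded number of steps.
  std::optional<std::size_t> TryRange(std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(expected, true,
                                               std::memory_order_acquire)) {
        return i;
      }
    }
    return std::nullopt;
  }

  std::array<std::atomic<bool>, kSize> reserved_;
};
```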
Wait-free data structures:
Pools - Evaluation
Queues
Queues:
Related work
Kogan and Petrank presented the first wait-free queue for
multiple enqueuers and dequeuers
Wait-Free Queues With Multiple Enqueuers and Dequeuers (Kogan, Petrank, 2011)
- Implemented in Java
- Relies on garbage collection
- Requires a monotonically increasing counter (phase)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign of the helping scheme to remove the phase counter
• In the original publication, the phase value of a new operation is greater
than the phases of all announced operations (including non-pending ones)
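The role of the phase counter in the original helping scheme can be modeled sequentially. In this C++ sketch, OpDesc, NextPhase and OpsToHelp are hypothetical simplifications of the published algorithm's state array: a new operation takes the maximum announced phase plus one, and a thread then helps every pending operation whose phase does not exceed its own. Because the phase only ever grows, the scheme depends on a monotonic counter.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified entry of the state array: one announced operation per thread.
struct OpDesc {
  std::uint64_t phase;  // phase number assigned when the operation was announced
  bool pending;         // true until the operation has taken effect
};

// Original scheme: the new phase exceeds the phase of every announced
// operation, pending or not, so phases grow monotonically.
std::uint64_t NextPhase(const std::vector<OpDesc>& state) {
  std::uint64_t max_phase = 0;
  for (const OpDesc& desc : state) max_phase = std::max(max_phase, desc.phase);
  return max_phase + 1;
}

// A thread whose operation has phase my_phase helps every pending operation
// with a phase not greater than its own (including its own operation).
std::vector<std::size_t> OpsToHelp(const std::vector<OpDesc>& state,
                                   std::uint64_t my_phase) {
  std::vector<std::size_t> to_help;
  for (std::size_t tid = 0; tid < state.size(); ++tid) {
    if (state[tid].pending && state[tid].phase <= my_phase) {
      to_help.push_back(tid);
    }
  }
  return to_help;
}
```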
• Modification: help all other pending operations first
• This possibly includes operations that are newer than the
thread's own operation
• Fairness is maintained: all other threads are guaranteed
to help this thread’s operation before engaging in their own
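The phase-free variant only changes the order in which a thread helps. The following sequential C++ model is a sketch under assumptions: the function name HelpOrder, the pending-flag vector and the round-robin scan starting at the next thread id are illustrative choices, not taken from the thesis code. A thread first helps every other pending operation and only then completes its own, so fairness no longer relies on comparing phase numbers.

```cpp
#include <cstddef>
#include <vector>

// pending[tid] == true: thread tid has announced an operation that has not
// taken effect yet. Returns the order in which thread `self` helps, without
// any phase numbers: first all other pending operations (round-robin,
// starting at self + 1), then its own operation last.
std::vector<std::size_t> HelpOrder(const std::vector<bool>& pending,
                                   std::size_t self) {
  const std::size_t n = pending.size();
  std::vector<std::size_t> order;
  for (std::size_t k = 1; k < n; ++k) {
    const std::size_t tid = (self + k) % n;
    if (pending[tid]) order.push_back(tid);
  }
  if (pending[self]) order.push_back(self);
  return order;
}
```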
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Memory reclamation
The hazard pointer scheme is typically presented as a solution
Hazard pointers: Safe memory reclamation for lock-free objects (Michael, 2004)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 1: Find upper memory bound for hazard pointers
Step 2: Guard queue nodes using hazard pointers
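For Step 1, one possible derivation follows the analysis in Michael's hazard pointer paper; N (threads), K (hazard pointers per thread) and the threshold R = 2H are assumptions chosen for illustration. A thread scans the hazard pointers of all threads whenever its list of retired nodes reaches R entries. Since at most H = N·K nodes can be guarded, at most H of its retired nodes survive a scan, so no thread ever holds more than R retired but unreclaimed nodes:

```latex
\begin{aligned}
H &= N \cdot K, \qquad R = 2H \\
\#\{\text{retired, not yet reclaimed nodes}\} &\le N \cdot R = 2 N^2 K \\
N = 4,\; K = 2:\quad H = 8,\; R &= 16,\; N \cdot R = 64
\end{aligned}
```

Under these assumptions, a node pool sized for the queue's live nodes plus N·R retired nodes covers all nodes whose reclamation is deferred.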
Culprit: Guarding is not wait-free
pointer p = node.Next;
// -- possible change of node.Next --
while (hp.GuardPointer(p) && p != node.Next) {
  // Release and retry: unbounded number of retries
  hp.ReleaseGuard(p);
  p = node.Next;
}
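A self-contained C++ version of this guard-and-validate loop, using std::atomic, makes the culprit visible. HazardSlot and ProtectNext are hypothetical names and do not reflect the EMBB API; a real scheme keeps one slot per thread and hazard pointer and scans all slots before freeing a retired node. The sequentially consistent store and load keep the validating load from being reordered before the publication of the guard.

```cpp
#include <atomic>

struct Node {
  int value;
  std::atomic<Node*> next{nullptr};
};

// Hypothetical single hazard pointer slot owned by one thread. Reclaiming
// threads would scan all such slots before freeing a retired node.
struct HazardSlot {
  std::atomic<Node*> ptr{nullptr};
  void Guard(Node* p) { ptr.store(p, std::memory_order_seq_cst); }
  void Release() { ptr.store(nullptr, std::memory_order_release); }
};

// Publish-then-validate: the guard is only valid if node.next still equals
// the guarded pointer after the hazard pointer has been published.
Node* ProtectNext(const Node& node, HazardSlot& hp) {
  Node* p = node.next.load(std::memory_order_acquire);
  for (;;) {
    hp.Guard(p);
    Node* current = node.next.load(std::memory_order_seq_cst);
    if (current == p) return p;  // validated under the hazard pointer protocol
    p = current;                 // node.next changed: retry, no upper bound
  }
}
```

The loop is lock-free: it only retries because another thread changed node.next, but nothing bounds how often this can happen to one particular thread.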
Fortunately, retry loops can be avoided in the Kogan-Petrank
queue, but the implementation is not trivial; see the
implementation at
https://github.com/fuchsto/embb/tree/benchmark/
Queues - Evaluation
Queue benchmark scenarios
In addition to the scenarios for bag semantics:
• Buffer latency: elements are enqueued with the current
timestamp; the difference to the timestamp at dequeue is the
buffer latency
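The buffer latency measurement can be sketched as follows. In this C++ sketch, Stamped and LatencyProbe are hypothetical names, and a single-threaded std::queue stands in for the queue under test; a benchmark would use the concurrent queue with separate producer and consumer threads. std::chrono::steady_clock is used because it is monotonic, so measured latencies cannot become negative.

```cpp
#include <chrono>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

// Each element carries the time at which it was enqueued.
struct Stamped {
  int value;
  Clock::time_point enqueued_at;
};

// Single-threaded stand-in for the queue under test.
class LatencyProbe {
 public:
  void Enqueue(int value) { q_.push({value, Clock::now()}); }

  // Dequeues one element and records its buffer latency in nanoseconds.
  bool Dequeue(int& value) {
    if (q_.empty()) return false;
    const Stamped s = q_.front();
    q_.pop();
    value = s.value;
    const auto latency = std::chrono::duration_cast<std::chrono::nanoseconds>(
        Clock::now() - s.enqueued_at);
    latencies_ns_.push_back(static_cast<long long>(latency.count()));
    return true;
  }

  const std::vector<long long>& latencies_ns() const { return latencies_ns_; }

 private:
  std::queue<Stamped> q_;
  std::vector<long long> latencies_ns_;
};
```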
Stacks
Stacks:
Related work
Fatourou presented a wait-free “universal” construction
that is applicable to stacks
A highly efficient universal construction (Fatourou, 2011)
Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Principle
• Optimized helping scheme
• Threads apply operations to a local copy of the stack
• Every thread tries to replace the global shared object with
its local copy via CAS
• Only applicable for shared objects with small state
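The copy-and-CAS principle is easiest to see for an object whose entire state fits into one machine word. The C++ sketch below uses a hypothetical TinyStack that packs up to 15 four-bit values and a four-bit size into a std::uint64_t: each operation computes a new state from a local copy and publishes it with a single CAS. Without SIM's announcement array and helping scheme a thread may retry indefinitely, so this sketch is only lock-free; it illustrates why the construction needs a small state rather than reproducing SIM itself.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical small-state object: bits 0-3 hold the size (0..15),
// bits 4-63 hold up to 15 values of 4 bits each (bottom of stack first).
class TinyStack {
 public:
  // Pushes the lower 4 bits of v; returns false if the stack is full.
  bool Push(std::uint8_t v) {
    std::uint64_t old_state = state_.load(std::memory_order_acquire);
    for (;;) {
      const std::uint64_t size = old_state & 0xF;
      if (size == 15) return false;
      // Apply the operation to a local copy of the shared state ...
      std::uint64_t new_state =
          old_state | (static_cast<std::uint64_t>(v & 0xF) << (4 + 4 * size));
      new_state = (new_state & ~std::uint64_t{0xF}) | (size + 1);
      // ... and try to replace the shared state with it in one CAS.
      if (state_.compare_exchange_weak(old_state, new_state,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        return true;
      }
      // CAS failed: old_state now holds the current state; retry.
    }
  }

  std::optional<std::uint8_t> Pop() {
    std::uint64_t old_state = state_.load(std::memory_order_acquire);
    for (;;) {
      const std::uint64_t size = old_state & 0xF;
      if (size == 0) return std::nullopt;
      const std::uint64_t shift = 4 + 4 * (size - 1);
      const auto v = static_cast<std::uint8_t>((old_state >> shift) & 0xF);
      std::uint64_t new_state = old_state & ~(std::uint64_t{0xF} << shift);
      new_state = (new_state & ~std::uint64_t{0xF}) | (size - 1);
      if (state_.compare_exchange_weak(old_state, new_state,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        return v;
      }
    }
  }

 private:
  std::atomic<std::uint64_t> state_{0};
};
```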
Elimination
• Push and Pop are inverse operations: Pop(Push(stack, x)) = stack,
and the Pop returns x
• Eliminated Push/Pop pairs are completed immediately, as
together they do not alter the object’s state
Significantly improves performance if applicable
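Elimination inside a batch of announced operations can be modeled in a few lines. This C++ sketch assumes a combining thread that has already collected the announced operations (Op and EliminatePairs are hypothetical names; the pairing order is an arbitrary choice). Each Pop is paired with a Push from the same batch, receives the pushed value and completes together with it; only the remaining operations would still have to be applied to the local copy of the stack.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical announced operation as seen by a combining thread.
struct Op {
  bool is_push;
  int value;                  // argument of a Push
  std::optional<int> result;  // return value of a Pop
  bool done = false;          // completed without touching the stack
};

// Pairs each Pop with a Push from the same batch. An eliminated pair is
// linearized as the Push immediately followed by the Pop, so the stack
// state is unchanged. Returns the number of eliminated pairs.
std::size_t EliminatePairs(std::vector<Op>& batch) {
  std::vector<std::size_t> pushes, pops;
  for (std::size_t i = 0; i < batch.size(); ++i) {
    (batch[i].is_push ? pushes : pops).push_back(i);
  }
  const std::size_t pairs = std::min(pushes.size(), pops.size());
  for (std::size_t k = 0; k < pairs; ++k) {
    Op& push = batch[pushes[k]];
    Op& pop = batch[pops[k]];
    pop.result = push.value;
    push.done = true;
    pop.done = true;
  }
  return pairs;
}
```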
Original version is not suitable for real-time applications:
- ABA problem is prevented using tagged pointers
- Thread-local pools with unbounded capacity
- No deallocation in published algorithm
Modified version of Fatourou’s stack
- Uses hazard pointers for safe reclamation
- Uses compartment pool with limited capacity
- Employs the elimination scheme from the original
publication
Stacks:
Evaluation
Task scheduling
Task Scheduling:
Objectives
Task Scheduling
• Intra-process task scheduling with priority queues
• Low-overhead, fine-grained scheduling of thousands of
small tasks
Priorities:
Focus on low latency and jitter reduction (i.e. predictability),
thus regarding maximum throughput as a secondary
metric.
Task scheduling:
Work stealing
• One worker thread per
SMP core, no migration
• Tasks passed as function pointers (&func)
• Load balancing by stealing tasks
from other workers' queues
• Many flavors of concrete
implementations
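The queue discipline behind work stealing can be sketched without any synchronization. In this sequential C++ model (WorkStealingModel and its methods are hypothetical), tasks are plain function pointers as on the slide, each worker takes its newest task from the back of its own deque, and an idle worker steals the oldest task from the front of another worker's deque. Real schedulers replace the std::deque instances with concurrent deques, for example the Chase-Lev deque.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

using Task = void (*)();  // tasks passed as plain function pointers (&func)

// Sequential model of per-worker task queues; no synchronization shown.
class WorkStealingModel {
 public:
  explicit WorkStealingModel(std::size_t workers) : queues_(workers) {}

  void Spawn(std::size_t worker, Task t) { queues_[worker].push_back(t); }

  // Owner takes its newest task (LIFO); if its deque is empty, it steals
  // the oldest task (FIFO) from the next non-empty victim.
  std::optional<Task> Next(std::size_t worker) {
    auto& own = queues_[worker];
    if (!own.empty()) {
      Task t = own.back();
      own.pop_back();
      return t;
    }
    for (std::size_t k = 1; k < queues_.size(); ++k) {
      auto& victim = queues_[(worker + k) % queues_.size()];
      if (!victim.empty()) {
        Task t = victim.front();
        victim.pop_front();
        return t;
      }
    }
    return std::nullopt;
  }

 private:
  std::vector<std::deque<Task>> queues_;
};
```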
Work stealing with task priorities
• Work stealing extended by one
queue per priority level
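Extending a sequential work-stealing model by one deque per priority level gives a simple prioritized variant. The policy in this C++ sketch is an assumption and not necessarily the one used in EMBB, and PriorityWorkStealingModel is a hypothetical name: for each priority level from highest (0) to lowest, a worker first checks its own deque and then tries to steal at that level, so a high-priority task queued at another worker is preferred over a local low-priority task.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

using Task = void (*)();

// Sequential model: every worker owns one deque per priority level
// (0 = highest). No synchronization is shown.
class PriorityWorkStealingModel {
 public:
  PriorityWorkStealingModel(std::size_t workers, std::size_t priorities)
      : queues_(workers, std::vector<std::deque<Task>>(priorities)) {}

  void Spawn(std::size_t worker, std::size_t priority, Task t) {
    queues_[worker][priority].push_back(t);
  }

  // Assumed policy: per level, own deque first (LIFO), then steal (FIFO).
  std::optional<Task> Next(std::size_t worker) {
    const std::size_t workers = queues_.size();
    const std::size_t levels = queues_[worker].size();
    for (std::size_t p = 0; p < levels; ++p) {
      auto& own = queues_[worker][p];
      if (!own.empty()) {
        Task t = own.back();
        own.pop_back();
        return t;
      }
      for (std::size_t k = 1; k < workers; ++k) {
        auto& victim = queues_[(worker + k) % workers][p];
        if (!victim.empty()) {
          Task t = victim.front();
          victim.pop_front();
          return t;
        }
      }
    }
    return std::nullopt;
  }

 private:
  std::vector<std::vector<std::deque<Task>>> queues_;  // [worker][priority]
};
```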
Conclusion
Conclusion
Revisiting the objective
• Wait-free implementations of pools, queues and stacks now
available for real-time applications
• Benchmark framework and evaluation tools (R) are
published as open source
• Reproducible evaluation of real-time performance
• Verification tool chain in progress
Conclusion
Recommendations
• Wait-free data structures can rival the performance of lock-free
implementations
• But they are hard to maintain
• In practice, formal wait-freedom is not achievable
Employ wait-free data structures for fault-tolerance, not as a
guarantee for critical deadlines
Thank You
Source code (data structures, benchmarks, R scripts): https://github.com/fuchsto/embb/tree/benchmark/
Official development repository of embb: https://github.com/siemens/embb/tree/development/
Wiki for this thesis: http://wiki.coreglit.ch