Wait-free data structures on embedded multi-core systems

Presentation for my master's thesis on wait-free data structures for embedded multi-core systems.

Page 1: Wait-free data structures on embedded multi-core systems

• Master's thesis presentation

• Examiner: Prof. Dr. Dieter Kranzlmüller

• Supervisors: Dr. Karl Fürlinger (LMU), Dr. Tobias Schüle (Siemens CT)

• Date of presentation: 05.11.2014

Evaluation of Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems

Tobias Fuchs

Page 2: Wait-free data structures on embedded multi-core systems

Structure of this talk

1. Introduction
   1. Motivation
   2. Problem Statement and Objectives

2. Wait-free data structures
   1. Foundations
   2. Pools
   3. Queues
   4. Stacks

3. Task Scheduling
   1. Work stealing
   2. Prioritized work stealing in EMBB

4. Conclusion

2Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems

Page 3: Wait-free data structures on embedded multi-core systems

Wait-freedom:

Motivation

Page 4: Wait-free data structures on embedded multi-core systems

Motivation

Wait-free algorithms

• Strongest possible fault tolerance

• Guarantee progress and an upper bound on execution time

Gains:

+ Progress is potentially a formal constraint in real-time computing

+ Wait-freedom eliminates the classic concurrency problems: deadlocks, priority inversion, convoying, kill-intolerance

Page 5: Wait-free data structures on embedded multi-core systems

Problem statement

State of the art

No suitable wait-free data structures exist for embedded systems. Existing designs:

• Employ mechanisms such as garbage collection

• Are not designed for restricted resources

• Are not evaluated for latency

Challenges:

- Transforming data structures into wait-free equivalents is non-trivial and usually requires a from-scratch redesign

- Implementations depend on the platform architecture

Page 6: Wait-free data structures on embedded multi-core systems

Objectives

1. Review and evaluation of state-of-the-art approaches for suitability on embedded systems

2. Real-time compliant implementations of wait-free data structures

3. Definition, implementation and evaluation of suitable benchmark scenarios for wait-free data structures and task scheduling algorithms

+ Automated verification derived from semantic definition

Page 7: Wait-free data structures on embedded multi-core systems

Foundations

Page 8: Wait-free data structures on embedded multi-core systems

Progress conditions

Classification of progress

On the Nature of Progress (Herlihy, Shavit, 2011)

Page 9: Wait-free data structures on embedded multi-core systems

Real-time requirements

Performance priorities on real-time systems

Guarantees on worst-case runtime behavior

Aim for latency / jitter-reduction, neglecting throughput

Avoid non-determinism, as in malloc / new (see: MISRA)

Page 10: Wait-free data structures on embedded multi-core systems

Evaluation methodology

Real-time applications are designed to optimize latency

Related work does not evaluate latency, but only mean or median throughput

Evaluating worst-case latency is difficult:

• In related work, measurements outside the 97.5% confidence interval are considered outliers and ignored

• These outliers are our data
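
The point about outliers can be illustrated with a small sketch. It is not taken from the thesis benchmark framework; `LatencySummary` and `Summarize` are hypothetical names, and the sample values in the note below are made up. It only shows that mean and median can look harmless while the maximum, which is the figure a real-time system cares about, does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical summary of latency samples in nanoseconds.
struct LatencySummary {
  double mean;
  std::uint64_t median;
  std::uint64_t max;  // worst case observed, never trimmed
};

// Takes the samples by value because they are sorted in place.
LatencySummary Summarize(std::vector<std::uint64_t> samples_ns) {
  assert(!samples_ns.empty());
  std::sort(samples_ns.begin(), samples_ns.end());
  const double sum =
      std::accumulate(samples_ns.begin(), samples_ns.end(), 0.0);
  return LatencySummary{sum / static_cast<double>(samples_ns.size()),
                        samples_ns[samples_ns.size() / 2],
                        samples_ns.back()};
}
```

For the samples {10, 12, 11, 10, 900}, the median is 11 while the maximum is 900. Discarding the single slow measurement as an outlier would hide exactly the behavior under evaluation.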

Page 11: Wait-free data structures on embedded multi-core systems

Pools

Page 12: Wait-free data structures on embedded multi-core systems

Wait-free data structures:

Pools

Pools

Pools realize dynamic memory allocation while eliminating heap fragmentation

• Fundamental data structure of any concurrent container

• Fixed number of objects in static or automatic memory

• Pools manage concurrent removal and reclamation of objects

RemoveAny(pool, e_r): Remove and return element e_r

Add(pool, e): Add element e back to the pool

Page 13: Wait-free data structures on embedded multi-core systems

Pools:

Related work

Related work

Close to none:

• Several lock-free pools, e.g. tree-based

• Wait-free pools: array-based, simple yet inefficient

Why are wait-free pools hard to design?

Common wait-free paradigms require dynamic memory allocation …

Page 14: Wait-free data structures on embedded multi-core systems

Array-based pools

Array-based wait-free pools

• Consist of an array holding atomic reservation flags

• Threads traverse the reservation array from the beginning and try to reserve a flag atomically (CAS)

• The index of the successfully toggled flag is the index of the acquired element

• Worst-case complexity: O(n)
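
The bullets above can be sketched in C++ as follows. This is a minimal illustration and not the EMBB implementation: the class name `ArrayPool`, the use of `std::optional`, and the memory orderings are assumptions made for this example. Each call performs at most N CAS attempts and never retries a slot, which is where the O(n) bound comes from.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical fixed-capacity pool that hands out element indices.
template <std::size_t N>
class ArrayPool {
 public:
  // Traverse the flags from the beginning, one CAS attempt per slot.
  // The loop runs at most N times regardless of what other threads do.
  std::optional<std::size_t> RemoveAny() {
    for (std::size_t i = 0; i < N; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(
              expected, true, std::memory_order_acquire,
              std::memory_order_relaxed)) {
        return i;  // index of the toggled flag is the acquired element
      }
    }
    return std::nullopt;  // every slot was seen as reserved
  }

  // Return an element to the pool by clearing its reservation flag.
  void Add(std::size_t index) {
    assert(index < N);
    reserved_[index].store(false, std::memory_order_release);
  }

 private:
  std::array<std::atomic<bool>, N> reserved_{};  // all flags start cleared
};
```

The flag array has a fixed size, so the pool can live in static or automatic memory and needs no heap allocation.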

Page 15: Wait-free data structures on embedded multi-core systems

Compartment pool

Wait-free pool with thread-specific compartments

• Array-based pool with an additional range of elements that can only be acquired by a specific thread

• Threads acquire elements from their private compartment first
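
A possible reading of the compartment idea is sketched below. The layout (all private ranges first, then one shared range), the name `CompartmentPool`, and the template parameters are assumptions for illustration. The slides do not specify these details, and the thesis implementation may differ.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical layout: [private range of thread 0][thread 1]...[shared range]
template <std::size_t Threads, std::size_t PerThread, std::size_t Shared>
class CompartmentPool {
 public:
  std::optional<std::size_t> RemoveAny(std::size_t thread_id) {
    assert(thread_id < Threads);
    // 1. Try the caller's private compartment first. No other thread
    //    acquires from this range, so contention here is minimal.
    const std::size_t own_begin = thread_id * PerThread;
    if (auto i = ReserveIn(own_begin, own_begin + PerThread)) return i;
    // 2. Fall back to the shared range, as in the plain array-based pool.
    return ReserveIn(kSharedBegin, kTotal);
  }

  // Any thread may return any element.
  void Add(std::size_t index) {
    assert(index < kTotal);
    reserved_[index].store(false, std::memory_order_release);
  }

 private:
  static constexpr std::size_t kSharedBegin = Threads * PerThread;
  static constexpr std::size_t kTotal = kSharedBegin + Shared;

  // One CAS attempt per slot in [begin, end): bounded number of steps.
  std::optional<std::size_t> ReserveIn(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(
              expected, true, std::memory_order_acquire,
              std::memory_order_relaxed)) {
        return i;
      }
    }
    return std::nullopt;
  }

  std::array<std::atomic<bool>, kTotal> reserved_{};
};
```

In this sketch a thread never sees another thread's private elements, so its worst-case traversal is PerThread + Shared slots instead of the whole array.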

Page 16: Wait-free data structures on embedded multi-core systems

Wait-free data structures:

Pools - Evaluation

Page 17: Wait-free data structures on embedded multi-core systems

Wait-free data structures:

Pools - Evaluation

Page 18: Wait-free data structures on embedded multi-core systems

Wait-free data structures:

Pools - Evaluation

Page 19: Wait-free data structures on embedded multi-core systems

Queues

Page 20: Wait-free data structures on embedded multi-core systems

Queues:

Related work

Related work

Kogan and Petrank presented the first wait-free queue for multiple enqueuers and dequeuers

Wait-Free Queues With Multiple Enqueuers and Dequeuers (Kogan, Petrank, 2011)

- Implemented in Java

- Relying on garbage collection

- Requires monotonic counter (phase)

Page 21: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Redesign helping scheme to remove phase counter

• In the original publication, the new phase value is greater than the phases of all announced operations (including non-pending ones)

Page 22: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Redesign helping scheme to remove phase counter

• Modification: Help all other pending operations first

• Possibly helping operations that are newer than the thread's own operation

Page 23: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Redesign helping scheme to remove phase counter

• Fairness is maintained: all other threads are guaranteed to help this thread's operation before engaging in their own
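
The order in which a thread helps others in the modified scheme can be sketched without any phase counter. The following is only an illustration of that traversal order, not the queue itself: `HelpingOrder`, `Announce`, `Complete` and the `help` callback are hypothetical names, and the callback is assumed to finish the given thread's announced operation and then call `Complete`.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical announcement table with one pending flag per thread.
template <std::size_t Threads>
class HelpingOrder {
 public:
  void Announce(std::size_t t) {
    assert(t < Threads);
    pending_[t].store(true, std::memory_order_release);
  }
  void Complete(std::size_t t) {
    assert(t < Threads);
    pending_[t].store(false, std::memory_order_release);
  }
  bool IsPending(std::size_t t) const {
    return pending_[t].load(std::memory_order_acquire);
  }

  // Announce the caller's operation, help every other pending operation
  // (even ones announced later than our own), then finish our own.
  // At most Threads calls to help(), so the traversal itself is bounded.
  template <typename HelpFn>
  void Execute(std::size_t self, HelpFn help) {
    Announce(self);
    for (std::size_t k = 1; k < Threads; ++k) {
      const std::size_t other = (self + k) % Threads;
      if (IsPending(other)) help(other);
    }
    if (IsPending(self)) help(self);  // may already be done by a helper
  }

 private:
  std::array<std::atomic<bool>, Threads> pending_{};
};
```

Because every thread visits all other announcements before its own, an announced operation is helped by each thread that starts an operation afterwards, which is the fairness argument from the slide.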

Page 24: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Memory reclamation

The hazard pointers scheme is typically presented as a solution

Hazard pointers: Safe memory reclamation for lock-free objects (Michael, 2004)

Page 25: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Introduce hazard pointers

Step 1: Find upper memory bound for hazard pointers

Step 2: Guard queue nodes using hazard pointers

Page 26: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Introduce hazard pointers

Step 2: Guard queue nodes using hazard pointers

Culprit: Guarding is not wait-free

pointer p = node.Next;
// -- node.Next may change here --
while (hp.GuardPointer(p) && p != node.Next) {
  // Guarded a stale pointer: release, re-read and retry.
  // The number of retries is unbounded.
  hp.ReleaseGuard(p);
  p = node.Next;
}
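
For reference, the publish-and-validate step can be written with `std::atomic` as below. This is a generic sketch of the pattern described by Michael (2004), using a single hazard slot. `GuardLoop` and `TryGuard` are hypothetical names and not part of the EMBB API. The bounded variant only shows how a retry limit changes the interface; it is not the solution used in the thesis.

```cpp
#include <atomic>
#include <cassert>

// Classic guard loop: publish the pointer, then re-read the source to
// check that it has not changed. Lock-free, but not wait-free, because
// another thread can change `source` between the two steps every time.
template <typename T>
T* GuardLoop(const std::atomic<T*>& source, std::atomic<T*>& hazard_slot) {
  T* p = source.load();
  for (;;) {
    hazard_slot.store(p);         // publish the guard
    T* current = source.load();   // validate
    if (current == p) return p;   // guard was in place in time
    p = current;                  // source changed: retry
  }
}

// Bounded variant (illustrative only): gives up after max_attempts and
// reports failure instead of looping forever.
template <typename T>
bool TryGuard(const std::atomic<T*>& source, std::atomic<T*>& hazard_slot,
              int max_attempts, T*& guarded_out) {
  assert(max_attempts > 0);
  T* p = source.load();
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    hazard_slot.store(p);
    T* current = source.load();
    if (current == p) {
      guarded_out = p;
      return true;
    }
    p = current;
  }
  hazard_slot.store(nullptr);  // release the stale guard
  return false;
}
```

A caller of the bounded variant must have a meaningful way to proceed on failure, which is why bounding the retries alone does not make an algorithm wait-free.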

Page 27: Wait-free data structures on embedded multi-core systems

Kogan-Petrank queue

Adapting the Kogan-Petrank wait-free queue

Introduce hazard pointers

Step 2: Guard queue nodes using hazard pointers

Culprit: Guarding is not wait-free

Fortunately, retry loops can be avoided in the Kogan-Petrank queue, but the implementation is not trivial

See implementation at https://github.com/fuchsto/embb/tree/benchmark/

Page 28: Wait-free data structures on embedded multi-core systems

Queues - Evaluation

Queue benchmark scenarios

In addition to scenarios for bag semantics:

• Buffer latency: elements are enqueued with the current timestamp; the difference from the timestamp at dequeue is the buffer latency
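
The buffer latency measurement can be sketched as follows. A single-threaded `std::queue` stands in for the wait-free queue under test. `StampedElement`, `EnqueueStamped` and `DequeueWithLatency` are hypothetical names, and the choice of `std::chrono::steady_clock` is an assumption, since the slides do not name the clock used by the benchmark framework.

```cpp
#include <cassert>
#include <chrono>
#include <queue>

using Clock = std::chrono::steady_clock;  // monotonic clock

// Payload plus the time at which it entered the queue.
struct StampedElement {
  int payload;
  Clock::time_point enqueued_at;
};

inline void EnqueueStamped(std::queue<StampedElement>& q, int payload) {
  q.push(StampedElement{payload, Clock::now()});
}

// Dequeues one element and returns how long it stayed in the buffer.
inline Clock::duration DequeueWithLatency(std::queue<StampedElement>& q,
                                          int& payload_out) {
  assert(!q.empty());
  const StampedElement e = q.front();
  q.pop();
  payload_out = e.payload;
  return Clock::now() - e.enqueued_at;
}
```

In a real benchmark the enqueue and dequeue calls run on different threads, and the durations are collected per element so that the maximum can be reported.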

Page 29: Wait-free data structures on embedded multi-core systems

Queues - Evaluation

Page 30: Wait-free data structures on embedded multi-core systems

Queues - Evaluation

Page 31: Wait-free data structures on embedded multi-core systems

Stacks

Page 32: Wait-free data structures on embedded multi-core systems

Stacks:

Related work

Related work

Fatourou presented a wait-free “universal” construction that is applicable to stacks

A highly efficient universal construction (Fatourou, 2011)

Page 33: Wait-free data structures on embedded multi-core systems

Elimination stack

Fatourou’s universal construction SIM

A highly efficient universal construction (Fatourou, 2011)

Principle

• Optimized helping scheme

• Threads apply operations to a local copy of the stack

• Every thread tries to replace the global shared object with its local copy via CAS

• Only applicable for shared objects with small state
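
The copy, modify and CAS step can be shown on a deliberately tiny stack whose whole state fits into one 64-bit word. This sketch omits the helping scheme that makes SIM wait-free, so the retry loops below are only lock-free. `TinyStack` and its packed layout (size in the low byte, up to seven one-byte elements above it) are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stack with at most 7 one-byte elements, packed as:
// bits 0..7 = size, bits 8*(i+1)..8*(i+1)+7 = element i.
class TinyStack {
 public:
  static constexpr std::uint64_t kCapacity = 7;

  bool Push(std::uint8_t value) {
    std::uint64_t seen = state_.load(std::memory_order_acquire);
    for (;;) {
      const std::uint64_t size = seen & 0xFF;
      assert(size <= kCapacity);
      if (size == kCapacity) return false;
      // Apply the operation to a local copy of the whole state.
      std::uint64_t local = seen | (std::uint64_t{value} << (8 * (size + 1)));
      local = (local & ~std::uint64_t{0xFF}) | (size + 1);
      // Try to replace the shared state with the local copy.
      if (state_.compare_exchange_weak(seen, local,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        return true;
      }
      // CAS failed: `seen` now holds the current state, try again.
    }
  }

  std::optional<std::uint8_t> Pop() {
    std::uint64_t seen = state_.load(std::memory_order_acquire);
    for (;;) {
      const std::uint64_t size = seen & 0xFF;
      if (size == 0) return std::nullopt;
      const std::uint64_t shift = 8 * size;  // slot of the top element
      const auto top = static_cast<std::uint8_t>((seen >> shift) & 0xFF);
      std::uint64_t local = seen & ~(std::uint64_t{0xFF} << shift);
      local = (local & ~std::uint64_t{0xFF}) | (size - 1);
      if (state_.compare_exchange_weak(seen, local,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        return top;
      }
    }
  }

 private:
  std::atomic<std::uint64_t> state_{0};
};
```

The sketch also makes the last bullet concrete: copying the state on every operation is only affordable when the state is small.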

Page 34: Wait-free data structures on embedded multi-core systems

Elimination stack

Fatourou’s universal construction SIM

A highly efficient universal construction (Fatourou, 2011)

Elimination

• Push and Pop have reverse semantics: Push(Pop(stack)) = Pop(Push(stack)) = stack

• Eliminated operations are completed immediately if they do not alter the object’s state

Significantly improves performance if applicable
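
One way to picture elimination is to pair pushes and pops within a batch of announced operations before touching the shared stack. The sketch below is an illustration under that assumption. `Op`, `EliminationResult` and `Eliminate` are hypothetical names, and the batching itself is not described on the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Op {
  bool is_push;
  int value;  // ignored for pops
};

struct EliminationResult {
  std::vector<int> eliminated_pop_values;  // handed directly from push to pop
  std::size_t residual_pops = 0;           // must still go to the shared stack
  std::vector<int> residual_pushes;        // must still go to the shared stack
};

// Processes the batch in announcement order. A pop that follows an
// unmatched push takes that push's value, and both operations complete
// without altering the shared stack.
EliminationResult Eliminate(const std::vector<Op>& batch) {
  EliminationResult r;
  std::size_t pops_seen = 0;
  for (const Op& op : batch) {
    if (op.is_push) {
      r.residual_pushes.push_back(op.value);
    } else {
      ++pops_seen;
      if (!r.residual_pushes.empty()) {
        r.eliminated_pop_values.push_back(r.residual_pushes.back());
        r.residual_pushes.pop_back();
      } else {
        ++r.residual_pops;
      }
    }
  }
  // Sanity check: every pop is either eliminated or residual.
  assert(r.eliminated_pop_values.size() + r.residual_pops == pops_seen);
  return r;
}
```

Only the residual operations have to be applied to the shared object, which is where the performance gain comes from when pushes and pops are balanced.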

Page 35: Wait-free data structures on embedded multi-core systems

Elimination stack

Fatourou’s universal construction SIM

A highly efficient universal construction (Fatourou, 2013)

Original version is not suitable for real-time applications:

- ABA problem is prevented using tagged pointers

- Thread-local pools with unbounded capacity

- No deallocation in published algorithm

Page 36: Wait-free data structures on embedded multi-core systems

Elimination stack

Fatourou’s universal construction SIM

A highly efficient universal construction (Fatourou, 2013)

Modified version of Fatourou’s stack

- Uses hazard pointers for safe reclamation

- Uses compartment pool with limited capacity

- Employs the elimination scheme from the original publication

Page 37: Wait-free data structures on embedded multi-core systems

Stacks:

Evaluation

Page 38: Wait-free data structures on embedded multi-core systems

Stacks:

Evaluation

Page 39: Wait-free data structures on embedded multi-core systems

Task scheduling

Page 40: Wait-free data structures on embedded multi-core systems

Task Scheduling:

Objectives

Task Scheduling

• Intra-process task scheduling with priority queues

• Low-overhead, fine-grained scheduling of thousands of small tasks

Priorities:

Focus on low latency and jitter reduction (i.e. predictability), regarding maximum throughput as a secondary benchmark.

Page 41: Wait-free data structures on embedded multi-core systems

Task scheduling:

Work stealing

Work stealing

• One worker thread per SMP core, no migration

• Tasks passed as &func

• Load-balancing on task queues

• Many flavors of concrete implementations
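
One common flavor of work stealing is sketched below: a worker takes tasks from the back of its own queue and steals from the front of another worker's queue when its own is empty. This is a generic illustration, not the EMBB scheduler. `WorkStealingQueues` is a hypothetical name, and the mutex-protected `std::deque` is a stand-in for whatever concurrent queue a real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

using Task = void (*)();  // tasks are plain function pointers

class WorkStealingQueues {
 public:
  explicit WorkStealingQueues(std::size_t workers) : queues_(workers) {}

  void Push(std::size_t worker, Task task) {
    assert(worker < queues_.size());
    std::lock_guard<std::mutex> lock(queues_[worker].mutex);
    queues_[worker].tasks.push_back(task);
  }

  // Returns the next task for `worker`, or nullptr if no work is left.
  Task Next(std::size_t worker) {
    assert(worker < queues_.size());
    if (Task own = Take(worker, /*from_back=*/true)) return own;
    for (std::size_t k = 1; k < queues_.size(); ++k) {
      const std::size_t victim = (worker + k) % queues_.size();
      if (Task stolen = Take(victim, /*from_back=*/false)) return stolen;
    }
    return nullptr;
  }

 private:
  struct Queue {
    std::mutex mutex;
    std::deque<Task> tasks;
  };

  Task Take(std::size_t index, bool from_back) {
    Queue& q = queues_[index];
    std::lock_guard<std::mutex> lock(q.mutex);
    if (q.tasks.empty()) return nullptr;
    Task task = from_back ? q.tasks.back() : q.tasks.front();
    if (from_back) q.tasks.pop_back(); else q.tasks.pop_front();
    return task;
  }

  std::vector<Queue> queues_;  // one queue per worker, never resized
};
```

Load balancing happens implicitly: idle workers pull work from busy ones, so no central dispatcher is needed.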

Page 42: Wait-free data structures on embedded multi-core systems

Task scheduling:

Work stealing

Work stealing with task priorities

• Work stealing extended with a queue for every priority
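
The selection order with per-priority queues might look like the sketch below. It assumes that priority 0 is the highest and that a worker exhausts one priority level across all workers before looking at the next. The slides do not state this order, so the EMBB scheduler may behave differently. Synchronization is left out on purpose; `PrioritizedQueues` is a hypothetical name, and the code only illustrates the lookup order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

using Task = void (*)();

// One queue per worker and priority. Not thread-safe: lookup order only.
class PrioritizedQueues {
 public:
  PrioritizedQueues(std::size_t workers, std::size_t priorities)
      : queues_(workers, std::vector<std::deque<Task>>(priorities)) {}

  void Push(std::size_t worker, std::size_t priority, Task task) {
    assert(worker < queues_.size());
    assert(priority < queues_[worker].size());
    queues_[worker][priority].push_back(task);
  }

  // For each priority, highest first: own queue, then the other workers.
  Task Next(std::size_t worker) {
    assert(worker < queues_.size());
    const std::size_t workers = queues_.size();
    const std::size_t priorities = queues_[worker].size();
    for (std::size_t p = 0; p < priorities; ++p) {
      for (std::size_t k = 0; k < workers; ++k) {
        std::deque<Task>& q = queues_[(worker + k) % workers][p];
        if (q.empty()) continue;
        Task task;
        if (k == 0) {  // own queue: take the newest task
          task = q.back();
          q.pop_back();
        } else {       // victim's queue: steal the oldest task
          task = q.front();
          q.pop_front();
        }
        return task;
      }
    }
    return nullptr;
  }

 private:
  std::vector<std::vector<std::deque<Task>>> queues_;
};
```

With this order, a worker prefers stealing a high-priority task from another worker over running a low-priority task of its own.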

Page 43: Wait-free data structures on embedded multi-core systems

Conclusion

Page 44: Wait-free data structures on embedded multi-core systems

Conclusion

Revisiting the objective

• Wait-free implementations of pools, queues and stacks now available for real-time applications

• Benchmark framework and evaluation tools (R) are published as open source

• Reproducible evaluation of real-time performance

• Verification tool chain on the way

Page 45: Wait-free data structures on embedded multi-core systems

Conclusion

Recommendations

• Wait-free data structures can rival the performance of lock-free implementations

• But they are hard to maintain

• Formal wait-freedom is practically not achievable

Employ wait-free data structures for fault tolerance, not as a guarantee for critical deadlines

Page 46: Wait-free data structures on embedded multi-core systems

Thank You

Source code (data structures, benchmarks, R scripts): https://github.com/fuchsto/embb/tree/benchmark/

Official development source base of embb: https://github.com/siemens/embb/tree/development/

Wiki to this thesis: http://wiki.coreglit.ch
