Performance Analysis of Concurrent & Distributed Real-Time Software Designs ECEN5053 Software Engineering of Distributed Systems University of Colorado


October 23, 2002 ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

2

Overview

Why bother

Review of RMA

Advanced RMA

Event Sequence Analysis

Examples


3

Why bother?

Quantitative analysis allows for early detection of potential performance problems

Both Rate Monotonic Analysis (RMA) and Event Sequence Analysis (ESA) can be applied to designs

Applied at the task architecture level

Provides an early performance estimate and characterization, e.g., where the bottlenecks are


4

A Word About the SPE model

The SPE model (Smith and Williams) can model distributed systems or single CPU systems

Can represent components, whether they are software, hardware, or both

Can specify varying workloads


5

Review of RMA

Priority-based scheduling of concurrent tasks with hard deadlines

Tasks share the same CPU

Can be used in environments with less rigid constraints

For example, the server role in a client/server application

Assumes a preemptive priority scheduling algorithm

Can be applied where task synchronization is required


6

RMA Review (cont. 1)

Basic Theory

Initially: independent periodic tasks

• Do not communicate with each other
• Do not synchronize with each other

A periodic task has:
A period T, the interval at which it executes
An execution time C, the CPU time required per period
A CPU utilization of C/T

A group of tasks is schedulable if each task can meet its deadlines

Assign fixed priorities such that the task with the shorter period has the higher priority


7

RMA Review (cont. 2)

A set of n independent periodic tasks scheduled by the rate monotonic algorithm will always meet its deadlines for all task phasings, if:

$$\frac{C_1}{T_1} + \cdots + \frac{C_n}{T_n} \le n\left(2^{1/n} - 1\right) = U(n)$$

where Ci and Ti are the execution time and period of task ti, respectively.

(Note: the upper bound converges to 69% as the number of tasks approaches infinity.)

U(1) = 1.000   U(2) = .828   U(3) = .779   U(4) = .756
U(5) = .743   U(6) = .734   U(7) = .728   U(8) = .724
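As a minimal sketch (not part of the original notes), Theorem 1 can be checked mechanically. The function names are hypothetical, and tasks are given as (C, T) pairs in any consistent time unit. Note that the computed U(3) rounds to 0.780, whereas the table above truncates it to .779.

```python
def utilization_bound(n: int) -> float:
    """U(n) = n(2^(1/n) - 1) for n independent periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def passes_utilization_bound(tasks: list[tuple[float, float]]) -> bool:
    """Theorem 1: a sufficient (but not necessary) schedulability test.

    tasks: (C, T) pairs, the execution time and period of each task.
    """
    total = sum(c / t for c, t in tasks)
    return total <= utilization_bound(len(tasks))


print(f"{utilization_bound(3):.3f}")                                 # 0.780
print(passes_utilization_bound([(20, 100), (40, 150), (100, 350)]))  # True  (U = 0.75)
print(passes_utilization_bound([(40, 100), (40, 150), (100, 350)]))  # False (U = 0.95)
```

A False result does not prove the task set unschedulable; that is exactly the case the second theorem is meant to resolve.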


8

Conclusions & Assumptions

The rate monotonic algorithm is stable under transient overload

A subset of the tasks (those with the highest priorities) will still meet their deadlines if the system is overloaded for a relatively short time.

Context switching overhead is included in the CPU times of the interrupting tasks.

The Utilization Bound Theorem is pessimistic: it is sufficient but not necessary. If it fails, we can do a further check by applying a second theorem to get an exact determination of whether the tasks are schedulable.


9

Completion Time Theorem -- Thm 2

For a set of independent periodic tasks, if each task meets its deadline when all tasks are started at the same time, the deadlines will be met for any combination of start times.

Check the end of the first period of task ti, as well as the end of each period of the higher priority tasks that falls within it.

Remember the higher priority tasks have shorter periods

These are called scheduling points

Can be illustrated graphically with a timing diagram
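The scheduling-point check can also be done without a diagram. The sketch below is one way to express it, assuming all tasks start together at time 0, deadlines equal periods, periods are integers, and tasks are listed in rate monotonic order; the function name is hypothetical. The example set has total utilization 0.95, above U(3), yet every task meets its first deadline, which illustrates why Theorem 1 is called pessimistic.

```python
import math


def completion_time_test(tasks: list[tuple[int, int]], i: int) -> bool:
    """Theorem 2 check for task i.

    tasks: (C, T) pairs in rate monotonic priority order (shortest period first).
    """
    _, t_i = tasks[i]
    # Scheduling points: the end of every period of tasks 0..i that lies within T_i.
    points = {k * t_j for _, t_j in tasks[: i + 1] for k in range(1, t_i // t_j + 1)}
    for t in sorted(points):
        # CPU time requested by task i and all higher priority tasks by time t.
        demand = sum(c * math.ceil(t / t_j) for c, t_j in tasks[: i + 1])
        if demand <= t:
            return True  # task i completes by this scheduling point
    return False


tasks = [(40, 100), (40, 150), (100, 350)]
print([completion_time_test(tasks, i) for i in range(len(tasks))])  # [True, True, True]
```

For the lowest priority task, the demand first fits at t = 300: 3 x 40 + 2 x 40 + 100 = 300.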


10

Time-annotated sequence diagram

(Figure: timing diagram for tasks t1, t2, t3; time in msec)


11

Contradictions to Basic RMA Theory

Sometimes tasks execute at actual priorities different from their rate monotonic priorities: this is priority inversion

For example, a lower priority task may need to execute its critical section at a higher priority, so that it cannot be preempted by a higher priority task while holding a resource the two tasks share under mutual exclusion. This is done to:

• Support mutual exclusion

• Avoid deadlock


12

Contradictions to Basic RMA Theory - 2

Aperiodic tasks can be treated as periodic tasks whose "period" is the worst-case (minimum) inter-arrival time

If this "period" is longer than that of another task, the aperiodic task is assigned the lower rate monotonic priority

However, aperiodic tasks are often interrupt-driven and must execute as soon as the interrupt arrives, which conflicts with that lower priority


13

Accounting for Priority Inversion: Extend Theorem 1 (Utilization Bound)

Four factors need to be considered to determine whether task ti can meet its first deadline

Preemption time by higher priority tasks with periods less than Ti: Cj/Tj for each such task

Execution time for task ti, Ci/Ti

Preemption by higher priority tasks with longer periods, that is, non-rate-monotonic priorities.

• Can only interrupt ti once (why?)

• Ck is the sum of their execution times

• Ck/Ti, because the worst case is that all of it occurs in ti's period

Blocking time Bi by lower priority tasks, which can occur at most once per period: Bi/Ti


14

Generalized Utilization Bound Thm

$$U_i = \sum_{j \in H_n} \frac{C_j}{T_j} + \frac{1}{T_i}\left(C_i + B_i + \sum_{k \in H_1} C_k\right)$$

where H_n is the set of higher priority tasks with periods less than Ti, and H_1 is the set of higher priority tasks with periods longer than Ti.

Ui is the utilization bound during period Ti for task ti. The first term is the total preemption utilization by higher priority tasks with periods less than ti's. The second term is the CPU utilization by task ti. The third term is the worst-case blocking utilization experienced by ti. The fourth term is the total preemption utilization by higher priority tasks with periods longer than ti's. (Terms 3 and 4 are instances of priority inversion.)

If Ui is less than or equal to the worst-case upper bound U(i), task ti will meet its deadline. The utilization-bound test must be applied to each task. Since rate monotonic priorities are not guaranteed, ti may meet its deadline while a higher priority task does not.
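A minimal sketch of the extended Theorem 1 for a single task, with hypothetical names. It compares the result against ln 2 (about 0.69), the worst-case bound the notes use. The usage line reproduces the t1 figures from the worked example that follows.

```python
import math


def generalized_utilization(c_i: float, t_i: float, b_i: float,
                            shorter_hp: list[tuple[float, float]],
                            longer_hp: list[float]) -> float:
    """Utilization seen by task t_i during its period T_i.

    shorter_hp: (C_j, T_j) of higher priority tasks with periods less than T_i
    longer_hp:  C_k of higher priority tasks with periods longer than T_i
    b_i:        worst-case blocking time by lower priority tasks
    """
    preemption = sum(c_j / t_j for c_j, t_j in shorter_hp)
    # Execution, blocking, and longer-period preemption all count once per T_i.
    return preemption + (c_i + b_i + sum(longer_hp)) / t_i


# Task t1 from the example: no shorter-period preemptors, ta (Ca = 4)
# preempts with a longer period, and t3 can block for up to 30.
u1 = generalized_utilization(20, 100, 30, [], [4])
print(round(u1, 2), u1 <= math.log(2))  # 0.54 True
```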


15

Generalized Completion Time Theorem

Assumes the worst case: all tasks are ready for execution at the start of task ti's period.

Draw the timing sequence diagram for all the tasks, taking into account priority inversion as well as the preemption that can occur.

If ti meets its first deadline, given that all higher priority tasks and all priority-inverted tasks meet all of their deadlines up to that point, then ti will meet all of its deadlines.
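The timing diagram can be approximated analytically. The sketch below is a conservative reading of this theorem, not a procedure taken from the notes. It assumes all tasks are ready at time 0, that each longer-period higher priority task preempts ti at most once, that blocking occurs at most once, and that periods are integers. The function name is hypothetical.

```python
import math


def generalized_completion_test(c_i: int, t_i: int, b_i: int,
                                shorter_hp: list[tuple[int, int]],
                                longer_hp: list[int]) -> bool:
    """Completion-time check for task t_i that accounts for priority inversion.

    shorter_hp: (C_j, T_j) of higher priority tasks with periods shorter than T_i
    longer_hp:  C_k of higher priority tasks with longer periods (preempt at most once)
    b_i:        worst-case blocking by a lower priority task (at most once)
    """
    points = {k * t_j for _, t_j in shorter_hp for k in range(1, t_i // t_j + 1)}
    points.add(t_i)
    for t in sorted(points):
        # Worst case: all blocking and all one-time preemptions occur before t.
        demand = (c_i + b_i + sum(longer_hp)
                  + sum(c_j * math.ceil(t / t_j) for c_j, t_j in shorter_hp))
        if demand <= t:
            return True  # t_i finishes by scheduling point t
    return False


# Task t1 from the example that follows: 20 + 30 + 4 = 54 <= 100.
print(generalized_completion_test(20, 100, 30, [], [4]))  # True
```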


16

Task Scheduling and Design

Take a cautious approach at design time:

Use estimates
Satisfy Thm 1, the conservative one, not just Thm 2

If some lower priority tasks are soft real-time or non-real-time tasks:

It is OK to exceed the utilization bound somewhat, if it is OK for those tasks to miss their deadlines/targets occasionally

At design time, we can choose which priorities to assign:
Aim for rate monotonic priorities for periodic tasks
Assign the highest priorities to interrupt-driven tasks, to reflect reality
If 2 tasks have the same period, assign one a higher priority based on application semantics


17

Example of Generalized RMA

4 tasks: t1 and t3 are periodic; t2 and ta are aperiodic

ta is interrupt-driven and must execute within 200 ms of the arrival of its interrupt, or data will be lost. t2 has a worst-case inter-arrival time of T2.

t1 is periodic: C1 = 20; T1 = 100; U1 = 0.2

t2 is aperiodic: C2 = 15; T2 = 150; U2 = 0.1

ta is aperiodic, interrupt-driven: Ca = 4; Ta = 200; Ua = 0.02

t3 is periodic: C3 = 30; T3 = 300; U3 = 0.1

t1, t2 and t3 access a data repository protected by semaphore s.
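A short sketch that checks the arithmetic used in the notes: the strict rate monotonic ordering and the overall CPU utilization of this task set. The aperiodic tasks use their worst-case inter-arrival times as periods, as described earlier. The variable names are illustrative only.

```python
import math

# (name, C, T) from the example; for t2 and ta, T is the worst-case inter-arrival time.
tasks = [("t1", 20, 100), ("t2", 15, 150), ("ta", 4, 200), ("t3", 30, 300)]

# Strict rate monotonic order: shorter period means higher priority.
rate_monotonic_order = [name for name, _, _ in sorted(tasks, key=lambda task: task[2])]
total_utilization = sum(c / t for _, c, t in tasks)

print(rate_monotonic_order)              # ['t1', 't2', 'ta', 't3']
print(round(total_utilization, 2))       # 0.42
print(total_utilization <= math.log(2))  # True
```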


18

Notes, not meant for use as slide

If tasks assigned strict rate monotonic priorities, obviously the assignments in priority order from highest to lowest would be t1, t2, ta, and t3.

ta's stringent response time tells us to give it the highest priority. The priority assignment then becomes ta, t1, t2, and t3.

Overall CPU utilization is 0.42, which is less than the worst-case utilization bound as the number of tasks approaches infinity, namely 0.69.

Since rate monotonic priorities are not strictly assigned, we can't rely on the basic Theorem 1; we need to apply the extended Theorem 1 to each task individually.

ta has the highest priority, is interrupt-driven, and does not access the data repository, so no lower priority task can block it. Ua is 0.02 < U(1), so there is no problem meeting its deadline.

(cont. next slide)


19

Notes 2, not meant for use as slide

Consider t1. We need to consider four factors:

a. Preemption time by higher priority tasks with periods less than T1. There are higher priority tasks (the aperiodic one) but not with shorter periods.

b. Execution time C1 for the task t1 = 20. U1 = 0.2

c. Preemption by higher priority tasks with longer periods. ta is one of these. Preemption utilization during the period T1: Ca /T1 = 4/100=0.04

d. Blocking time by lower priority tasks. Because of the semaphore, t2 and t3 can both potentially block t1. In the worst case, one of them will. But at most one lower priority task can actually block t1 (why?). The worst case is the task with the longer CPU time, t3 = 30. Blocking utilization during the period T1: B3 /T1 = 30/100 = 0.3

Worst case utilization = preemption util. + execution util. + blocking util. = .04 + .2 + .3 = .54 < worst-case upper bound of .69. t1 will be OK.


20

NOTES 3

You do the calculation for tasks 2 and 3. Ask for help if you need it.


21

Event Sequence Analysis

If requirements definition is done properly, the system's required response times to external events are specified

After task structuring, we can make a first attempt at allocating time budgets to the concurrent tasks

Event Sequence Analysis determines the tasks to be executed to service a given external event


22

Performance Analysis using ESA

Pick an external event

Determine which I/O task is activated by this event

Determine the sequence of internal events that follow in response

Identify the tasks that are activated

Identify the I/O tasks that generate the system response to the external event

Estimate the CPU time for each task

Estimate the CPU overhead: inter-task communication and synchronization overhead

Consider other tasks that execute during this period


23

CPU Utilization for ESA

The sum of the following must be less than or equal to the specified system response time:

CPU times for the tasks that participate in the event sequence

Times for additional tasks that execute during the same interval

CPU overhead

Allocate a worst-case upper bound for uncertain CPU times

Overall CPU utilization, estimated for a given interval:

CPU time for each task, for each path if there is more than one

Frequency of activation * each task's CPU time
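The overall utilization estimate is a sum of (frequency of activation x CPU time) over tasks. A minimal sketch, assuming rates are given per second and CPU times in milliseconds; the function name and the workload in the usage line are hypothetical, for illustration only.

```python
def cpu_utilization(tasks: dict[str, tuple[int, int]]) -> float:
    """Overall CPU utilization, estimated over a one-second interval.

    tasks maps a task (or one execution path of a task) to
    (activations per second, CPU time per activation in ms).
    """
    busy_ms_per_second = sum(rate * cpu_ms for rate, cpu_ms in tasks.values())
    return busy_ms_per_second / 1000


# Hypothetical workload: A runs 10 times/s for 5 ms, B runs 2 times/s for 20 ms.
print(cpu_utilization({"A": (10, 5), "B": (2, 20)}))  # 0.09
```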


24

Example of Perf. Analysis using ESA

Consider the Cruise Control subsystem; see the event sequence diagram,

which is based on the task architecture diagram

Assume, for now, that all the other tasks in the system, as well as Calibration in this subsystem, have lower priorities, so that we can ignore them

Consider first the case of the driver engaging the cruise control lever in the accelerate position resulting in controlled acceleration of the car.

Performance requirement: system must respond to driver’s action within 250 ms.

Sequence of internal events following the driver’s stimulus is shown by the event seq. on the concurrent collaboration diagram (Fig. 17.2 taken from Gomaa’s book).


25

Performance Analysis ESA example - cont.

Assume Cruise Control is in its initial state. ACCEL is the cruise control input.

Event sequence: (Ci is time to process event i)

C1: interrupt arrives from the external cruise control lever

C2: CC Lever Interface reads the ACCEL input from the CC lever

C3: CC Lever interface sends a cc request message to CC

C4: CC receives the msg, executes its state transition diagram, and changes state from Initial to Accelerating

C5: CC sends an increase speed command msg to Speed Adjustment

C6: Speed Adjustment executes the command, computes throttle value

C7: Speed Adj sends throttle value msg to Throttle Interface task


26

Performance Analysis ESA example - cont. 2

Event sequence continues:

C8: Throttle Interface computes new throttle position

C9: Throttle Interface outputs throttle position to the real-world throttle. (This is an output operation, uses no CPU time.)

Four tasks required to support the ACCEL external event

Minimum of four context switches required, 4*Cx where Cx is context switching overhead

Assume Cm is message communication overhead so C3, C5, and C7 are all equal to Cm


27


Execution time of this event sequence, Ce = what?

System response time, however, must also consider other tasks that could execute during the time when the system must respond to the external event.

Look at Fig. 17.2 (remember we have artificially decided that all other tasks have lower priorities -- they can’t execute during this time)

Assume Auto Sensors (C10) is periodically activated every 100 ms. It could execute 3 times before the 250 ms deadline.

Shaft Interface (C11) is activated once every shaft rotation. It could execute up to __?__ times assuming a shaft rotation max rate of 6000 rpm. This is once every __?__ .

Distance & Speed (C12) activates periodically once every quarter of a second. In the 250 ms window, it can execute _?_.


28


Every time another task intervenes, there could be two context switches (assume 0.5 ms each),

assuming the executing task is preempted and then resumes execution after the intervening task completes. These three tasks could therefore impose an additional __?__ context switches.

What is the total CPU time Cother for these three tasks, including system overhead?

The estimated response time to the external event is greater than or equal to the total CPU time, which is the CPU time of the tasks in the event sequence plus the CPU time of the other tasks: Ctotal = Ce + Cother
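The estimate Ctotal = Ce + Cother can be written as a small function. A minimal sketch, assuming each execution of an intervening task costs its CPU time plus two context switches, as stated above. The function name and all numbers in the usage lines are hypothetical, not the values from the table referenced in the notes.

```python
def estimated_response_time(ce_ms: float,
                            other_tasks: list[tuple[float, int]],
                            cx_ms: float) -> float:
    """Ctotal = Ce + Cother for one external event.

    ce_ms:       CPU time of the event sequence, its own overheads included
    other_tasks: (CPU time in ms, executions within the response window)
                 for each other task that can run during the window
    cx_ms:       context switch time; each intervention costs two switches
                 (preempting the running task, then resuming it)
    """
    c_other = sum(runs * (cpu_ms + 2 * cx_ms) for cpu_ms, runs in other_tasks)
    return ce_ms + c_other


# Hypothetical values for illustration only.
others = [(2.0, 3), (1.0, 5)]
print(estimated_response_time(20.0, others, 0.5))  # 39.0
print(estimated_response_time(20.0, others, 1.0))  # 47.0
```

Varying cx_ms is one way to explore how sensitive the estimate is to context switching time. In a real sensitivity check, Ce's own context switching term would have to be recomputed too.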


29


Make estimates for each of these timing parameters so that the equations can be solved (see table provided)

Substituting the estimated timing parameters gives an estimated value of Ce = 35 ms.

Substituting the estimated timing parameters into the expression for Cother gives an estimated value of 79 ms.

Ctotal = 114 ms. This is well below the specified response time of 250 ms.


30


How susceptible is the estimated response time to error?

Experiment with different values

What if context switching time were 1 ms instead of 0.5?


31

Performance Analysis using RMA & ESA

An external event activates a task. Its execution initiates a series of internal events which activate other internal tasks.

Can all the tasks in the combined event sequence be executed before the deadline?

Each internal event sequence can be analyzed to determine how much time it will take. The tasks in each event sequence can then be treated as a group for rate monotonic analysis.

That is, initially allocate all the tasks in the event sequence the same priority. These can collectively be considered one equivalent task from a real-time scheduling viewpoint.


32

Performance Analysis using RMA & ESA - 2

This equivalent task has a CPU time equal to the

sum of the CPU times of the tasks in the event sequence

Plus context switching overhead

Plus message communication or event synchronization overhead

Worst-case inter-arrival time of the external event that initiates the event sequence is the period of this equivalent task.
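The construction of the equivalent task can be sketched as follows. The class and function names are hypothetical. The usage line uses made-up CPU times, but it mirrors the structure of the Cruise Control sequence (four tasks, four context switches, three messages).

```python
from dataclasses import dataclass


@dataclass
class EquivalentTask:
    c_ms: float  # CPU time of the whole event sequence
    t_ms: float  # period = worst-case inter-arrival time of the external event

    @property
    def utilization(self) -> float:
        return self.c_ms / self.t_ms


def build_equivalent_task(task_cpu_ms: list[float], context_switches: int, cx_ms: float,
                          messages: int, cm_ms: float, interarrival_ms: float) -> EquivalentTask:
    """Sum of the tasks' CPU times plus context switching and messaging overhead."""
    c = sum(task_cpu_ms) + context_switches * cx_ms + messages * cm_ms
    return EquivalentTask(c, interarrival_ms)


# Hypothetical CPU times and overheads, for illustration only.
eq = build_equivalent_task([4, 8, 6, 5], 4, 0.5, 3, 1.0, 250)
print(eq.c_ms, eq.utilization)  # 28.0 0.112
```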


33


To decide if the equivalent task can meet its deadline, apply the real-time scheduling theorems. Consider:

Preemption by higher priority tasks

Blocking by lower priority tasks

Execution time of the equivalent task itself

Cannot always replace all tasks in the event sequence by a single equivalent task

A task may be used in more than one event sequence

Executing the equivalent task at the chosen priority may prevent other tasks from meeting their deadlines.

May need to analyze tasks separately and assign different priorities


34


Must consider preemption and blocking on a per task basis

Also necessary to determine whether all tasks in the event sequence will complete before the deadline.


35

Perf. Analysis using RMA

Some considerations:
Consider first a steady state involving only the periodic tasks.
After that, the aperiodic externally-imposed demands on the system can be considered.
Consider the worst steady-state case, namely the case that causes maximum CPU demand.
Remember context switching time.
You can include aperiodic tasks if they have a known/estimated worst-case inter-arrival time.
If 2 tasks have the same period, assign the higher priority to the independent task*


36

Perf. Analysis using RMA - 2

Access time to shared data stores consists of one read instruction or one write instruction.

So small that potential delay time due to blocking of one task by another is considered negligible.

It’s guaranteed to be “short” and to “terminate” so don’t try to compute it as a blocking factor, just include it in its CPU time

Significant priority inversion delays can occur and those are the ones to consider


37

Perf. Analysis Example using RMA & ESA

Back to the Cruise Control example

Driver initiates an external event (CC lever or pressing the brake)

Must consider the tasks in the event sequence as well as the periodic tasks that execute on an ongoing basis when simply driving under CC

Earlier we replaced the four tasks in the event sequence with an equivalent aperiodic task


38

Perf. Analysis Example using RMA & ESA -2

Consider the impact of the additional load imposed by the driver-initiated external event on the steady state load of the periodic tasks.

The worst case is when the vehicle is already under automated control (CC). If it weren’t, Speed Adjustment and Throttle Interface wouldn’t be executing so the CPU load would be lighter

Input from CC lever. In the event sequence analysis, we saw CC Lever Interface, CC, Speed Adjustment, and Throttle Interface process this input. (CPU time Ce calculated at slide 29)


39


Four tasks are involved, but they must execute in strict sequence.

Each is activated by a msg from its predecessor.
The four are equivalent to one aperiodic task.
Ce is the sum of the CPU times of the four tasks plus msg communication overhead and context switching overhead. We'll call the combined task the "event sequence task".

In RMA, we can treat an aperiodic task as one whose period is the minimum inter-arrival time of the requests. Call it Te = 250 ms.
For now, assume the desired response time is also Te.


40


When assigning priority to the event sequence task, initially assign its rate monotonic priority.

When we do this, the event sequence task has the same period as two other periodic tasks, Speed Adjustment and Distance & Speed.
Assign the event sequence task the highest priority of those three.

The event sequence task still has a lower priority than Shaft Interface, Throttle Interface, and Auto Sensors (see Table 17.4, Gomaa).
Ce for the event sequence task is 35 ms and Te is 250 ms; therefore its CPU utilization Ue is 0.14.


41


Total CPU utilization of the periodic tasks is 0.48 (you can compute that if you don't believe me)

Total periodic and event sequence task CPU utilization is 0.62, which is less than 0.69 and therefore less than U(n), where n is the number of periodic tasks plus 1

Therefore, the event sequence task can meet its deadline as can all the periodic tasks.
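The arithmetic behind this conclusion can be checked in a few lines. All figures come from the slides. Because U(n) decreases toward ln 2 (about 0.69) from above, passing the ln 2 test implies passing U(n) for any number of tasks.

```python
import math

ce_ms, te_ms = 35, 250      # event sequence task, from the slides
u_event = ce_ms / te_ms     # 0.14
u_periodic = 0.48           # periodic tasks, per the slides (Table 17.4, Gomaa)
u_total = u_periodic + u_event

print(round(u_event, 2), round(u_total, 2))  # 0.14 0.62
print(u_total <= math.log(2))                # True
```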


42


We made one assumption

All tasks can be allocated their rate monotonic priorities

What is wrong with giving the event sequence task its rate monotonic priority?

What is wrong with giving it the highest priority?

Compromise, give the event sequence task a priority lower than Shaft Interface but higher than Throttle Interface and Auto Sensors. This is higher than its rate monotonic priority.

What does THAT mean we’ll have to do?


43


Overall CPU utilization is less than 0.69

Bursts of activity can lead to much higher transient loads

In the 100 ms worst-case CPU burst, the total utilization of the three steady state tasks and the one event sequence task is 67%, leaving time for lower priority tasks to execute.

If the next highest priority task, Distance & Speed, were to also execute in this busy 100 ms, CPU utilization would increase to 78%

Comparing to the proper U(n) value, all tasks can meet their deadlines.


44

Design Restructuring

If proposed design does not meet performance goals, design needs to be restructured

Revisit task clustering criteria and task inversion criteria

Consider sequential task inversion:
The CC task sends a speed command msg to the Speed Adj task, which in turn sends throttle msgs to the Throttle Interface task.
These may be combined into one task, the CC task, with passive objects for Speed Adj and Throttle Interface. This eliminates the message communication overhead between them, plus the context switching overhead.


45

Estimation & Measurement of Performance Parameters

Performance input parameters must be determined through estimation or measurement before the performance analysis is carried out.

Independent variables are those whose values are input to the performance analysis

Dependent variables are variables whose values are estimated by the real-time scheduling theory

Assumption for RMA, all tasks are locked in main memory so there is no paging overhead. Typically paging overhead cannot be tolerated in real-time system design.


46

Estimation & Measurement of Performance Parameters -- 2

Individual task parameters that need to be estimated for each task involved in the performance analysis:
Task's period Ti, the interval at which it executes
Execution time Ci, the CPU time required per period
CPU overheads:

Context switching overhead
Interrupt handling overhead
Inter-task communication and synchronization overhead
