Real-Time Performance of Windows XP Embedded - Research

Real-Time Performance of Windows XP Embedded

Andreas Harnesk [email protected] Tenser [email protected]

April 30, 2006

ABB Corporate Research, Advanced Industrial Communication Group, Västerås, Sweden. Supervisor: Henrik Johansson

Mälardalen University, Department of Computer Science and Electronics, Västerås, Sweden. Supervisor: Frank Lüders

Abstract

A business unit of ABB, providing embedded-system-based products for the automation industry, today runs its real-time core on dedicated hardware, isolated from any extra functionality. To stay competitive in the industry, development costs need to be reduced. One possible solution is to run both the real-time core and the extra functionality on the same hardware, and switch to Windows XP Embedded as the operating system.

In this report, the characteristics of XP as a real-time operating system are revealed by investigating how XP works under the hood. Two types of real-time implementations are evaluated: one implemented as a normal user thread, and another implemented in a device driver. Tests are conducted, measuring execution times of the implementations.

The results show that a device driver implementation is more deterministic than a user-mode implementation. While the specific tests conducted yielded execution times with, as it appeared, limited variation, no hard guarantees about an absolute worst-case execution time can be made. However, the tests show that execution times exceeding those measured are very unlikely. This indicates that XP might be suitable as a soft real-time operating system under certain controlled conditions.

Sammanfattning (summary, translated from Swedish)

A business unit at ABB that manufactures products based on embedded systems for the automation industry today runs its real-time core on separate hardware, isolated from other functionality. To remain competitive in the industry, development costs must be reduced. One possible solution could be to run both the real-time core and the other functionality on the same hardware, and to switch operating system to Windows XP Embedded.

In this report, the characteristics of XP as a real-time operating system are revealed by examining how XP works under the hood. Two types of real-time implementations are tested: one implemented as a normal user thread and the other implemented in a device driver. Tests with respect to execution times are carried out.

The results show that a driver implementation is more deterministic than a user-thread implementation. Although the specific tests yielded execution times with limited variation, no hard guarantees can be given for an absolute worst-case execution time. However, the tests show that execution times exceeding those measured are extremely unlikely. This indicates that XP can work as a soft real-time operating system under controlled conditions.

Acknowledgements

First and foremost we would like to thank our thesis supervisors Henrik Johansson and Frank Lüders, who have shown a large and consistent interest throughout the project. Our numerous scientific discussions and their many constructive comments have greatly improved this work.

Thanks to Roger Melander for giving us a deeper insight into how task interrupts work in a processor, and also for being such a nice person.

A special thanks to Jimmy Kjellsson for helping us set up the oscilloscope environment.

Last but not least, thanks to Dr. Tomas Lennvall for providing us with lots of constructive feedback.

Our experience at ABB Corporate Research has been nothing but positive and the people working there truly are top notch.

Contents

1 Introduction
1.1 Microsoft Windows

2 Real-Time Concepts
2.1 Hard and Soft Real-Time
2.2 Tasks, Processes and Threads
2.3 Shared Resources and Semaphores
2.4 Priorities
2.5 Scheduling
2.6 Time Analysis

3 RTOS Requirements
3.1 Requirement 1: Preemptible and Multitasking
3.2 Requirement 2: Task Priorities
3.3 Requirement 3: Predictable Task Synchronization Mechanisms
3.4 Requirement 4: Avoid Priority Inversion
3.5 Requirement 5: Predictable Temporal Behavior

4 Windows XP Embedded
4.1 Background
4.2 System Structure Overview
4.2.1 Hardware Abstraction Layer
4.2.2 Kernel
4.2.3 Device Drivers
4.2.4 Executive
4.3 Thread Scheduling and Priority Levels
4.4 Interrupt Handling
4.4.1 Interrupt Service Routine
4.4.2 Deferred Procedure Call
4.4.3 Asynchronous Procedure Call
4.5 Memory Management
4.5.1 Kernel Page Pools
4.5.2 Memory Manager
4.6 Windows Driver Model
4.6.1 I/O Request Packets
4.6.2 Driver Types
4.6.3 Device Objects
4.6.4 I/O Request Processing
4.6.5 Floating-Point Operations

5 Real-Time Aspects of XP
5.1 Design Issues That Limit XP's Use As a RTOS
5.2 Using XP as a RTOS

6 Extensions
6.1 RTX
6.1.1 Architecture
6.1.2 Software Development
6.1.3 Does RTX Meet the RTOS Requirements?
6.2 INtime
6.2.1 Architecture
6.2.2 APIs
6.2.3 Software Development
6.2.4 Does INtime Meet the RTOS Requirements?

7 Related Work
7.1 User Level Thread Implementation
7.2 Driver Based Implementation
7.3 Conclusions

8 Problem Description
8.1 Suggested Model
8.2 Purpose
8.3 Scope

9 Methodology
9.1 Conducted Tests
9.1.1 User-Thread Implementation
9.1.2 Driver Implementation
9.2 Test System
9.2.1 System Services
9.3 Execution Time Measurement
9.3.1 Performance Counter
9.3.2 Time-Stamp Counter
9.3.3 Oscilloscope
9.4 System Load Conditions
9.4.1 Idle
9.4.2 CPU Load
9.4.3 Graphics Load
9.4.4 HDD Load
9.4.5 Network Load
9.4.6 Stress
9.5 Test Names
9.6 Additional Tests

10 Results
10.1 TSC Measurement Results
10.1.1 UserIdle
10.1.2 UserCPU
10.1.3 UserGraphics
10.1.4 UserHDD
10.1.5 UserNetwork
10.1.6 UserStress
10.2 Oscilloscope Test Results

11 Conclusions
11.1 Better Determinism Than Reported In Previous Work
11.2 Higher Task Priority Yields Better Determinism
11.3 Driver Faster Than User-Mode
11.4 Task Interruption Can Occur Anywhere
11.5 Small Difference Between Normal and Prioritized DPC
11.6 Algorithm Slower in Kernel-Mode
11.7 No Guarantees Can Be Given

12 Future Work
12.1 Use of an Ethernet Based Protocol for Communication
12.2 Modify Interrupt Handling
12.3 Run the Tests on XPE
12.4 Evaluate Extensions

A Oscilloscope Test Results

List of Figures

1 Simplified Windows architecture[30].
2 The full range of priority levels in XP[3].
3 The virtual memory for two processes. The gray areas represent shared memory.
4 The flow of I/O requests through the system.
5 Sketch of the implementation suggested by the ABB business unit.
6 Full event cycle of the user-thread implementation.
7 Sequential time diagram of the user-thread implementation.
8 Full event cycle of the driver implementation.
9 Sequential time diagram of the driver implementation.
10 Measured start-stop time versus measurement number for the Performance Counter. (a) With Sleep(), (b) without Sleep().
11 Measured start-stop time versus measurement number for the Time-Stamp Counter. (a) With Sleep(), (b) without Sleep().
12 UserIdle algorithm execution time. (a) Scatter plot, (b) time distribution.
13 UserCPU algorithm execution time. (a) Scatter plot, (b) time distribution.
14 UserGraphics algorithm execution time. (a) Scatter plot, (b) time distribution.
15 UserHDD algorithm execution time. (a) Scatter plot, (b) time distribution.
16 UserNetwork algorithm execution time. (a) Scatter plot, (b) time distribution.
17 UserStress algorithm execution time. (a) Scatter plot, (b) time distribution.
18 Suggested model for interrupt interception.

List of Tables

1 The priority levels in XP.
2 Measured start-stop time in µs for the PeC and TSC.
3 Test names used throughout the report.
4 Algorithm execution time in µs for the TSC tests.

Glossary

APC Asynchronous Procedure Call
API Application Program Interface
BIOS Basic Input/Output System
COTS Commercial off the Shelf
CPU Central Processing Unit
DDK Microsoft Windows Driver Development Kit
DPC Deferred Procedure Call
EDF Earliest Deadline First
FDO Functional Device Objects
FiDO Filter Device Objects
FIFO First In First Out
FTP File Transfer Protocol
GPOS General Purpose Operating System
GUI Graphical User Interface
HAL Hardware Abstraction Layer
HDD Hard Disk Drive
I/O Input/Output
IDE Integrated Development Environment
IDT Interrupt Descriptor Table
IDTR Interrupt Descriptor Table Register
IRP I/O Request Packet
IRQ Interrupt Request
IRQL Interrupt Request Level
ISR Interrupt Service Routine
OS Operating System
PC Personal Computer
PeC Performance Counter
PCI Peripheral Component Interconnect
PDO Physical Device Objects
RAM Random Access Memory
RT-HAL Real-Time Hardware Abstraction Layer
RTOS Real-Time Operating System
RTSS Real-Time Sub-System
SRI Service Request Interrupt
TSC Time-Stamp Counter
WCET Worst-Case Execution Time
WDM Windows Driver Model
XPE Microsoft Windows XP Embedded
XP Microsoft Windows XP Family (including XPE)

1 Introduction

Within ABB, and the automation industry in general, embedded systems are found in virtually every product and system. More and more functionality is being built in and the performance requirements are becoming tougher.

Today it is very common that embedded systems run with the support offered by a real-time operating system (RTOS) in order to meet the requirements enforced by, for example, the industrial process being controlled. These RTOSs are often high performing and quite reliable. However, this often comes at the price of high cost, high complexity, and poor usability. In addition, such systems often require special development tools and environments, and sometimes also special platforms. Since cost and usability are two important factors for the industry, this can be a problem in certain business areas.

In the past, automation systems were usually developed specifically for one or possibly a few products, which made the development extremely expensive. To decrease development costs in general, functionality is often grouped into independent, reusable and well-defined solutions. This has been possible thanks to standardization from organizations such as IEEE, W3C, ISO, and IEC, but also because of de-facto standards like Microsoft Windows and the .NET framework.

1.1 Microsoft Windows

The Windows XP family of operating systems (OS) dominates the personal computer OS market[30]. It is a general purpose operating system (GPOS) designed to optimize throughput and average performance[34]. Because of its popularity, there is a strong interest in using XP as an embedded real-time system for the automation industry.

There are several reasons why the interest in XP as a RTOS is so strong. The most significant possible benefits are:

  • Personal computer (PC) hardware is much cheaper than traditional embedded systems in the automation industry. For example, cheap Ethernet adapters could be used.

  • Functionality can be developed using rapid prototyping and the .NET framework.

  • A vast amount of software, development tools, and COTS components (e.g. ActiveX) are available to the developers[24].

  • It is arguably easier to develop software for XP than for RTOSs because of the familiar integrated development environments (IDE) like Microsoft Visual Studio.

  • Customers are inherently familiar with the user interface, since XP is a de-facto standard in the office world[30].

This leads to the key question: Under which conditions can XP be used as a RTOS? This report tries to answer this question by investigating how XP works under the hood, and describing the central functionality and mechanisms of XP, in particular those affecting real-time performance.

2 Real-Time Concepts

Before covering the requirements of a RTOS, the general real-time terms and concepts will be introduced. A real-time system is defined as a system where correct behavior not only depends on an error-free result, but also on when the result is delivered[23].

2.1 Hard and Soft Real-Time

If the real-time system fails to complete the calculation within a defined time frame, it is considered a system failure. The effect of missing a deadline varies between applications, and real-time systems are often divided into two separate classes, depending on how critical a missed deadline is. In a hard real-time system, deadlines must be met at all times, and a missed deadline could lead to catastrophic results[21]. An example of a hard real-time system is the steering system in an airplane, where a missed deadline during landing could result in a crash. A calculated result delivered after a deadline is considered useless in a hard real-time system.

Soft real-time systems[7], however, are allowed to miss deadlines sometimes, but this will usually result in a performance degradation[21]. An example of a soft real-time system is a DVD player, where missed deadlines during decoding could result in frame skips, leading to poor quality rather than failure.

2.2 Tasks, Processes and Threads

All real-time systems consist of tasks[7]. A task can be seen as a sequence of method executions. There are two types of tasks, known as periodic and nonperiodic tasks. Just as it sounds, periodic tasks are executed periodically, for example every 20 milliseconds. Nonperiodic tasks, also known as event triggered tasks, are executed when an event occurs[23]. Periodic tasks are often used for sensor reading, actuator control, and other time critical events, while nonperiodic tasks are better suited for events that are less common, for example user interaction.

A process is an executing program, including the current values of the program counter, registers, and variables. The central processing unit (CPU) rapidly switches from process to process, running each for a short period of time. At any instant of time, the CPU is running only one process, but in the course of a longer period of time, it may run several processes. This technique, giving the illusion of parallelism, is known as multitasking[31]. The actual switch of actively running process is known as a context switch.

Each process has an address space and at least one thread of execution. The thread has a program counter, keeping track of the instruction to execute next, along with registers, and a stack. In modern operating systems, a process can have more than one thread, all sharing the same address space.

Switching the actively running thread within a process is also called a context switch. However, a context switch within the same process is much faster, since the address space for the process remains unchanged. This is one of the most significant benefits of multithreading OSs.

Both processes and threads can be seen as different types of tasks.

2.3 Shared Resources and Semaphores

A shared resource is a resource used by several tasks. It can be anything from network access to a global variable used for task synchronization. To ensure deterministic behavior of a real-time application, the usage of shared resources may need to be protected in some cases. More specifically, only one task should be able to access a shared resource at a time[31]. The code that accesses the resource under this protection is known as a critical section. For example, if a linked list is used as a shared resource, only one task can be allowed to access it during updates (write operations), since iterating over a list that is being written to can result in pointer errors.

This protection of resources is often realized using semaphores[23]. Put simply, a task has exclusive rights to a resource if the task has locked the semaphore. When a semaphore is locked, any other task requesting access to the resource is blocked during the execution of the critical section. When a task has left the critical section, the semaphore is unlocked and a blocked task can acquire it instead. If multiple tasks are waiting for the semaphore, different approaches can be taken to determine which task should be granted the semaphore[31].
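As a minimal sketch of the locking pattern described above, the following Python fragment guards a shared counter with a binary semaphore. The names SharedCounter and run_tasks are hypothetical and chosen only for illustration; Python threads stand in for real-time tasks and give no timing guarantees.

```python
import threading


class SharedCounter:
    """A shared resource whose update is a critical section."""

    def __init__(self):
        self.value = 0
        # Binary semaphore: an initial count of 1 means at most one task
        # can be inside the critical section at a time.
        self.guard = threading.Semaphore(1)

    def increment(self):
        # Acquiring blocks the caller while another task holds the semaphore;
        # leaving the "with" block releases it again.
        with self.guard:
            current = self.value       # read
            self.value = current + 1   # modify and write back


def run_tasks(num_tasks, updates_per_task):
    """Start num_tasks threads that each increment the counter, then wait."""
    counter = SharedCounter()

    def task():
        for _ in range(updates_per_task):
            counter.increment()

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_tasks(4, 500))
```

Because every read-modify-write sequence runs under the semaphore, no update is lost, regardless of how the threads are interleaved.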

2.4 Priorities

The concept of priority levels is important in a real-time system. All tasks are given a priority level, which determines the execution order and time-share of each task in a system. A low priority task is interrupted if a task with higher priority wants to execute during the same time frame.

A common priority problem in real-time systems occurs when shared resources are used between tasks with different priorities. In the following example, a system with three tasks and a semaphore is used. The tasks have the priority levels of high, normal, and low. At first, the high and normal priority tasks are idle and the low priority task runs and immediately locks the semaphore. After a while, the high priority task is ready to run and wants to use the semaphore, but since it is already locked, this task is blocked until the low priority task is finished with the semaphore. In the meantime, the normal priority task becomes ready to execute. Because it has a higher priority than the low priority task, it gets to execute instead and does so for an arbitrarily long time. During this time, the high priority task, despite having the highest priority, has to wait for the semaphore held by the low priority task, which in turn cannot execute because the normal priority task is running. In other words, the priority of the tasks is inverted, a phenomenon called priority inversion. Mechanisms used to avoid priority inversions will be discussed in Section 3.4.

2.5 Scheduling

The execution order of tasks is decided by a scheduler. Two kinds of scheduling algorithms exist: off-line and on-line scheduling[23]. An off-line scheduler makes a schedule prior to code execution[7]. Because of this, an off-line scheduler can guarantee that no deadlines are missed, since it has complete knowledge of the system, assuming the timing constraint of each task is correct. However, this type of scheduling algorithm allows no event triggered tasks, since it is not known before runtime when an event will occur.

To allow event triggered tasks in a real-time system, an on-line scheduler needs to be used. The main drawback with an on-line scheduler is that deadline guarantees can only be given under certain controlled conditions, for example, if no event triggered tasks exist[20].
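To illustrate what making a schedule prior to execution can look like, the sketch below builds a fixed time table for periodic tasks over one hyperperiod. It assumes unit-length time slots, deadlines equal to periods, and a simple shortest-period-first placement; these assumptions, and all names such as build_offline_schedule, are illustrative and not taken from the report.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods: the schedule repeats after this."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def build_offline_schedule(tasks):
    """Build a static slot table before runtime.

    tasks maps a task name to (period, wcet), both in whole time slots.
    Returns a list with one entry per slot: the task name or None (idle).
    Raises ValueError if some job cannot finish before its next release.
    """
    hyper = hyperperiod([period for period, _ in tasks.values()])
    slots = [None] * hyper
    # Place tasks with shorter periods first (an assumed, simple heuristic).
    for name, (period, wcet) in sorted(tasks.items(), key=lambda kv: kv[1][0]):
        for release in range(0, hyper, period):
            free = [t for t in range(release, release + period) if slots[t] is None]
            if len(free) < wcet:
                raise ValueError(f"task {name!r} cannot meet its deadline")
            for t in free[:wcet]:
                slots[t] = name
    return slots


if __name__ == "__main__":
    print(build_offline_schedule({"sensor": (4, 1), "control": (8, 2)}))
```

The table is computed once, with full knowledge of every task, which is why a feasible result guarantees all deadlines and why an event with an unknown arrival time has no place in it.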

2.6 Time Analysis

Time analysis is an important subject in real-time systems. Normally, developers are interested in the average execution time, worst-case execution time (WCET), and execution time variation[7]. WCET and variation are the most interesting ones of the three. The WCET is the longest time a task will take to execute. If this time is known, it is possible to design the system in such a way that the deadlines are never missed.

The execution time variation is also important since a low variation means a better utilization of the hardware. Since most real-time systems are embedded and have limited hardware capacity[7], both determinism and memory efficiency are important to keep hardware costs down.
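The three quantities above can be computed from a series of measurements as in the following sketch. The function names and the sample values are made up for illustration. Note that the largest measured value is only the observed worst case: it is a lower bound on the true WCET, not a guarantee.

```python
import statistics


def summarize_execution_times(samples_us):
    """Summarize measured execution times (in microseconds)."""
    if not samples_us:
        raise ValueError("at least one measurement is required")
    worst = max(samples_us)
    best = min(samples_us)
    return {
        "average": statistics.fmean(samples_us),
        "observed_worst_case": worst,          # lower bound on the true WCET
        "variation": worst - best,             # spread between best and worst
        "stdev": statistics.pstdev(samples_us),
    }


def meets_deadline(observed_worst_case_us, deadline_us):
    """True if no measured execution exceeded the deadline."""
    return observed_worst_case_us <= deadline_us


if __name__ == "__main__":
    # Hypothetical measurements, not results from the report.
    summary = summarize_execution_times([100.0, 102.0, 98.0, 140.0])
    print(summary)
    print(meets_deadline(summary["observed_worst_case"], 150.0))
```

A single outlier (140 µs here) barely moves the average but dominates both the observed worst case and the variation, which is why those two figures matter most when sizing a system against deadlines.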

3 RTOS Requirements

Based on the definition of a real-time system, the results of a RTOS should be given within a predefined time frame. The RTOS needs to be time deterministic to guarantee the fulfillment of this requirement. Although time deterministic behavior is important in a RTOS, it is not the only requirement for an OS to be considered a RTOS. The following requirements need to be fulfilled[34, 32, 19]:

• The OS has to be multitasked and preemptible.

• The notion of task priority has to exist.

• The OS has to support predictable task synchronization mechanisms.

• The OS must support a system for avoiding priority inversion.

• The OS must have predictable temporal behavior.

Because of the vague definition of a soft real-time system (a system allowed to occasionally miss deadlines), no definition of a RTOS can be based on what is required by soft real-time systems. As Timmerman says in [34]: "...the term 'real-time' is often misused to indicate a fast system. And fast can then be seen as 'should meet timing deadlines', thus meaning a soft real-time system." In other words, a GPOS would be considered a RTOS if soft real-time characteristics were sufficient. In this report, the term RTOS means an OS suitable to run hard real-time systems.

3.1 Requirement 1: Preemptible and Multitasking

According to the first requirement, the OS must be multitasked. Tasks can be implemented as both processes and threads in the same system. Since all threads in a process share the same address space, creating, destroying, and switching threads are many times faster than the same operations on processes[31]. Multithreaded OSs are therefore preferred over those that are just multitasked.

According to [34]: "...[The] scheduler should be able to preempt any thread in the system and give the resource to the thread that needs it most. The OS (and the hardware architecture) should also allow multiple levels of interrupts to enable preemption at the interrupt level." In other words, a preemptible system must be capable of preempting a thread at any time during execution. Almost all OSs are multitasked, multithreaded, and offer preemption. However, most GPOSs do not allow the kernel to be preempted. Because of this limitation, a high-priority task cannot preempt a kernel call made by a low-priority task[19].

3.2 Requirement 2: Task Priorities

The notion of task priorities needs to exist in order to have some predictability of task execution order and to ensure that the most critical tasks get to run first. There are many different scheduling algorithms available to make this possible. The optimal solution for dynamic priorities (priorities assigned during runtime) is called earliest deadline first (EDF) and lets the task with the earliest deadline execute. But since complete knowledge of the task execution is needed in advance, this algorithm is not suitable in event triggered systems[23]. Although tasks in a system running EDF scheduling do not have priority levels assigned during system design, all tasks are still effectively prioritized according to the earliest deadline. Rate monotonic is the optimal algorithm for systems with statically prioritized tasks (task priority is decided in advance). One of the major drawbacks with this scheduling algorithm is the unrealistic assumption that all tasks execute without any interaction[23].
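The difference between the two policies can be sketched in a few lines. The Task type and the example values below are hypothetical. The rate monotonic rule used here (a shorter period gives a higher priority) is the standard definition of the algorithm and is not spelled out in the text above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period_ms: int        # fixed at design time
    next_deadline_ms: int  # absolute deadline of the current job


def edf_pick(ready_tasks):
    """Dynamic priorities: run the ready task whose deadline is closest."""
    return min(ready_tasks, key=lambda task: task.next_deadline_ms)


def rate_monotonic_order(tasks):
    """Static priorities: highest priority first, i.e. shortest period first."""
    return sorted(tasks, key=lambda task: task.period_ms)


if __name__ == "__main__":
    tasks = [
        Task("sensor", period_ms=20, next_deadline_ms=35),
        Task("control", period_ms=50, next_deadline_ms=30),
        Task("logger", period_ms=100, next_deadline_ms=90),
    ]
    print(edf_pick(tasks).name)
    print([task.name for task in rate_monotonic_order(tasks)])
```

With these values EDF selects the control task, because its deadline is closest at this instant, while rate monotonic always ranks the sensor task first because its period is shortest. The EDF choice changes as deadlines move; the rate monotonic order never does.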

3.3 Requirement 3: Predictable Task Synchronization Mechanisms

It is unlikely that tasks in a RTOS execute independently of each other. Because of this, a RTOS needs a predictable synchronization between tasks[23]. By using shared resources guarded by locks, safe interprocess/thread communication can be guaranteed. In a RTOS, this locking mechanism needs to be time deterministic.

3.4 Requirement 4: Avoid Priority Inversion

Priority inversion is a classic real-time problem and must be handled in a RTOS. There is no way to eliminate priority inversion when shared resources and priority levels are used[19], which are both requirements of a RTOS. Instead, a RTOS needs a mechanism for minimizing the duration of the inversion. One solution for this problem is known as a shared resource protocol, which determines rules for accessing shared resources.

One of the simplest and most widely used shared resource protocols is called the priority inheritance protocol. It reduces the blocking time by giving the low priority task holding the semaphore the same priority as the blocked task waiting for it. To reduce the blocking time even more, a task must not hold a semaphore when it finishes executing. The downside of this protocol is that a high priority task can be blocked by several low priority tasks[23]. For example, if a high priority task needs two semaphores to execute and both of them are locked when execution starts, the high priority task must wait until both low priority tasks have released their semaphores before execution can start.
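The effect of priority inheritance on the three-task scenario from Section 2.4 can be sketched with a small tick-based simulation of a single CPU. Everything here is an illustrative assumption rather than a model of any real scheduler: the task parameters are invented, and a task that uses the semaphore is assumed to take it when it first runs and hold it until it finishes.

```python
def simulate(inherit):
    """Return the finish time (in ticks) of each task.

    inherit=False: plain fixed-priority scheduling (priority inversion occurs).
    inherit=True:  the semaphore holder inherits the priority of the
                   highest-priority task blocked on the semaphore.
    """
    tasks = {
        "low":    {"release": 0, "prio": 1, "work": 3, "uses_sem": True},
        "high":   {"release": 1, "prio": 3, "work": 2, "uses_sem": True},
        "normal": {"release": 2, "prio": 2, "work": 5, "uses_sem": False},
    }
    holder = None   # task currently holding the semaphore
    finish = {}
    tick = 0
    while len(finish) < len(tasks):
        ready = [n for n, t in tasks.items()
                 if t["release"] <= tick and n not in finish]
        # A task is blocked if it needs the semaphore and someone else holds it.
        blocked = [n for n in ready
                   if tasks[n]["uses_sem"] and holder not in (None, n)]
        effective = {n: tasks[n]["prio"] for n in ready}
        if inherit and holder is not None and blocked:
            # Priority inheritance: boost the holder to the waiter's priority.
            boost = max(tasks[n]["prio"] for n in blocked)
            effective[holder] = max(effective[holder], boost)
        runnable = [n for n in ready if n not in blocked]
        if runnable:
            current = max(runnable, key=lambda n: effective[n])
            if tasks[current]["uses_sem"]:
                holder = current            # take (or keep) the semaphore
            tasks[current]["work"] -= 1     # execute for one tick
            if tasks[current]["work"] == 0:
                finish[current] = tick + 1
                if holder == current:
                    holder = None           # release on completion
        tick += 1
    return finish


if __name__ == "__main__":
    print("without inheritance:", simulate(inherit=False))
    print("with inheritance:   ", simulate(inherit=True))
```

Under these assumptions the high priority task finishes at tick 10 without inheritance, because the normal priority task runs for its full five ticks while the low priority task still holds the semaphore. With inheritance it finishes at tick 5: the low priority task runs at the inherited priority, releases the semaphore early, and the normal priority task is the one that waits.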

3.5 Requirement 5: Predictable Temporal Behavior

The final requirement states that the system activities (system calls, task switching, interrupt latency, and interrupt masking) should have predictable temporal behavior. Some papers argue that predictable temporal behavior is not enough and that timing constraints should even be given by the RTOS manufacturer. Also, system interrupt levels and device driver interrupt request levels need to be known by the developer of the real-time system[34]. Interrupts are described in Section 4.4.

4 Windows XP Embedded

In the previous section, the basic concepts of real-time systems were introduced, along with a list of requirements an RTOS needs to fulfill. This section introduces Windows XP Embedded (XPE) and explains how its relevant OS mechanisms work.

Because XPE is a componentized version of Windows XP Professional (XP), all technical operating system details for XP, such as thread priorities, scheduling algorithms, and inter-process communication, also apply to XPE[37]. Applications designed for XP can run without modifications on XPE, as long as the required libraries for the application are installed (for example, a .NET application will obviously need the .NET framework)[37]. Furthermore, the same driver model (WDM) is used, which makes all device drivers for XP available to the embedded system[36].

4.1 BackgroundMost of the previous research on real-time applications and Windows hasbeen based on Windows NT 4.0. There are several reasons for this:

• Windows NT was designed from the ground up as a 32-bit operat-ing system with reliability, security, and performance as its primarygoals[8]. This means NT was considered a new technology, which inci-dentally is what the letters NT stand for[31].

• Windows NT 4.0 was the �rst version of NT that sported the popularuser interface from Windows 95, which made it easier for companies tomigrate to it; and many of them did so[8, 31].

• Since NT 4.0, the kernel has not changed in terms of real-time char-acteristics. The scheduling algorithm, thread priorities, and interruptroutines have remained the same throughout the di�erent versions ofNT[30]. This means that the limitations of using the platform as aRTOS are already well known.

Because this report is examining the real-time characteristics of XPE, it isimportant to know that XP, which XPE derives from, is part of the WindowsNT family of operating systems and its formal version number is NT 5.1. Asstated above, XP also uses the same scheduling algorithms and interrupt han-dling routines as NT 4.0 and NT 5.0 (commonly known as Windows 2000).This makes the previous research (see [27, 24, 34]) highly relevant for thisreport, even though it was performed on NT 4.0.

From here on, the term XP will be used for information applying to both Windows XP and Windows XP Embedded. The term XPE will be used only for information applying specifically to Windows XP Embedded.


[Figure 1 (diagram). User-mode: system support processes, service processes, user applications, environment subsystems, and subsystem DLLs. Kernel-mode: executive, kernel, device drivers, windowing and graphics, and the hardware abstraction layer (HAL).]

Figure 1: Simplified Windows architecture[30].

4.2 System Structure Overview

In order to understand how XP works with threads, priorities, and interrupts, it is necessary to gain some basic knowledge about the structure of the OS.

4.2.1 Hardware Abstraction Layer

One of the primary design goals of NT was to make it portable across different platforms[31]. Therefore, NT/XP is divided into several layers, each one using the services of the ones below it. As shown in Figure 1, the first layer, working closely with the hardware, is called the Hardware Abstraction Layer (HAL). Its purpose is to provide the upper level of the OS with a simplified abstraction of the often very complex hardware below it, in order to allow the rest of the OS to be mostly platform independent. For example, the HAL has calls to associate interrupt service procedures with interrupts, and set their priorities[31]. The HAL is delivered in source code (requiring a special agreement with Microsoft). It is thus possible to redefine how XP handles the system clock, interrupts, and so forth[35]. As will be shown in Section 6, some third party solutions make use of a modified HAL to achieve predictable temporal behavior in XP.

4.2.2 Kernel

Above the HAL is the actual kernel layer. The purpose of the kernel is to make the rest of the OS hardware independent. This is where XP handles thread management and scheduling, context switches, CPU registers, page tables, and so on. The actual scheduling algorithm used will be discussed later in this section. The kernel has another important function: it provides support for two classes of system objects, namely control objects and dispatcher objects.

Control objects are objects controlling the system. The most important object to know about is the deferred procedure call (DPC), which is used to split off the non-time critical part of an interrupt service procedure from the time critical part. This mechanism will be explained in greater detail in Section 4.4.2.

Dispatcher objects include semaphores, mutexes, events, and other objects threads can wait on. Since this is closely related to thread scheduling, dispatcher objects are handled in the kernel.

4.2.3 Device Drivers

Device drivers work closely with the kernel. Running in kernel-mode, they have direct memory access and can manipulate system objects and I/O devices. However, a device driver can also do things not related to devices, such as performing calculations. This part of the system is relevant for this report. The Windows Driver Model (WDM) will be discussed in greater detail in Section 4.6.

4.2.4 Executive

The last part of the system structure mentioned in this brief overview is what is known as the executive. It is a collection of components working together with the kernel to provide the rest of the system with a device-independent abstraction. Among other things, the executive contains components for managing processes, I/O, and memory. The I/O Manager, for example, plays an important role in interrupt handling, explained in Section 4.4.

4.3 Thread Scheduling and Priority Levels

Windows XP has 32 priority levels for user-mode threads, numbered 0 to 31. A process can have one of the following priority classes: Idle, Below Normal, Normal, Above Normal, High, and Realtime. Each thread can then have a relative priority compared to the other threads in the process. The available thread priority levels are: Idle, Lowest, Below Normal, Normal, Above Normal, Highest, and Time Critical[31]. This gives a total of 42 combinations (6 process classes times 7 thread priorities), which are mapped to the 32 priority levels according to Table 1.

                            Win32 process class priorities
Win32 thread                         Above           Below
priorities       Realtime   High    Normal  Normal  Normal   Idle
Time critical       31       15       15      15      15      15
Highest             26       15       12      10       8       6
Above normal        25       14       11       9       7       5
Normal              24       13       10       8       6       4
Below normal        23       12        9       7       5       3
Lowest              22       11        8       6       4       2
Idle                16        1        1       1       1       1

Table 1: The priority levels in XP.

As seen in Table 1, the class priorities ranging from High to Idle have the same upper and lower priority limit. This makes it possible for the XP scheduler to dynamically make priority adjustments to maximize average performance[24]. For example, when an I/O operation completes a request that a thread was blocked waiting for, the priority of that thread is increased. The purpose of this is to maximize I/O utilization[31] and is not the same as priority inheritance. Note that the dynamic priority boosts never increase the priority above level 15.
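The mapping in Table 1 can be expressed as a simple lookup. The following is a minimal sketch in Python, not Windows code: the names `TABLE_1`, `base_priority`, and `boosted` are hypothetical, and the size of a boost is an assumption made only for illustration. It shows the 42 combinations, the cap of dynamic boosts at level 15, and the seven distinct levels of the Realtime class.

```python
# Illustrative lookup of Table 1; names are hypothetical, not a Windows API.
CLASSES = ["Realtime", "High", "Above Normal", "Normal", "Below Normal", "Idle"]

TABLE_1 = {
    "Time critical": [31, 15, 15, 15, 15, 15],
    "Highest":       [26, 15, 12, 10, 8, 6],
    "Above normal":  [25, 14, 11, 9, 7, 5],
    "Normal":        [24, 13, 10, 8, 6, 4],
    "Below normal":  [23, 12, 9, 7, 5, 3],
    "Lowest":        [22, 11, 8, 6, 4, 2],
    "Idle":          [16, 1, 1, 1, 1, 1],
}


def base_priority(process_class, thread_priority):
    """Return the priority level (0-31) for a class/thread combination."""
    return TABLE_1[thread_priority][CLASSES.index(process_class)]


def boosted(priority, boost):
    """Apply a dynamic boost; the boost size is an assumed input."""
    if priority > 15:              # Realtime levels are never adjusted
        return priority
    return min(priority + boost, 15)


# 6 process classes times 7 thread priorities
combinations = len(CLASSES) * len(TABLE_1)
# Distinct levels reachable in the Realtime class (first column of Table 1)
realtime_levels = sorted({row[0] for row in TABLE_1.values()})
```

Here `combinations` evaluates to 42 and `realtime_levels` contains seven values, which matches the counts used in this section.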

As a result of these dynamic priority properties, none of the affected priority classes are predictable and the use of them in a real-time application would render the application non-deterministic. The number of available priority levels to be considered for a real-time application is thus reduced from 32 to just 7 (the Realtime class).

The thread priority levels in the Realtime class are all higher than the dynamic classes, making them more suitable for real-time application usage. It should be clear that, although this priority class is called Realtime, there are no guarantees given from the operating system. It simply means that it is the highest priority class available for user-level threads and no dynamic priority adjustments are ever made on threads in this class[31].

Threads sharing the same priority level are processed in First-In-First-Out (FIFO) order.

Figure 2 shows the full range of priority levels in XP, including the ISRs and DPCs.

4.4 Interrupt Handling

Interrupts in XP have higher priority than all the user-level threads mentioned in Section 4.3, including those in the real-time priority class.

All hardware platforms supported by XP implement an interrupt controller that manages external interrupt requests (IRQs) for the CPU. Once an interrupt occurs, the CPU gets the interrupt number (known as a vector), which is translated from the IRQ by the interrupt controller. This vector is then used as an index in the interrupt descriptor table (IDT) to find the appropriate routine for handling the interrupt[30, 13]. XP fills the IDT with pointers to routines for interrupt handling at start-up. To locate the IDT, the CPU reads the IDT register (IDTR), which stores the base address and size of the IDT[13]. XP also uses the IDT to map vectors to IRQs[30].


Figure 2: The full range of priority levels in XP[3].

Since interrupts are handled differently by different CPU architectures, XP provides an abstract scheme to deal with all platforms. This HAL scheme provides a common priority handling mechanism for interrupt requests by assigning an interrupt request level (IRQL) to all interrupts[2]. IRQLs range from 0 to 31, where higher numbers represent higher priority. User threads in both the dynamic and the real-time priority range all run at IRQL 0 and have an internal priority scheme as described earlier in this report.

Because the CPU is always executing code at a specific IRQL, stored as part of the execution context of the executing thread, the IRQL is used to determine execution order.

When an interrupt occurs, the CPU compares the IRQL of the incoming interrupt to the current IRQL. If the incoming interrupt has a higher IRQL than the current one, the trap handler saves the state information of the currently executing thread, raises the IRQL of the CPU to the value of the incoming interrupt, and calls the interrupt dispatcher, which is a part of the I/O Manager. The interrupt dispatcher calls the appropriate routine for handling the interrupt. When the interrupt routine is finished, the CPU lowers the IRQL to the value of the preempted thread and continues execution.

If the IRQL of the interrupt is lower than or equal to the current IRQL of the CPU, the interrupt request is left pending until the IRQL drops below the value of the request[30, 2].
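The preemption rule described above can be illustrated with a small model. This is a sketch in Python under stated assumptions, not kernel code: the class `Cpu` and its methods are hypothetical, the `during` argument simply lists interrupts that arrive while a routine is running, and pending requests are assumed to be serviced highest IRQL first once the level drops.

```python
class Cpu:
    """Toy model of IRQL-based interrupt masking (hypothetical, not kernel code)."""

    def __init__(self):
        self.irql = 0        # level 0: normal thread execution
        self.pending = []    # IRQLs of requests left pending
        self.log = []        # order in which interrupt routines started

    def interrupt(self, irql, during=()):
        """Deliver an interrupt; `during` lists IRQLs arriving while it is serviced."""
        if irql <= self.irql:
            self.pending.append(irql)    # masked: wait until the IRQL drops
            return
        previous = self.irql
        self.irql = irql                 # raise to the level of the interrupt
        self.log.append(irql)
        for nested in during:
            self.interrupt(nested)
        self.irql = previous             # routine finished: restore the old level
        self._drain()

    def _drain(self):
        """Service pending requests that are now above the current IRQL."""
        while True:
            ready = [p for p in self.pending if p > self.irql]
            if not ready:
                return
            highest = max(ready)         # assumption: highest pending IRQL first
            self.pending.remove(highest)
            self.interrupt(highest)


cpu = Cpu()
cpu.interrupt(5, during=[3, 8, 5])
# cpu.log is now [5, 8, 5, 3]
```

In this run the request at level 8 preempts the routine at level 5 immediately, while the requests at levels 3 and 5 stay pending until the first routine has finished.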

Two classes of IRQLs exist. The lowest three IRQLs (0-2) belong to the software class. They consist of PASSIVE_LEVEL, used for normal thread execution, DISPATCH_LEVEL for thread scheduling, memory management and execution of DPCs, and APC_LEVEL for asynchronous procedure call execution[2]. Asynchronous procedure calls and deferred procedure calls are explained later in this section.

The remaining levels (3-31) belong to the hardware class. The lowest 24 IRQLs in this class (3-26) are reserved for device interrupts, also known as DIRQLs. They are used for interrupt service routine (ISR) execution[2, 30].

4.4.1 Interrupt Service Routine

The interrupt dispatcher, among other things, makes the system execute an ISR mapped to the device triggering the interrupt, which runs at the same IRQL as the interrupt[31]. For a more detailed explanation of the interrupt dispatcher, see Section 4.6.4.

Only critical processing is performed in the ISR, for example, copying or moving a register value or buffer. An ISR must complete its execution very quickly to avoid slowing down the operation of the device triggering the interrupt, and delaying all processing at lower IRQLs.

4.4.2 Deferred Procedure Call

Although an ISR might move data from a CPU register or a hardware port into a memory buffer, in general the bulk of the processing is scheduled for later execution in a DPC, which runs when the processor drops its IRQL to DISPATCH_LEVEL[31]. The DPCs are handled by the scheduler in a FIFO queue. Since interrupts have higher IRQLs, a DPC can be preempted by an interrupt at any time, which means the FIFO queue can sometimes grow very long.

However, it is possible to set a higher priority of a scheduled DPC using a special kernel method. This will effectively place the DPC first in line of the queue[25].
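The queueing behavior described above can be sketched as follows. This is a minimal Python model, not driver code: `DpcQueue` and its methods are hypothetical names, and the `high_importance` flag stands in for the kernel method mentioned in the text (in the driver kit this appears to correspond to `KeSetImportanceDpc`, which is stated here as an assumption rather than taken from the source).

```python
from collections import deque


class DpcQueue:
    """Toy model of a DPC queue: FIFO, with optional insertion at the head."""

    def __init__(self):
        self._queue = deque()

    def insert(self, name, high_importance=False):
        if high_importance:
            self._queue.appendleft(name)   # placed first in line
        else:
            self._queue.append(name)       # normal FIFO insertion

    def drain(self):
        """Return the DPCs in the order they would execute, emptying the queue."""
        order = list(self._queue)
        self._queue.clear()
        return order


queue = DpcQueue()
queue.insert("disk")
queue.insert("network")
queue.insert("urgent_a", high_importance=True)
queue.insert("urgent_b", high_importance=True)
order = queue.drain()
# order is ["urgent_b", "urgent_a", "disk", "network"]
```

Note that when two drivers both request head insertion, the one queued last runs first, so the relative order of the two urgent DPCs is reversed.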

4.4.3 Asynchronous Procedure Call

There are also asynchronous procedure calls (APCs) running below the DPC priority level. APCs are similar to DPCs, but they must execute their code in the context of a specific user process[3], which means a full process context switch may need to be carried out by the OS before an APC can run.

ISRs and DPCs, on the other hand, only manipulate the kernel memory shared by all processes and can therefore run within any process context.

4.5 Memory Management

The concept of virtual memory is used in XP. One of the main reasons for this is to allow the system to use more memory than is physically available. For example, an application requiring 500 MB of memory can run on a computer with only 256 MB of random access memory (RAM) available. This can be achieved by moving blocks of memory out to the hard disk drive (HDD) when not directly needed by an application, to make room for the ones that are actually needed[31, 9]. These blocks of memory, or pages, are said to be mapped out from memory when not needed. Likewise, when pages are needed by an application and not currently in memory, they are mapped in again. Pages not loaded in memory are stored in paging files. This allows the system to use as much memory as the RAM and paging files combined.

All processes running in XP use pages to access memory. A fixed page size is used for a specific system architecture. On the Pentium architecture, the page size is 4 KB. An address in the virtual address space is 32 bits long, which results in a total availability of 4 GB virtual memory for each process[31, 9, 30].

The virtual memory for each process is split up in two halves. The lower 2 GB half is used for process code and data, except for about 250 MB, which is reserved for system data. This system data is shared by all user processes and contains system counters and timers.

The upper 2 GB half of the virtual address space is the kernel memory, containing the operating system itself, page tables, the paged pool, and the nonpaged pool. Except for the page tables, the upper memory is shared by all user processes in the system. However, it is only accessible from kernel-mode, which means the user processes are not allowed to directly access this memory[31, 9].
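The figures quoted above follow directly from the address width and the page size. The short Python sketch below works through the arithmetic; the function names `split_address` and `in_kernel_half` are hypothetical helpers introduced only for illustration.

```python
PAGE_SIZE = 4 * 1024        # 4 KB pages on the Pentium architecture
ADDRESS_BITS = 32           # width of a virtual address

OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # bits addressing a byte within a page
VIRTUAL_SPACE = 2 ** ADDRESS_BITS           # bytes of virtual memory per process
PAGE_COUNT = VIRTUAL_SPACE // PAGE_SIZE     # virtual pages per process


def split_address(address):
    """Split a 32-bit virtual address into (virtual page number, offset)."""
    return address >> OFFSET_BITS, address & (PAGE_SIZE - 1)


def in_kernel_half(address):
    """True if the address lies in the upper 2 GB half (kernel memory)."""
    return address >= 2 ** 31
```

With these definitions, `VIRTUAL_SPACE` is 4 GB, `OFFSET_BITS` is 12, and each process has 1,048,576 virtual pages. For example, `split_address(0x00403ABC)` yields page number `0x403` and offset `0xABC`.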

[Figure 3 (diagram). Process 1 and Process 2 each have a 4 GB virtual address space mapped onto physical memory. The lower 2 GB holds process private code and data (1750 MB) and system data (250 MB). The upper 2 GB holds the page table, the OS, the nonpaged pool, and the paged pool.]

Figure 3: The virtual memory for two processes. The gray areas represent shared memory.

The page tables store information about the available pages for each process in the system. Every process has its own private page table.


In order for a user-process to access the kernel memory, system calls (including driver requests) need to be made. When a system call is executed, the system traps into kernel-mode, which makes the entire kernel memory visible to the process. The virtual address space remains unchanged, which makes the processing of system calls efficient[31].

4.5.1 Kernel Page Pools

The nonpaged and paged memory pools are used by drivers and the OS for data structures. Drivers are loaded in the nonpaged pool and can allocate memory from both the nonpaged and the paged memory pools[31, 30].

Although both the paged and the nonpaged memory pool are accessible for all processes, one major difference exists. While the paged pool is handled just like the private memory of each process, the nonpaged pool is never mapped out from memory, which means no page faults can occur when accessing pages allocated in the nonpaged pool.

One of the reasons for having a nonpaged pool is to guarantee that some parts of the system are never paged out. For example, if the memory manager itself, running at DISPATCH_LEVEL, was mapped out, no other pages could be mapped in, leading to system failure[30]. For this reason, memory in the paged pool can only be accessed from the PASSIVE_LEVEL IRQL. Higher IRQLs must use the nonpaged pool[9, 30].

4.5.2 Memory Manager

The memory manager is responsible for moving pages in and out of memory. When a process is accessing a page that is not mapped in, a page fault will occur. This page fault is handled by the memory manager, which loads the page to memory. The process causing the page fault is interrupted and has to wait for the memory manager to load the page into memory before execution can continue.

For performance reasons, the page replacement algorithm in XP strives to always have a certain amount of free physical memory pages available. This will decrease the amount of work when a page needs to be mapped in, since only one disk operation is needed to read a page to be mapped in, as opposed to both mapping out a page to disk and mapping in another. To make sure enough free pages exist, the system runs the balance set manager every second. If the number of free pages decreases to a specific threshold, the memory manager starts mapping out pages not needed at the moment[31, 30].
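The benefit of keeping free pages available can be shown with a toy model that counts disk operations per page fault. This is a sketch in Python under simplifying assumptions, not XP's actual replacement algorithm: the class `MemoryModel` is hypothetical, every mapped-out page is assumed to cost one disk write, and the oldest page is always the one mapped out.

```python
class MemoryModel:
    """Toy model counting disk operations per page fault (not XP's real algorithm)."""

    def __init__(self, frames, threshold):
        self.free = frames          # free physical page frames
        self.threshold = threshold  # minimum number of free frames to maintain
        self.resident = []          # pages in memory, oldest first
        self.disk_ops = []          # disk operations needed by each page fault

    def page_fault(self, page):
        ops = 1                     # read the faulting page from the paging file
        if self.free == 0:
            self.resident.pop(0)    # no free frame: a page must be mapped out first
            ops += 1
        else:
            self.free -= 1
        self.resident.append(page)
        self.disk_ops.append(ops)

    def balance_set_manager(self):
        """Periodic trim: map out pages until enough frames are free."""
        while self.free < self.threshold and self.resident:
            self.resident.pop(0)
            self.free += 1


untrimmed = MemoryModel(frames=2, threshold=1)
for page in "abc":
    untrimmed.page_fault(page)
# untrimmed.disk_ops is [1, 1, 2]

trimmed = MemoryModel(frames=2, threshold=1)
for page in "abc":
    trimmed.page_fault(page)
    trimmed.balance_set_manager()
# trimmed.disk_ops is [1, 1, 1]
```

In the trimmed run the mapping out still happens, but it is moved out of the page fault path, so the faulting process only waits for a single disk operation.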

4.6 Windows Driver Model

The WDM is a framework for device drivers that is source code compatible with Windows 98 and later. It includes a library, offering a large set of routines to the developer[25, 36]. There are two major classes of WDM drivers. The first class is called user-mode drivers. Drivers in this class run in user-mode and the class is mostly intended for testing purposes with simulated hardware.

The other class, which this report will focus on, is the kernel-mode driver. As the name implies, drivers in this class run in kernel-mode. Because of the direct hardware access available in kernel-mode, this type of driver is used to control hardware. Even though kernel-mode drivers are often used to control hardware, simulated hardware or no hardware at all can be used by these drivers[2, 25].

4.6.1 I/O Request Packets

Drivers written for the WDM framework should handle input/output (I/O) requests as specified in the I/O Request Packet (IRP). I/O requests are I/O system service calls from user-mode applications, such as read and write operations[2]. An IRP determines the work order, i.e. in what order different subroutines of a driver should be executed to complete an I/O request. When the IRP is created, it is passed to the I/O Manager, which determines what driver and subroutine should execute. The subroutine performs its work on the IRP and passes it back to the I/O Manager, which sends it to the next subroutine. When the IRP is completed, the I/O Manager destroys it and sends the status back to the requestor[25, 2].

4.6.2 Driver Types

Three types of drivers exist under WDM.

Function drivers are responsible for I/O operations, handling interrupts within the driver, and deciding what should be controllable by the user.

Bus drivers handle the connection between the hardware and the rest of the computer. The PCI bus driver, for example, detects the cards on the PCI bus. It determines the I/O-mapping or memory-mapping requirements of each card. Both function and bus drivers are required for all hardware devices.

The third type of driver, the filter driver, can be supplied by manufacturers to modify the functionality of the function driver above which it sits[25]. This is known as an upper filter driver. There are also lower filter drivers that work as a filter between the bus driver and the function driver. A good example of a lower filter driver is one that encrypts data before it reaches the bus driver, which means neither the function driver nor the bus driver needs to know about the encryption.


4.6.3 Device Objects

To help software manage hardware in Windows, device objects are used. Each type of driver has a device object mapped to it. Bus drivers are represented by physical device objects (PDO). Functional device objects (FDO) are mapped to the function drivers. Both above and below the FDO, filter device objects (FiDO) may exist, which are mapped to filter drivers[2, 25].

4.6.4 I/O Request Processing

When an I/O request is raised in the system, it gets processed according to the steps in Figure 4. Although not all I/O requests go through all these steps, this model represents a typical I/O request flow.

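[Figure 4 (diagram). A user thread in a user-mode application calls io_request(), which passes through the Win32 kernel to the I/O Manager. In the kernel-mode driver, the dispatch routine marks the IRP as pending and calls StartIO(), the StartIO routine enables interrupts, the ISR requests a DPC, and the DPC marks the IRP as successful and completes it. Device interrupts arrive through the HAL. The arrows are numbered 1 to 12 and correspond to the steps below.]

Figure 4: The flow of I/O requests through the system.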

1. When an I/O request is invoked by a user-thread, the system traps into kernel-mode and passes the request to the I/O Manager.

2. In the I/O Manager the request is translated into an IRP, describing the work order of the drivers involved in handling the request. Before invoking the right dispatch routine of the driver (one dispatch routine per function offered by the driver exists), the I/O Manager prepares the user buffer and the access method to this buffer[2, 25].

3. If no device activity requiring interrupts is needed for the I/O request (for example, when reading zero bytes or writing to a port register), the dispatch routine marks the IRP as completed, executes the rest of the dispatch routine, and sends it back to the I/O Manager, which notifies the user-thread of the completion of the I/O request. The scenario of reading zero bytes can occur if polling (periodical status checking) drivers are used[2, 25].

   Usually, however, the I/O request actually needs some device activity before completion. In this case the IRP is marked as pending and the Start I/O function of the driver is called before the IRP is passed back to the I/O Manager. The dispatch routine also performs parameter validation. For function drivers, the parameter validation has to take the limitations of the underlying bus driver into account. For example, if the total transfer size exceeds the limits of the bus driver, the dispatch routine is responsible for splitting the request into multiple requests[2, 25].

4. The I/O Manager then queues the call to the Start I/O routine of the driver, which starts up the device. The first thing done by the I/O Manager when a device is requested to start is to check whether the device is busy, that is, whether a previous IRP is marked as pending for the device. If the device is busy, the new IRP is queued. If the device has no IRP marked as pending, the queue is skipped and the Start I/O routine of the device is called directly, which starts the device by safely accessing the device registers[2, 25].

5. The IRP is then returned to the I/O Manager, which awaits a device interrupt[2, 25].

6. HAL receives the device interrupt when it occurs.

7. The interrupt is then routed to the interrupt dispatcher of the I/O Manager.

8. Most devices are connected to an interrupt request level (IRQL), which means the interrupt dispatcher calls the ISR of a device connected to a specific IRQL when a device interrupt occurs. Some devices do not use interrupts and require polling to notice any changes for that device[2]. Since IRQLs can be shared by other drivers, the first thing the ISR does is to check whether or not the interrupt was intended for the specific device. If not, the interrupt request is passed back to the interrupt dispatcher, which sends it to another device connected to the same IRQL[2, 25].

   The ISR runs at the IRQL of the device, which means that other processing at the same IRQL or lower has to wait until the ISR is completed. Because of this, as little work as is reasonably possible should be performed in the ISR. Most of the time, ISRs only perform hardware dependent work, such as moving data between hardware registers and kernel-mode buffers. As mentioned earlier, the number of kernel-mode functions available in an ISR is very limited[2, 25].

9. Because of the limited kernel-mode functionality available, the ISR often schedules a DPC for later execution, which will take care of the processing not performed in the ISR[25].

10. The scheduling of DPCs is handled by the I/O Manager and is implemented as a FIFO queue. Although the DPC queue is of FIFO type, drivers can set the priority of the DPC as high, which will make the I/O Manager place the DPC first in the queue.

11. The DPCs run at DISPATCH_LEVEL and have full access to the kernel-mode functions. The DPCs complete the work of the device driver that for various reasons could not or should not be performed in an ISR. After the work in the DPC is done, the DPC marks the IRP as completed and sends it back to the I/O Manager, which in turn destroys it[25, 2].

12. When the I/O Manager has destroyed the IRP, it schedules a kernel-mode APC. This APC will execute I/O Manager code for copying status and transfer size information to the user-thread. The APC needs to execute in the context of the requesting user-thread since it needs to safely access the user-space memory. By running the APC at the same priority level as the requesting thread, page faults can be handled normally.

   If the I/O request included a data read from a device with the buffered I/O read method, the APC copies the driver allocated buffers back to the user-space buffers of the requesting thread (from the nonpaged pool to the paged pool accessible by the user-thread). When the APC has completed its execution, the I/O Manager notifies the requesting user-thread[2, 25].
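The twelve steps above can be condensed into a small trace. This is only a sketch in Python and not driver code: the function `trace_io_request` and the stage descriptions are hypothetical, and the model ignores queuing of IRPs for busy devices and shared IRQLs.

```python
def trace_io_request(needs_device_activity):
    """Return (final IRP status, ordered list of stages) for one I/O request."""
    irp = {"status": None}
    stages = [
        "user thread traps into kernel-mode",          # step 1
        "I/O Manager builds the IRP",                  # step 2
        "dispatch routine validates parameters",       # step 3
    ]
    if not needs_device_activity:
        irp["status"] = "SUCCESS"                      # completed in the dispatch routine
        stages.append("I/O Manager notifies the user thread")
        return irp["status"], stages

    irp["status"] = "PENDING"                          # device activity is required
    stages += [
        "Start I/O routine starts the device",         # steps 4-5
        "HAL receives the device interrupt",           # step 6
        "interrupt dispatcher calls the ISR",          # steps 7-8
        "ISR requests a DPC",                          # step 9
        "DPC runs at DISPATCH_LEVEL",                  # steps 10-11
    ]
    irp["status"] = "SUCCESS"                          # set by the DPC
    stages += [
        "I/O Manager destroys the IRP and queues an APC",   # step 12
        "APC copies status to the user thread",
        "I/O Manager notifies the user thread",
    ]
    return irp["status"], stages
```

A request that needs no device activity, such as reading zero bytes, completes after four stages in this model, whereas a request involving an interrupt passes through the ISR, the DPC, and the APC before the user thread is notified.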

4.6.5 Floating-Point Operations

According to the WDM documentation, drivers should avoid doing any floating-point operations unless absolutely necessary, for performance reasons[25, 36]. Before carrying out floating-point operations, a special kernel routine needs to be called to save the nonvolatile floating-point context. After the floating-point operations are finished, another kernel routine must be called to restore the nonvolatile floating-point context again[25]. Callers of these kernel routines must be running at IRQL ≤ DISPATCH_LEVEL. In other words, floating-point operations are not allowed in ISRs[25].


5 Real-Time Aspects of XP

While the previous section provided an overview of XP, this section analyses the real-time aspects of the OS and compares the system characteristics with the previously mentioned RTOS requirements. Finally, the reasons why XP is not suitable for hard real-time applications are explained.

XP is a GPOS for PCs[34] and as such, the priority for the OS is to optimize average performance, not minimize or limit worst-case performance. For a real-time application, the WCET is more relevant, since it is a guarantee that the execution time will never exceed a certain limit[33]. The average performance, on the other hand, is irrelevant in the RTOS context, since it gives no guarantee regarding execution time for a particular execution.

5.1 Design Issues That Limit XP's Use As an RTOS

There are several design issues in XP limiting its use as an RTOS[24]:

• No priority inversion protection exists
  Threads running in the Realtime class can be blocked by lower priority threads holding a shared resource. No mechanism to prevent this exists in XP.

• Limited number of priorities
  As explained in Section 4.3, only 7 priority levels are available for Realtime threads. This is only sufficient for very simple real-time applications and severely limits the amount of control a system designer has over thread priorities.

• DPCs are processed in FIFO order
  Even though different interrupt priority levels exist, the bulk of the processing in a device driver is done in a DPC, which is processed in FIFO order. This makes time critical processing unsuitable even at this priority level, since it may be delayed indefinitely by less critical processing scheduled earlier in the FIFO queue. DPCs can also be delayed by ISRs of any priority level.

  It is possible to specify a higher priority when scheduling a DPC. This will place the DPC first in the DPC queue. However, there is no guarantee that other device drivers will not do the same, which would only invert the DPC processing order.

• Masking interrupts
  Any code running in kernel-mode, including all device drivers, can disable interrupts or raise the IRQL to the highest level, which effectively gives the code exclusive access to the CPU. This can lead to unpredictable results.

  This could potentially be used by a small real-time application that wants to increase the temporal determinism, but there is no guarantee that other non-critical device drivers in the system would not take advantage of this too.

• Page swapping
  XP's use of virtual memory leads to page swapping, which can occur at any point during the execution of a thread. However, virtual memory can be turned off in XP, effectively eliminating this design issue.

• IRQL mapping
  The HAL dynamically maps interrupts to IRQLs at system startup as it detects the devices attached. This leads to reduced portability and predictability of a real-time application, since it is not possible to know the order of device interrupts when hardware changes.

  By reducing the number of device drivers used in the system and making sure that as few drivers as possible share the same IRQL, a higher level of predictability can be achieved.

• Interrupts and DPCs have higher priority than Realtime threads
  Even threads running at the highest user-level priority can be delayed indefinitely because of interrupting ISRs and DPCs.

5.2 Using XP as an RTOS

Different approaches of using XP as an RTOS are suggested throughout the literature, where the most common alternatives are[32, 27, 17]:

• Use XP as it is, but with a constrained environment for applications and functionality to ensure timing constraints. Future development of such a system is hard, and no guarantees of deadlines can be given.

• Implement the time critical parts as a device driver running in kernel-mode. The richness of the entire Win32 application program interface (API) cannot be utilized. Debugging becomes more difficult and critical, since bugs can crash the whole system[3, 5].

• Create a wrapper for the Win32 API to a commercial RTOS. No COTS can be used and the Windows device drivers cannot be used.

• Run Windows XP and an RTOS on two different machines. Both hardware and software costs increase.


• Run Windows XP and an RTOS on a single machine.

This report will focus on the first two approaches.


6 Extensions

In this section, the approach of running Windows XP and an RTOS on a single processor machine will be examined, using real-time extensions available from third parties. All the extensions have slightly different implementations, but all of them have made some modifications to the HAL or at least intercept the interrupts before they reach the HAL (which can actually be seen as a modification)[32]. Note that not all of these extensions make permanent changes to the HAL. Instead, a reconfiguration of the HAL is done at system startup. The extensions include:

• CeWin and VxWin by Kuka Controls[18].

• HyperKernel by Nematron[12].

• RTX by Ardence[29].

• INtime by TenAsys[15].

Since information on CeWin, VxWin, and HyperKernel is sparse, this report will not focus on these solutions. RTX and INtime, however, will be given deeper descriptions, since more research has been done on those technologies.

6.1 RTX

6.1.1 Architecture

The RTX runtime environment is implemented as something called a Real-Time Sub-System (RTSS). This is actually a kernel device driver for Windows XP. Achieving real-time performance this way is possible thanks to the standard device driver model and the fact that the HAL is customizable. By combining these two techniques, a temporally predictable model for building real-time systems is possible[10].

The RTSS is implemented as a system capable of stopping Windows from masking interrupts, using its own scheduler, and handling synchronization, to name a few features[10]. Since the RTSS runs as a kernel device driver, applications written in RTX will also run in kernel-mode. This mode offers no protection against memory or stack overruns, errors that would likely make the execution environment unreliable and result in a system crash.

The HAL modifications used in RTX have been implemented as extensions instead of an entire replacement. This makes the RTSS compatible with all existing versions of Windows XP, Windows 2000, and Windows 2003 platforms. New Windows service packs can be installed without affecting the RTSS environment. The RTSS relies on the HAL extensions to operate correctly[10]. The extended HAL used in RTSS is called RT-HAL throughout this section.


The standard Windows HAL was modified for the following three reasons[10]:

1. To make it impossible for Windows XP threads to interrupt the RTSS or mask the RTSS-managed devices. RT-HAL intercepts interrupt masks coming from the Windows threads and manipulates this mask, so that no RTSS-controlled interrupts can be masked.

2. To increase the resolution of the Windows XP provided timers to 100 µs, instead of 1000 µs.

3. To provide a shutdown handler for the Windows XP environment, which makes it possible for the RTSS to carry on after a traditional blue screen Windows crash. The RTSS applications are responsible for managing the shutdown handler and it is up to the real-time application developer to decide which applications should use this handler. The handler is used to clean up and reset any hardware state if a crash or normal shutdown of the XP environment occurs. However, it is the developer's responsibility to decide what will actually happen.

RTX supports 256 thread priority levels. The scheduling algorithm used is round-robin[1], and the ready queue is implemented as a doubly linked list for each priority level. This increases the speed of both insertion and removal of threads compared to a singly linked list. If two threads of the same priority are ready at the same time, one of them is chosen and runs until its quantum has expired. By default, the quantum is set to infinity[10].
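The ready-queue organization described above can be illustrated with a minimal sketch. It is not RTX source code: the type and function names are hypothetical, and it assumes that a larger number means a higher priority. It only shows why one doubly linked list per priority level makes both insertion and removal constant-time operations.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIORITIES 256  /* number of levels stated in the text */

/* Hypothetical thread control block; field names are illustrative only. */
typedef struct Thread {
    int id;
    int priority;           /* assumed: 0 = lowest, 255 = highest */
    struct Thread *prev;
    struct Thread *next;
} Thread;

/* One doubly linked FIFO list per priority level. */
typedef struct {
    Thread *head[NUM_PRIORITIES];
    Thread *tail[NUM_PRIORITIES];
} ReadyQueue;

void rq_init(ReadyQueue *rq) {
    for (int p = 0; p < NUM_PRIORITIES; p++) {
        rq->head[p] = NULL;
        rq->tail[p] = NULL;
    }
}

/* Append a thread at the tail of its priority level: O(1). */
void rq_insert(ReadyQueue *rq, Thread *t) {
    assert(t->priority >= 0 && t->priority < NUM_PRIORITIES);
    int p = t->priority;
    t->next = NULL;
    t->prev = rq->tail[p];
    if (rq->tail[p] != NULL) {
        rq->tail[p]->next = t;
    } else {
        rq->head[p] = t;
    }
    rq->tail[p] = t;
}

/* Unlink a thread from anywhere in its list: O(1) thanks to prev. */
void rq_remove(ReadyQueue *rq, Thread *t) {
    int p = t->priority;
    if (t->prev != NULL) t->prev->next = t->next;
    else rq->head[p] = t->next;
    if (t->next != NULL) t->next->prev = t->prev;
    else rq->tail[p] = t->prev;
    t->prev = NULL;
    t->next = NULL;
}

/* First thread of the highest non-empty level, or NULL if all are empty. */
Thread *rq_pick_next(const ReadyQueue *rq) {
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
        if (rq->head[p] != NULL) return rq->head[p];
    }
    return NULL;
}
```

With an infinite quantum, the thread returned by rq_pick_next() would keep running until it blocks or a higher-priority thread is inserted. A singly linked list would instead need a traversal to find the predecessor before a thread could be unlinked.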

The RTSS uses the Windows provided model even for RTSS interrupt handling. This may seem unwise, since previous work has shown that DPCs are not deterministic enough for real-time use[3]. However, the interrupt is only caught in Windows, and the actual ISR is then run in the RTSS if the interrupt was intended for it. The RTSS is therefore only dependent on the interrupt latency of XP[10]. Studies have shown that interrupt latencies in XP are very deterministic, enough to even run hard real-time systems in an ISR[3]. Work has been done in RTX to lower the interrupt latency, and claims have been made that a WCET of less than 30 µs is possible[10].

The RTSS environment uses the memory management mechanisms provided by XP, and memory allocation is done in the nonpaged memory pool[10]. This means that memory allocation by an RTSS thread is non-deterministic. The benefit of this memory model, according to Ardence, is that it reduces RTX resource consumption.

Communication between the XP environment and the RTSS environment is realized with the use of queues, one in each direction. If an XP thread needs some service from the RTSS environment, a command is inserted into the queue as a Service Request Interrupt (SRI). The RTSS environment then executes the service and sends a reply message back to the XP thread. Normally, SRIs for synchronization are requested by the XP environment and SRIs for memory management and file operations are requested by the RTSS environment[10]. Priority inversions for shared resources are avoided by using priority inheritance, also known as priority promotion in most papers studying RTX[10, 1].
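To make the idea of priority inheritance (priority promotion) concrete, the following minimal sketch shows the mechanism in isolation. It is an illustration under simplifying assumptions, not the RTX implementation: the Task and Resource types are hypothetical, larger numbers mean higher priority, and each task holds at most one resource.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task descriptor; larger numbers mean higher priority. */
typedef struct {
    int base_priority;       /* priority assigned by the developer */
    int effective_priority;  /* priority the scheduler currently uses */
} Task;

/* Hypothetical shared resource protected by a lock. */
typedef struct {
    Task *owner;             /* NULL when the resource is free */
} Resource;

/* Returns 1 if the resource was acquired, 0 if the caller has to block.
   When a higher-priority task has to block, the owner is promoted to that
   priority so that medium-priority tasks cannot preempt it. */
int resource_acquire(Resource *r, Task *t) {
    assert(r != NULL && t != NULL);
    if (r->owner == NULL) {
        r->owner = t;
        return 1;
    }
    if (t->effective_priority > r->owner->effective_priority) {
        r->owner->effective_priority = t->effective_priority;
    }
    return 0;
}

/* Releasing the resource drops the owner back to its base priority. */
void resource_release(Resource *r) {
    assert(r != NULL);
    if (r->owner != NULL) {
        r->owner->effective_priority = r->owner->base_priority;
        r->owner = NULL;
    }
}
```

For example, if a task of priority 1 holds the resource and a task of priority 9 tries to acquire it, the holder runs at priority 9 until it calls resource_release(). This bounds the time the high-priority task can be kept waiting by unrelated tasks of intermediate priority.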

6.1.2 Software Development

RTX provides libraries which can be used by Visual Studio. It also provides an application wizard, which guides the user through settings and generates skeleton source code for the applications[10, 1]. Even though applications written for RTX run in kernel-mode, code writing and debugging can be done in user-mode during development from within Visual Studio (version 6.0 and newer), offering a fully protected environment. Breakpoints can be set and source code stepping can be used, just like when debugging any normal Windows application. Final releases, however, will be compiled to run in kernel-mode[10, 1].

6.1.3 Does RTX Meet the RTOS Requirements?

• "The OS has to be multitasked and preemptible."
The OS is definitely preemptible and multitasked. Preemption can occur for both threads and ISRs. Tasks can be realized as both processes and threads in this system, allowing for lower task switching time if memory can be shared by other tasks. The scheduling algorithm used is round-robin with priority queues.

• "The notion of task priority has to exist."
256 priority levels for threads exist.

• "The OS has to support predictable task synchronization mechanisms."
Synchronization objects are available, such as semaphores, mutexes, and shared memory objects. A study by Timmerman et al. showed predictable behavior of synchronization objects[32].

• "The OS must support a system for avoiding priority inversion."
Priority inheritance is used to protect the system from priority inversion.

• "The OS must have predictable temporal behavior."
To determine whether this requirement is met, extensive testing of this extension is needed. Memory allocation is not deterministic since it is handled by the Windows memory management mechanism[10].

Four of the five requirements of an RTOS are definitely fulfilled by RTX. Even though the non-determinism of the memory allocation can be solved by allocating all memory needed before startup of the real-time system, the tests done in [32] are too limited to conclude that RTX offers predictable temporal behavior under all conditions.

6.2 INtime

6.2.1 Architecture

In contrast to RTX, INtime from TenAsys runs both the real-time applications and the non-real-time applications in user-mode. There is still the possibility to write a real-time application as a driver, which will run in kernel-mode[16, 28]. Running applications in user-mode protects the system from crashing because of programming errors such as null pointers and page faults. However, applications still have the ability to gain direct access to physical memory if that is deemed necessary by the developer[28].

INtime installs a number of components in Windows. The most important are a Windows kernel driver and a Windows service. The kernel driver manages communication between the INtime and the Windows environment. The service handles the actual loading of the INtime kernel into the system. A context switch then occurs to make the system go into the INtime kernel. In this state all real-time activity is handled before any Windows activity. XP effectively becomes the idle-task of the INtime kernel. When running in the INtime kernel, all Windows interrupts are masked. A real-time interrupt (both software and hardware) is handled directly. Thanks to monitoring of the HAL, the Windows kernel is unable to mask real-time interrupts. This means that even badly designed device drivers running in the Windows kernel, which mask interrupts, cannot affect the performance of the real-time kernel[28].

The scheduling algorithm used in INtime is round-robin with 256 priority levels. 128 of these levels are used for user thread priorities and the other 128 are used for interrupt priorities[16, 24].

The interrupt handling in INtime is similar to the one used in XP. When an interrupt occurs, it is handled by its appropriate ISR. Just as in XP, minimal work is done in the ISR[24]. The bulk of the work is instead performed in an interrupt thread. Interrupt threads are like DPCs in XP, but with different priority levels instead of a single FIFO queue, to increase the temporal determinism of the system. This interrupt model can also be bypassed, and processes can handle interrupts directly[28].

Memory management is handled by INtime itself, and all shared memory (used for shared resources) resides in the nonpaged memory pool. This means no swapping of shared memory can occur, which gives good temporal predictability when accessing shared memory[24].

Shared resources used within INtime are protected with semaphores. Semaphore queues can be realized as both FIFO and priority queues. Compared to XP, which only uses FIFO queues for semaphores, the temporal behavior is more deterministic using priority queues. Priority inheritance is used on shared resources to ensure that priority inversion is as low as possible[24, 28].

Thanks to the monitoring functionality, which makes XP the idle thread of INtime, real-time applications can continue to run even if XP crashes. The first thing done in case of an XP crash is to suspend the thread scheduling XP. A real-time process can then restart the Windows operating system, and operation can be brought back to normal mode[28]. This means it is possible to make the real-time applications completely independent of XP.

6.2.2 APIs

INtime provides the user with multiple programming APIs:

• Real-Time API
The real-time API resembles the Win32 API, which makes the transition for Windows programmers as smooth as possible[24]. The real-time API is object based, where all objects are referenced by handles. Handles are global to the entire real-time system[28].

• Win32 API
A subset of the Win32 implementation used in Windows CE is provided by INtime. It allows some existing code to be used directly in INtime[28]. Although it is based on the Win32 API for CE, no information is given on whether it has time deterministic behavior or not.

• APIs for the Windows environment
Windows APIs are provided to allow the non-real-time Windows environment to share objects with the real-time INtime environment. Both real-time objects and Win32 objects can be shared by processes in the different environments[28].

• C and C++ libraries
INtime provides support for both Embedded C++ (EC++), with the use of the Standard Template Library (STL), and ANSI C.

6.2.3 Software Development

Software is written in C or C++, with the entire STL available. Building applications from start to release can be done entirely in Microsoft Visual Studio. INtime even includes a project wizard for the IDE. This wizard eases development and generates skeleton code for the developer. Debugging can also be done in Visual Studio .NET with the use of breakpoints, source-level single-stepping, and variable watching. For Visual Studio 6 users, INtime includes a separate debugger called Spider.


6.2.4 Does INtime Meet the RTOS Requirements?

Does XP using the INtime extension fulfill the requirements put on an RTOS?

• "The OS has to be multitasked and preemptible."
This requirement is fulfilled since INtime is multithreaded, and thereby multitasked. Preemption can occur at every level in the system. ISRs have different priority levels and are preemptible as well. Interrupt threads exist with 128 different priority levels. These attributes clearly fulfill the first requirement.

• "The notion of task priority has to exist."
This requirement is also fulfilled, since both user/kernel level threads and interrupt threads have priorities.

• "The OS has to support predictable task synchronization mechanisms."
The INtime kernel uses semaphores with both FIFO and priority queues, where priority queues should give higher predictability. Acquisition and release of semaphores have been shown to be deterministic in [32].

• "The OS must support a system for avoiding priority inversion."
INtime uses priority inheritance to achieve this goal.

• "The OS must have predictable temporal behavior."
In a previous study, INtime showed predictable behavior[32]. However, this study was based on version 1.20 (while the current version is 3.0) and the number of tests conducted was limited.

INtime fulfills at least four of the five RTOS requirements. However, tests and time measurements of the software are needed to determine if it offers predictable temporal behavior under all conditions.


7 Related Work

Studies of real-time performance in Windows NT (using the same scheduling and interrupt routines as XP) have been done before. While most of the studies focused on real-time performance of user level threads[24, 27, 4], some studies focused on the real-time performance in device drivers[5, 3]. Most of the papers based their conclusions on time measurements, while [34] drew its conclusion based on the inner workings of Windows NT.

7.1 User Level Thread Implementation

The testing of user level thread performance was done in slightly different ways. While [27, 24] implemented a time critical application, [4] only measured thread creation time and task switching under various system loads. All these tests show that temporal predictability decreases as the system load and the frequency of interrupts increase. Higher task priority also seems to increase the temporal predictability. The conclusions on the predictability of user level threads are clear: Windows NT is not suitable for running real-time applications at user level; the WCET for the application in [24] was almost 10000% over the average execution time. Depending on the timing constraints, Windows NT running user level applications could be used for soft real-time systems. According to the studies, WCETs cannot be guaranteed, which means the system must be allowed to miss deadlines sometimes.

7.2 Driver Based Implementation

The driver based experiments differ more from each other than the user level tests. [3] measures interrupt latencies for different drivers and interrupt vectors under various load conditions (always known before testing). The paper continues with execution time measurements for ISRs and DPCs under the same system conditions. Measurements of context switching latency for both processes and threads were also made, but these tests did not vary the system load as much as the tests conducted to measure interrupt latency. The results from the interrupt latency tests showed that the latency did not differ much because of system load, except when network load was present. Since the Ethernet interface was set up to raise interrupts at vector 10, the custom serial driver (connected to vector 11) would be preempted by every interrupt on the Ethernet interface.

By assigning the driver to another vector, the high latency introduced by network load could be reduced. As shown by these test results, network load had no major effect on the interrupt latency when interrupt vectors lower than 10 were used. The ISR execution time of the custom serial driver had a high temporal predictability under all system loads tested. However, the ISR execution time for different drivers was not temporally predictable, but that is dependent on the amount of work needed in the ISR (or perhaps on badly designed drivers).

The results from the DPC latency measurements had enormous standard deviations from the average times. Neither thread nor process context switching gives any determinism to the system. The latency depended highly on CPU load.

The conclusion of this study is that Windows NT is suitable not only for soft real-time systems but also for hard real-time systems, as long as all the time critical execution is done inside the ISRs. Running at DPC level or below offers too poor determinism to be used by a hard real-time system. The final recommendation was to turn off virtual memory in the system, especially if an Integrated Drive Electronics (IDE) HDD is used, since the experiment on paging showed that IDE drivers basically had no temporal determinism at all.

The second paper focusing on the driver implementation[5] had a different approach. It implemented a driver polling input data at the frequency programmed into the LAPIC timer (see [5] for more information), which is located on the CPU and runs at the frequency of the system bus. This timer was programmed to use the highest interrupt level. However, the LAPIC timer can still be delayed if interrupts have been disabled by other drivers. The methodology descriptions in these tests were sparse, making it hard to fully understand how the tests were carried out. According to the paper, the LAPIC driver performed well on both loaded and unloaded systems. However, since some data polling occasions still missed the deadline, no hard real-time systems could be implemented successfully according to the paper. The authors stated that the LAPIC driver required a dual processor machine, since the LAPIC is disabled by Windows XP on a uniprocessor system. This contradicts the fact that they presented test results from a system using a single Pentium 4 processor with Hyper-Threading, which is a technology to simulate a dual processor machine (see [11]). Although this suggests that Hyper-Threading is enough, it is still a drawback of this solution.

7.3 Conclusions

In general, it seems that all papers agreed Windows NT can be used for soft real-time systems if:

• the timing constraints are not too tight,

• the system is allowed to miss deadlines sometimes, and

• the work load is low.

Some of the papers also concluded that if all jobs are done at ISR level, even hard real-time systems can be built on Windows NT[3]. In contrast, [34] concludes that running a hard real-time system on Windows NT is out of the question. The methodologies of these two papers were very different, since [3] measured the execution time of an implementation while [34] based its conclusions on an analysis of the inner workings of Windows NT.


8 Problem Description

The true trigger of this master's thesis is an anonymous business unit of ABB providing embedded system based products for the automation industry. They use a traditional real-time system, where input data received from a sensor is processed in an algorithm and then sent to an actuator. Outside the real-time core, extra functionality is provided to make the units more useful to the customer.

Today, the real-time core is running on dedicated hardware, isolated from the extra functionality. The trend clearly shows that the demand for extra functionality outside the core is growing. The development cost required to meet this demand is usually very high, for several reasons:

• The systems often run on specific hardware with memory constraints and limited resources.

• The systems are sometimes running a custom designed operating system, which means no commercial off-the-shelf (COTS) software components exist.

• Even on systems running a commercial RTOS, writing software and reusing software components is more limited than in popular general purpose operating systems (GPOS).

• The RTOS development environments are often complex, which makes writing software hard.

To stay competitive in this industry, the development cost for this kind of supportive functionality needs to be reduced. One possible solution is to run the real-time core on the same hardware as the extra functionality and to switch OS to a popular GPOS.

The ABB business unit wants the alternative of using XPE with a cheap real-time extension to be explored. For their purposes, a self-written device driver implementation is optimal, since it would be cheaper than buying licenses for third-party real-time extensions. Because of the planned shipping volume, it is important to keep the manufacturing cost per unit as low as possible. The development cost is less important, as it is viewed more as a one-time cost.

The following is a list of some of the key reasons why the ABB business unit wants XP to be investigated:

• By using XP, development time would be reduced by using the same platform for both development and target units.

• Rapid prototyping using the .NET framework would be possible without access to the target unit.

• A large number of COTS and standard applications would be able to run on the target units.

The fact that XP is designed as a GPOS, and as such does not support hard real-time usage, is recognized by the business unit. However, the characteristics of the OS need to be thoroughly explored in order to get a deeper understanding of its performance and limitations. Even if the results were to show that XP is not suitable for their purposes, they would at least have a clear reason why this is the case.

8.1 Suggested Model

Figure 5 shows the original suggested model for the embedded system, provided by the business unit. The sensor is on the left, the embedded system running XPE is in the middle, and the actuator is on the right.

Figure 5: Sketch of the implementation suggested by the ABB business unit.

The figure is only meant as an overview of a possible system and is in no way finalized. For example, the communication stack suggested may be replaced with another Ethernet based automation protocol in the real implementation.


8.2 Purpose

The purpose of this report is to reveal the characteristics of XPE as an RTOS by investigating how XP works under the hood.

8.3 Scope

Because of the limited time available for this master's thesis, XP will be used instead of XPE to run the tests. Too much time would otherwise be spent on setting up and configuring an XPE installation. Since XP and XPE use the same kernel (along with scheduling algorithms, IRQLs, and HAL), this will not affect the results of the tests[37].

While Figure 5 is a model of the whole embedded system, the only part that will be investigated in this report is the real-time characteristics of XP. Communication from the sensor and to the actuator will be simulated.

The third-party real-time extensions will not actually be tested. The focus will be solely on the real-time characteristics of XP itself.

There are other OSs relevant to the assigner. For example, Windows CE 5.0 is a scalable OS with real-time capabilities that allows applications to be developed in a familiar Windows environment[22]. However, CE does not have the same rich availability of COTS components and applications as XP, and the .NET Compact Framework is more limited than the standard .NET framework[6]. Windows CE will not be investigated in this report.


9 Methodology

In order to measure the real-time characteristics of XP, a number of tests on a dedicated system were conducted. Because Microsoft is very protective of the source code for XP, only a black box approach to performance analysis was possible.

The tests evaluated the performance aspects that affect the determinism and responsiveness of XP as a real-time system, which included: ISR latency, interrupt execution, DPC latency, DPC execution, and communications between user-mode and device drivers.

9.1 Conducted Tests

This report focused on the first two approaches of using XP as an RTOS explained in Section 5.2: using XP as it is with a standard user-mode process, and implementing the time critical parts as a device driver running in kernel-mode.

The latter approach was in turn divided into two separate tests: one implementing the time critical parts in a DPC, and another implementing them in a prioritized DPC. An ISR implementation was not considered because of the inability to safely perform floating-point operations, as mentioned in Section 4.6.5.

The tests simulated a typical system used in the automation industry, where a sensor transmits an input to an embedded real-time system, which performs calculations on the input, and thereafter sends the results to an actuator. Time measurements were conducted on specific events, as well as the full event cycle, to evaluate the determinism of XP.

The sensor input was simulated using a tone generator connected to the acknowledge (ACK) pin on the parallel port, which generated a hardware interrupt at IRQL 3 in the CPU. A custom device driver was written to handle the interrupts and start the event cycle.

The processing of sensor input was simulated using an algorithm performing a fixed number of floating-point operations (e.g. multiplications, divisions, etc.). Originally, this algorithm came from the ABB business unit that triggered this master's thesis, but because the algorithm generated many compiler warnings and was generally very complex, it was replaced with a simpler one using only a fraction of the source code needed for the original algorithm. Tests were conducted to make sure the execution time of the new algorithm was equal to that of the original one.
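As an illustration of what such a stand-in workload might look like, the sketch below performs a fixed number of floating-point multiplications and divisions per call. It is a hypothetical example, not the algorithm used in the tests: the function name, the choice of operations, and the iteration count are all assumptions.

```c
#include <assert.h>

/* Hypothetical stand-in workload: three floating-point operations per
   iteration (one multiplication, two divisions). The volatile qualifier
   keeps the compiler from removing the arithmetic, so the amount of work
   stays fixed for timing purposes. */
double simulated_algorithm(double input, unsigned iterations) {
    assert(iterations > 0);  /* a zero-length run would measure nothing */
    volatile double x = input;
    for (unsigned i = 0; i < iterations; i++) {
        x = x * 4.0;
        x = x / 2.0;
        x = x / 2.0;
    }
    return x;
}
```

Because the factors are powers of two, the result equals the input for ordinary magnitudes, which makes it easy to check that the workload ran without affecting the timing. Tuning the iteration count is one way a replacement workload could be matched to the execution time of an original algorithm.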

Finally, the output to the actuator was simulated by setting a parallelport pin in the device driver.


9.1.1 User-Thread Implementation

In the user-thread implementation, communication with the input and output was handled by read and write calls to the device driver. The algorithm was then processed in a user-thread with the highest priority (31). Figure 6 shows the full event cycle for this implementation.

Although the user-thread starts by calling the read() function of the driver, the actual event cycle (input from sensor to output to actuator) starts when the hardware interrupt occurs. This is event number 6 in Figure 6. After the algorithm is processed, the user-thread calls the write() function of the device driver, which simulates the communication with the actuator by setting a parallel port pin.

Figure 7 shows the sequence of events, where the vertical axis represents the priority level of the executing event.

[Figure 6 diagram. User-mode application: a user thread running the loop while(1) { read(); algorithm(); write(); }. Components shown: Win32 kernel, I/O manager, kernel-mode driver, HAL. Driver routines: Read (IRP = PENDING; StartPackage();), StartIO (EnableInterrupts();), ISR (DisableInterrupts(); RequestDPC();), DPC (IRP = SUCCESS; CompleteIRP();), Write (outp(...); CompleteIRP();). The events in the cycle are numbered 1 to 16.]

Figure 6: Full event cycle of the user-thread implementation.

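[Figure 7 diagram. Vertical axis: priority level (Passive, Dispatch, DIRQL). Horizontal axis: time, from t0 to t1, with start and finish marks tAs, tAf, tBs, tBf, tCs, tCf, tDs, tDf, tFs, tFf, tGs, tGf. Events shown: Drv: Read, Drv: StartIO, Drv: ISR, Drv: DPC, Usr: Algorithm, Drv: Write.]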

Figure 7: Sequential time diagram of the user-thread implementation.


9.1.2 Driver Implementation

In the driver implementation, the algorithm was executed directly in the DPC of the driver. Figure 8 shows the full event cycle for the implementation. As with the user-thread implementation, the event cycle starts when the interrupt occurs.

Figure 9 shows the sequence of events, where the vertical axis represents the priority level of the executing event.

Because of the fewer events and higher priority levels of the driver implementation, it was reasonable to believe it would have better performance than the user-thread implementation. More specifically, lower WCETs were expected.

[Figure 8 diagram. Components shown: Win32 kernel, I/O manager, kernel-mode driver, HAL. Driver routines: ISR (DisableInterrupts(); RequestDPC();) and DPC (algorithm(); IRP = SUCCESS; CompleteIRP();). The events in the cycle are numbered.]

Figure 8: Full event cycle of the driver implementation.

9.2 Test System

The tests were conducted on a dedicated PC system running Windows XP Professional with Service Pack 2 installed. The hardware on which the measurements were conducted consisted of an ICP Electronics NANO-7270 motherboard, a Pentium M 1.6 GHz processor, and a Fujitsu MHT2060BH SATA hard disk drive. Attached to it were a standard USB keyboard and a PS/2 mouse.


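[Figure 9 diagram. Vertical axis: priority level (Passive, Dispatch, DIRQL). Horizontal axis: time, from t0 to t1, with start and finish marks tAs, tAf, tBs, tBf, tCs, tCf. Events shown: ISR, DPC, Algorithm.]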

Figure 9: Sequential time diagram of the driver implementation.

9.2.1 System Services

All Windows XP system services not needed for the test system, such as Server, Workstation, and DNS, were disabled. Virtual memory was disabled as well. Only the most critical services required for XP to run properly were enabled, namely:

• Plug and Play

• Remote Procedure Call (RPC)

9.3 Execution Time Measurement

To determine the real-time performance of XP, the execution times of the different aspects described earlier in this section were measured. Three different methods for measuring execution time were considered:

• using the performance counter (PeC) available in the Win32 API,

• using the time-stamp counter (TSC) of the processor, and

• using an oscilloscope to externally measure signals on the parallel port.

9.3.1 Performance Counter

Measurements using the PeC were performed using the two functions provided in the Win32 API: QueryPerformanceCounter() and QueryPerformanceFrequency(). The QueryPerformanceFrequency() function returns the number of clock ticks per second, while the QueryPerformanceCounter() function returns the current counter value. Unfortunately, the PeC uses different hardware timers on different systems. Most platforms without any processor power saving technologies such as SpeedStep use the TSC of the processor as the timer, while other systems use the chipset, BIOS, or power management timer[38].

This counter was evaluated under two different test conditions. The first condition stored two consecutive readings of the PeC (start and stop time) 6.8 million times in a for-loop. This test condition generated a theoretical worst-case latency, since this test utilized 100% of the CPU. This makes it more likely for kernel-mode tasks such as scheduling to interrupt the PeC readings.

In the second test condition, two consecutive readings were stored in the same manner, but with an added Sleep() statement after each start and stop measurement. This simulated a real-time system used in the automation industry more closely, where the system waits for an input sent from a sensor. The number of instructions required for reading the PeC is insignificant compared to the entire test system and algorithm. As a result, it is more likely that the system will be interrupted when not reading the PeC, which made this condition more applicable to a real-world application.
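The structure of such a back-to-back measurement loop can be sketched as follows. This is not the test program used in the report: the counter is abstracted behind a function pointer (on the test platform it would wrap QueryPerformanceCounter() or a TSC read), and fake_counter() is a hypothetical stand-in that advances by three ticks per reading so that the example is reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Any function returning a monotonically increasing tick count. */
typedef uint64_t (*CounterFn)(void);

/* Takes n pairs of consecutive readings and stores stop - start (in ticks)
   for each pair. Returns the largest difference observed, i.e. the worst
   case among the samples. */
uint64_t measure_back_to_back(CounterFn read_counter, uint64_t *deltas, size_t n) {
    assert(read_counter != NULL && deltas != NULL);
    uint64_t worst = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t start = read_counter();
        uint64_t stop = read_counter();
        deltas[i] = stop - start;
        if (deltas[i] > worst) {
            worst = deltas[i];
        }
    }
    return worst;
}

/* Hypothetical counter for demonstration: advances 3 ticks per reading. */
uint64_t fake_counter(void) {
    static uint64_t now = 0;
    now += 3;
    return now;
}
```

In the second test condition, a Sleep() call would be placed at the end of the loop body, after each start-stop pair has been stored. Any preemption that occurs between the two readings shows up directly as an unusually large difference.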

On the test platform, the timer was running at a frequency of 3,579,545 Hz, which gives a resolution of 279 ns. The difference between the start and stop time under both system conditions can be seen in Figure 10. Table 2 provides a summary of the measurement statistics.

Figure 10: Measured start-stop time versus measurement number for the Performance Counter, (a) with Sleep(), (b) without Sleep().

The test of the PeC without a Sleep() statement showed two discrete levels, one at 838 ns and another at 1,117 ns, which are equal to 3 and 4 ticks respectively on the PeC. Both these levels probably represent the normal latency introduced by two consecutive time-stamps. Since the processor/performance counter frequency ratio is about 450 to 1, we can assume that the normal latency of two time-stamps is somewhere between 3 and 4 ticks of the PeC.
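The tick arithmetic used above can be reproduced with a few lines of C. The sketch only restates the numbers given in the text (a 3,579,545 Hz counter and a 1.6 GHz processor); the function names are chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PEC_FREQUENCY_HZ 3579545.0     /* PeC frequency on the test platform */
#define CPU_FREQUENCY_HZ 1600000000.0  /* 1.6 GHz Pentium M */

/* Converts a tick count to nanoseconds for a counter of given frequency. */
double ticks_to_ns(uint64_t ticks, double frequency_hz) {
    assert(frequency_hz > 0.0);
    return (double)ticks * 1e9 / frequency_hz;
}

/* Number of processor clock cycles that elapse per counter tick. */
double cycles_per_tick(double cpu_hz, double counter_hz) {
    assert(counter_hz > 0.0);
    return cpu_hz / counter_hz;
}
```

One tick corresponds to about 279.4 ns, so 3 and 4 ticks give about 838 ns and 1,117 ns, matching the two levels observed. The frequency ratio evaluates to roughly 447, which is the "about 450 to 1" quoted above.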

The second test of the PeC, with the added Sleep() statement, showed three discrete levels: the same two as in the previous test, and a third level with a slightly higher latency than the other two. Even though the PeC usually gives a low latency for making the time-stamps, the tests showed maximum values as high as 126.27 µs and 45.26 µs for the first and second test, respectively.

Figure 10 and the standard deviation of these tests showed that the PeC was unsuitable for time measurements in our applications, since samples were spread over the entire spectrum from 838 ns to 126.27 µs (45.26 µs for the second test).

9.3.2 Time-Stamp Counter

All processors built on the IA-32 architecture, starting with the Pentium processor, have a built-in TSC. The clock tick frequency of this counter varies between processor families. On some processors, the counter is increased at a constant rate determined by the processor configuration, while others increase the counter with every internal processor clock cycle[14]. In the P6 family (Pentium, Pentium M, Pentium 4, and Xeon) the TSC is implemented as a 64-bit counter and is guaranteed not to wrap around within 10 years after being reset[14].

Figure 11: Measured start-stop time versus measurement number for the Time-Stamp Counter, (a) with Sleep(), (b) without Sleep().

The test platform with its 1.6 GHz Pentium M processor has the TSC implemented as a 64-bit counter, increasing with every processor clock cycle.


                        Min     Mean    WCET      Std. dev.
PeC without Sleep()     0.84    1.07    126.27    0.37
PeC with Sleep()        0.84    0.99     45.26    0.22
TSC without Sleep()     0.03    0.03     60.56    0.05
TSC with Sleep()        0.03    0.03     39.65    0.02

Table 2: Measured start-stop time in µs for the PeC and TSC.

Since the test was conducted with SpeedStep technology disabled, a constant clock frequency was guaranteed, giving a resolution of 0.625 ns. To compare the latency of the TSC with that of the performance counter, the two evaluation tests explained in the previous section were conducted on the TSC as well.
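The TSC figures quoted in this section follow from simple arithmetic, sketched below. The code assumes only what the text states (a 64-bit counter incremented once per cycle of a 1.6 GHz clock); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Time per TSC increment in nanoseconds, assuming one increment per cycle. */
double tsc_resolution_ns(double cpu_hz) {
    assert(cpu_hz > 0.0);
    return 1e9 / cpu_hz;
}

/* Converts a TSC cycle count to microseconds. */
double tsc_cycles_to_us(uint64_t cycles, double cpu_hz) {
    assert(cpu_hz > 0.0);
    return (double)cycles * 1e6 / cpu_hz;
}

/* Years until a 64-bit counter wraps at the given increment rate. */
double tsc_wrap_years(double cpu_hz) {
    assert(cpu_hz > 0.0);
    const double counts = 18446744073709551616.0;            /* 2^64 */
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return counts / cpu_hz / seconds_per_year;
}
```

At 1.6 GHz the resolution is 0.625 ns, and a difference of 48 cycles between two readings corresponds to 0.03 µs. A 64-bit counter at this rate would take roughly 365 years to wrap, comfortably beyond the 10 years mentioned above.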

The TSC showed lower latency between two consecutive readings than the performance counter. The different test conditions had a higher impact on the results than during the test of the PeC. As seen in Figure 11, the test without any Sleep() statement has the same latency on almost all samples, whereas the test with the Sleep() statement added shows three discrete levels, the highest of which only occurred during the last half of the test. However, the test without a Sleep() statement had a higher worst-case latency and a higher standard deviation than the other test. Table 2 shows the minimum, mean, maximum (WCET), and standard deviation (in µs) of all four tests conducted on both the PeC and the TSC. Even though the maximum values of the TSC were in the same range as those of the PeC, only a few of the samples reached a time higher than 1 µs. The low latency, and the fact that two consecutive readings are unlikely to have a latency higher than 1 µs, gave a determinism good enough for the measurements in our tests.

9.3.3 Oscilloscope

The ISR latency (the delay between a hardware interrupt and the start of the ISR execution) is impossible to measure using either the PeC or the TSC, since the start of the event occurs on the OS level, which the test implementations have no control over. For this reason, an external measurement approach was also necessary.

An Agilent Infiniium 54833D MSO oscilloscope was connected to the parallel port of the motherboard, measuring the voltage on selected pins. To verify the reliability of this measurement method, a simple test was conducted in which a parallel port pin was set (logical 1) and then immediately cleared (logical 0) again. The test ran in a user-thread of priority 31 (Realtime) and was iterated one million times.

Normally, a user-thread is not allowed to write to port registers, because this is a restricted kernel-mode operation and will cause a Privileged Instruction Exception. Because of this, a third-party solution called AllowIO was used, which can grant the process full rights to any port[26].

The results of the test showed a maximum jitter of less than 5 µs, with a maximum execution time of 6.21 µs and an average execution time of 1.37 µs. This was significantly more deterministic than using the PeC or TSC, and the accuracy was sufficient for the other tests conducted.

One significant limitation of the oscilloscope was its inability to save each individual measurement to a file for later analysis. The oscilloscope was only capable of calculating the WCET, minimum execution time, mean execution time, and standard deviation of the collected data set.
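Summary statistics of this kind can be maintained without storing the individual samples. The sketch below uses Welford's online update as one possible way to do this; it is an illustration only, since how the oscilloscope computes its statistics internally is not known. The type run_stats and its functions are hypothetical names, and the population variance is returned (the standard deviation is its square root) because the report does not state which variant the instrument uses.

```c
#include <stddef.h>

/* Running summary of execution-time samples (all values in µs). */
typedef struct {
    size_t n;     /* number of samples seen so far */
    double min;   /* shortest sample */
    double max;   /* longest sample, i.e. the observed WCET */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
} run_stats;

void run_stats_init(run_stats *s)
{
    s->n = 0;
    s->min = 0.0;
    s->max = 0.0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one sample into the summary using Welford's update. */
void run_stats_add(run_stats *s, double x)
{
    double delta;
    if (s->n == 0 || x < s->min) s->min = x;
    if (s->n == 0 || x > s->max) s->max = x;
    s->n++;
    delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Population variance; the standard deviation is its square root. */
double run_stats_variance(const run_stats *s)
{
    return s->n > 0 ? s->m2 / (double)s->n : 0.0;
}
```

The drawback of such a summary is the one noted above: once a sample has been folded in, it cannot be plotted or inspected individually.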

9.4 System Load Conditions

The real-time application tests were conducted under different load conditions in order to evaluate the performance impact. Several applications were developed in Visual Studio .NET to realize these load conditions. A custom device driver was also developed to allow measurement of ISR/DPC latency and performance, and of communication between device drivers and user-processes.

9.4.1 Idle

When the system was idle, no processes other than the ones necessary for XP to function properly were running. The network was disabled, and the keyboard and mouse were not used.

9.4.2 CPU Load

For this system load, a simple C application was developed that runs an endless for-loop to utilize 100% of the CPU. The process ran at the Normal process and thread priority levels.

9.4.3 Graphics Load

A custom application was written in Visual Basic .NET to realize this system load. It dynamically created many graphical user interface (GUI) controls and then moved, resized, and changed properties on them. The purpose was to test how GUI rendering of normal applications affected real-time performance.

9.4.4 HDD Load

Two large files were copied back and forth on the HDD to determine how disk activity affects real-time performance. A simple batch script was used to achieve this load condition.

Load condition     User-thread    DPC              Prioritized DPC
Idle               UserIdle       DriverIdle       DriverPrioIdle
CPU Load           UserCPU        DriverCPU        DriverPrioCPU
Graphics Load      UserGraphics   DriverGraphics   DriverPrioGraphics
Hard Drive Load    UserHDD        DriverHDD        DriverPrioHDD
Network Load       UserNetwork    DriverNetwork    DriverPrioNetwork
Stress             UserStress     DriverStress     DriverPrioStress

Table 3: Test names used throughout the report.

9.4.5 Network Load

In this system load, a batch script was used to transfer large files over a small local network, connected with a router. In effect, both network and disk load were applied at the same time. An HP Vectra VL800 running Filezilla Server version 0.9.12 beta was used as the File Transfer Protocol (FTP) server. The test platform was running the console-based FTP client NcFTP version 3.1.9.

9.4.6 Stress

In the Stress mode, all of the above load conditions were running at the same time, to simulate a worst case scenario for the real-time application.

9.5 Test Names

Table 3 shows the names used to identify the specific tests conducted in the various load conditions.

9.6 Additional Tests

Aside from the above tests measuring the impact of different load conditions, additional tests were conducted to measure mechanisms such as a process context switch and the time quantum of a process. These test results are not presented in the report, as they were only conducted to gain a better understanding of the primary execution time tests.

All tests listed in Table 3 were conducted using the oscilloscope. However, since the oscilloscope was incapable of collecting and saving each sample in the tests, additional tests were conducted using the TSC. This provided a graphical scatter plot of the measured execution time for every sample and an execution time distribution of the samples.

The TSC tests measured the algorithm execution time 4,500,000 times in each test. Because of the limited time available for these additional tests, they were only conducted on the user-thread implementation.
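An execution time distribution of the kind described above can be derived from the stored per-sample times by counting them into fixed-width bins. The following is a minimal sketch under that assumption; the function name build_histogram and the choice to clamp out-of-range samples into the outermost bins are illustrative and not taken from the test implementation.

```c
#include <stddef.h>

/* Count samples into nbins fixed-width bins covering
 * [lo, lo + nbins * width). nbins must be at least 1.
 * Samples outside that range are clamped into the first or last bin,
 * so that outliers such as the WCET sample remain visible. */
void build_histogram(const double *samples_us, size_t n,
                     double lo, double width,
                     unsigned long *bins, size_t nbins)
{
    size_t i;
    for (i = 0; i < nbins; i++)
        bins[i] = 0;
    for (i = 0; i < n; i++) {
        double offset = (samples_us[i] - lo) / width;
        size_t idx;
        if (offset < 0.0)
            idx = 0;
        else if (offset >= (double)nbins)
            idx = nbins - 1;
        else
            idx = (size_t)offset;
        bins[idx]++;
    }
}
```

The scatter plot uses the same data without binning: each sample is plotted at its measurement number, which is what makes changes of characteristics over time visible.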

10 Results

10.1 TSC Measurement Results

This section presents the results of the TSC measurements of each user-thread test graphically, using two diagrams for each test. The first diagram shows a scatter plot of the measured execution time in µs for every sample, while the second one shows the execution time distribution of the samples.

Table 4 shows the minimum, mean, WCET, and standard deviation of the TSC test results.

                Min     Mean    WCET     Std. dev.
UserIdle        36.88   37.27   132.24   0.83
UserCPU         36.89   37.40   107.15   0.72
UserGraphics    36.87   37.29   154.74   1.00
UserHDD         36.87   37.40   155.42   2.00
UserNetwork     36.88   38.36   144.48   4.58
UserStress      36.88   39.40   168.87   6.20

Table 4: Algorithm execution time in µs for the TSC tests.

10.1.1 UserIdle

As seen in the time distribution diagram in Figure 12, a majority of the samples measured 37.42 µs, which is represented by the distinct lowest line in the scatter plot diagram. This means that the majority of the algorithm calculations in the test were not interrupted by other tasks.

The remaining samples were distributed over a range of around 40–120 µs, except for two samples taking 130.55 µs and 132.24 µs, respectively. It is possible to identify discrete levels in this range where samples are more densely grouped. For example, one level exists at 75 µs. However, because of the black box approach used when testing XP, the reason why these levels exist is not known.

One interesting note about the UserIdle scatter plot diagram is the change of characteristics after one third of the test period. The reason could be one or more device drivers entering a power saving mode. For example, the HDD might be spinning down after a period of inactivity. The power saving functionality is part of the WDM development guidelines[2, 25].

10.1.2 UserCPU

The difference between UserIdle and UserCPU was minimal. A majority of the samples measured 37.43 µs and the remaining samples were distributed in the 40–120 µs range. The fact that a pure CPU load did not affect the performance of the test implementation much was not surprising, since the test ran in the Realtime priority class, while the CPU load application ran in the Normal priority class.

Figure 12: UserIdle algorithm execution time. (a) Scatter plot, (b) Time distribution.

10.1.3 UserGraphics

In the UserGraphics test, the sample distribution in the 40–120 µs time range was denser, which means that more algorithm calculations were interrupted compared to UserIdle and UserCPU. However, the WCET sample of 154.74 µs was not much worse than the WCET for UserIdle (132.24 µs), which indicates that the real-time performance of XP is not significantly affected by GUI stress.

10.1.4 UserHDD

In the HDD load condition, there was a significant increase of samples around 50 µs, as seen in Figure 15. Also, the range between 40–80 µs was denser compared to the previous load conditions. However, the WCET was just 155.42 µs, which was similar to the results in the previous tests.

As in the UserIdle test, the UserHDD test changed characteristics after a period of time. In this test, the change occurred after two thirds of the time, where the samples ranging between 50–80 µs suddenly dropped to a more compact range of 50–60 µs. The reason for this is unknown, but as seen in Figure 15, this does not affect the WCET. In fact, the WCET sample was measured near the end of the test, where the scatter plot showed the best temporal determinism.

Figure 13: UserCPU algorithm execution time. (a) Scatter plot, (b) Time distribution.

Figure 14: UserGraphics algorithm execution time. (a) Scatter plot, (b) Time distribution.

Figure 15: UserHDD algorithm execution time. (a) Scatter plot, (b) Time distribution.

10.1.5 UserNetwork

The network load condition had a significant number of samples around 60 µs. Also, the range between 40–65 µs was dense. The samples above 65 µs were distributed in a similar way as in the UserHDD test, and the WCET was 144.48 µs.

10.1.6 UserStress

The UserStress test, running all previous load conditions at the same time, had, perhaps unsurprisingly, the largest impact on real-time performance. The density of samples around 60 µs was even higher here than in UserNetwork, but the characteristics and time distribution were similar.

However, even in this stressed load condition, no sample exceeded 170 µs. In fact, although not specifically designed for it, XP seems to do a good job of keeping the WCET at what appears to be a limited level. The amount of system load applied seems to have a small impact on the measured WCET.

Every one million samples, the test changed characteristics for a short period of time, as seen in the scatter plot of Figure 17. In actual time, this was roughly every 15 minutes. The reason for this behavior is not known, but it did not negatively affect real-time performance.

10.2 Oscilloscope Test Results

The results of the tests conducted using the oscilloscope are briefly presented in this section. For a complete listing of these test results, see Appendix A.

Figure 16: UserNetwork algorithm execution time. (a) Scatter plot, (b) Time distribution.

Figure 17: UserStress algorithm execution time. (a) Scatter plot, (b) Time distribution.

The oscilloscope test results showed a surprisingly good level of predictability compared to the results of previous work in the field of XP real-time performance[27, 24, 3].

The CPU load conditions (UserCPU, DriverCPU, and DriverPrioCPU) had a minor impact on the tests. At most, a slightly higher standard deviation was measured, but the WCETs were similar to those of the idle tests (UserIdle, DriverIdle, and DriverPrioIdle).

Similar to the TSC tests, HDD and network loads had the biggest performance impact after the stress tests.

As expected, the driver implementation had shorter average execution times and WCETs than the user-thread implementation. For example, the UserStress WCET was 450.89 µs, whereas the DriverStress and DriverPrioStress WCETs were 328.30 µs and 356.06 µs, respectively.

One surprising discovery was that the tests with prioritized DPCs had longer execution times than the normal DPC tests in many cases. Possible reasons for this are discussed in Section 11.

11 Conclusions

A number of observations can be made from the tests conducted. The following list summarizes the most important observations, and the rest of this chapter is devoted to explaining them in greater detail:

• XP has better determinism than was reported in previous work (Section 7).

• Higher task priority yields better determinism.

• A pure driver implementation is faster and more deterministic than a user-mode implementation.

• Task interruption can occur anywhere in a full event cycle.

• The difference in execution time and determinism between a prioritized and a normal DPC is small.

• The algorithm executes more slowly in kernel-mode than in normal user-mode.

• No hard guarantees in terms of WCET can be given.

11.1 Better Determinism Than Reported In Previous Work

When evaluating the test results, a general reflection is that the latencies and WCETs were much more predictable than previously reported in the field of XP real-time performance[27, 24, 3]. While [24] reported application WCETs almost 10000% over the average execution time, our tests never generated a full event cycle WCET even ten times the average execution time. There can be several reasons for this difference, the most probable being different load conditions. It is impossible to pinpoint the exact reasons, since the conditions under which the tests of the previous work were conducted are not known.
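To make the comparison concrete, the distance between a WCET and the corresponding mean can be expressed as a percentage of the mean. The helper below is a small sketch of that arithmetic; the name wcet_percent_over_mean is hypothetical, and reading "percent over the average" as (WCET - mean) / mean is an assumption about how the figure in [24] was computed.

```c
/* How far the worst case lies above the mean, in percent of the mean.
 * (Hypothetical helper; one possible reading of "percent over the
 * average execution time".) */
double wcet_percent_over_mean(double wcet_us, double mean_us)
{
    return (wcet_us - mean_us) / mean_us * 100.0;
}
```

Applied to the UserStress full event cycle in Appendix A (WCET 450.89 µs, mean 141.32 µs), this gives roughly 219%.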

11.2 Higher Task Priority Yields Better Determinism

It is clear from the results that a higher priority level yields a higher degree of determinism. The ISR, running at the highest priority level in the tests, had a maximum latency of 41.82 µs after an interrupt was triggered. This was measured in the UserFile test. Compared to the mean latency of 11.96 µs, the maximum latency is roughly 3.5 times larger.

In the same test, the maximum time between the scheduling of a DPC and its actual execution was 80.61 µs. Compared to the mean latency of 3.72 µs, the maximum latency is over 20 times larger, a significantly larger ratio than for the ISR. The reason for this, as discussed in Section 4.4, is the fact that a DPC can be interrupted by any interrupt, whereas the ISR can only be interrupted by higher IRQL interrupts. Since the parallel port uses IRQL 3 on the test system, the only devices with higher priority are the keyboard and the system timer.

11.3 Driver Faster Than User-Mode

A pure driver implementation has a shorter WCET and is more deterministic than a user-mode implementation communicating with a driver. This came as no surprise, considering the reduced number of steps required in the driver compared to the user-mode implementation. The algorithm also runs at a higher priority level (DISPATCH_LEVEL) in the driver implementation, which reduces the probability of it being interrupted.

11.4 Task Interruption Can Occur Anywhere

Every individual step in a full event cycle can be interrupted at any time. In the test results, it is easy to see that a discrete step, such as the DPC execution time, has a significantly higher WCET compared to its average execution time.

The sum of the WCETs of the individual events exceeds the measured WCET for the whole event cycle. This means that, theoretically, the WCET is higher than measured in the tests. However, the test results indicate that it is statistically very unlikely that all steps in the chain of events will be interrupted in a single event cycle.
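The observation about summed WCETs can be checked directly against the figures in Appendix A. The sketch below adds up per-segment worst cases; the function name sum_of_segment_wcets is hypothetical, and the result is a pessimistic figure that assumes every segment hits its measured worst case in the same cycle, not a guaranteed bound.

```c
#include <stddef.h>

/* Sum of per-segment worst cases, assuming that every segment of the
 * event cycle hits its measured worst case in the same cycle.
 * (Hypothetical helper for illustration.) */
double sum_of_segment_wcets(const double *segment_wcet_us, size_t n)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        total += segment_wcet_us[i];
    return total;
}
```

For DriverStress, the four consecutive segments t0–tAs, tAs–tAf, tAf–tBs, and tBs–tBf have WCETs of 31.37, 26.81, 249.14, and 93.01 µs. Their sum is 400.33 µs, compared to the measured full event cycle WCET of 328.30 µs.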

11.5 Small Difference Between Normal and Prioritized DPC

The difference between a prioritized and a normal FIFO DPC in terms of WCET and average execution time was unexpectedly small. In fact, many of the driver implementation tests had longer execution times when using a prioritized DPC. Two possible reasons are considered, where a mixture of the two may be closer to the actual reason:

1. The DPC queue is never long enough for the priority to make a difference. This reason alone seems unlikely, considering the File and Network load conditions, both generating many DPCs.

2. The other device drivers loaded in the system also use prioritized DPCs. This is impossible to know without access to the source code for every device driver loaded in the system.

The use of prioritized DPCs in a real-time application is therefore not advisable.

11.6 Algorithm Slower in Kernel-Mode

The average execution time for the algorithm is approximately 12% faster when executing in a normal user-mode thread than when executing in the DPC. This is likely because of different levels of code optimization in the DDK compiler and the Visual Studio 2003 compiler.

11.7 No Guarantees Can Be Given

While the specific tests conducted did not yield any execution times over one millisecond, no hard guarantees about an absolute WCET can be made. The tests only show that, under exactly the load conditions simulated, execution times exceeding the results were not measured.

However, the tests show that execution times exceeding those measured are very unlikely. This indicates that XP might be suitable as a soft RTOS under certain controlled conditions.

12 Future Work

The results from the tests showed that XP could be suitable as a soft real-time system. However, only the inner workings of XP were evaluated, which is just part of the target platform suggested by the ABB business unit. Although the temporal predictability of XP is sufficient for some soft real-time systems as it is, different techniques to further increase the determinism should be evaluated to make XP an alternative for systems with even stricter temporal constraints. The following areas would be of interest to evaluate if more time were available:

• Use of an Ethernet based protocol for communication

• Modify interrupt handling

• Run the tests on XPE instead of XP

• Evaluate extensions

12.1 Use of an Ethernet Based Protocol for Communication

As mentioned in Section 8, all conducted tests used the parallel port to simulate communication from the sensor and to the actuator. In the original model suggested by the business unit (see Figure 5), the communication between actuator and sensor could use an Ethernet stack to decrease production cost. Evaluating the temporal behavior of the TCP/IP stack using UDP would be of interest. If the temporal predictability of this protocol stack is not sufficient, other Ethernet based protocols could be evaluated instead.

12.2 Modify Interrupt Handling

To modify interrupt handling in XP, two different approaches are suggested as future work: modifying the source code of the HAL, or intercepting interrupts before they even reach the HAL. As mentioned in Section 4, the source code for the HAL can be obtained from Microsoft under a special agreement. Some simple modifications could possibly increase the determinism enough to make XP a more suitable alternative for systems with stricter temporal constraints.

Interception of interrupts could be done by modifying the IDT to pass all interrupts to a custom interrupt handler routine. A suggested model is presented in Figure 18.

Figure 18: Suggested model for interrupt interception. The figure contains the boxes Win32 Kernel, I/O Manager, HAL, Custom Interrupt Handler (IntendedForTimeCriticalSystem(), MarkAsPending(), QueueInterrupts(), TimeCriticalSystem(), UnmarkPending(), ProcessQueue()), and Time Critical System (ReadFromSensor(), Algorithm(), WriteToActuator()), connected by the paths 1, A2 to A4, and B2 to B3.

When an interrupt occurs in this model, it is passed to the customized interrupt handler. This interrupt handler first examines the interrupt vector to determine if the interrupt is intended for the time critical system or not. If the interrupt was intended for the system (represented by path An in Figure 18), the custom interrupt handler sets a flag to mark that an interrupt for the system is pending, queues all incoming interrupts, executes the critical application, turns off the pending flag, and finally processes the queue of interrupts. However, if the interrupt was not intended for the time critical application, the customized interrupt handler simply passes the interrupt to the HAL, and processing of the interrupt is handled by the XP I/O Manager as normal (represented by path Bn in Figure 18).
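The control flow of the suggested handler can be illustrated with a small user-space simulation. This is only a sketch of the logic described above: it does not modify the IDT or run in kernel-mode, and all names (interceptor, on_interrupt, CRITICAL_VECTOR, the scripted arrivals) are hypothetical. Interrupts that would fire while the time critical system runs are mimicked by a scripted list, and a nested interrupt on the critical vector is simply dropped, which is a simplification.

```c
#include <stddef.h>

#define CRITICAL_VECTOR 0x37  /* hypothetical vector of the time critical device */
#define MAX_EVENTS 16

typedef struct {
    int pending;                /* nonzero while the critical system runs */
    int queue[MAX_EVENTS];      /* vectors deferred during that time */
    size_t queued;
    int forwarded[MAX_EVENTS];  /* log standing in for "passed to the HAL" */
    size_t forwarded_count;
    const int *arrivals;        /* scripted vectors that arrive mid-run */
    size_t arrival_count;
} interceptor;

void on_interrupt(interceptor *ic, int vector);

static void forward_to_hal(interceptor *ic, int vector)
{
    if (ic->forwarded_count < MAX_EVENTS)
        ic->forwarded[ic->forwarded_count++] = vector;
}

/* Stand-in for ReadFromSensor(); Algorithm(); WriteToActuator();
 * The scripted arrivals mimic interrupts firing during the run. */
static void time_critical_system(interceptor *ic)
{
    size_t i;
    for (i = 0; i < ic->arrival_count; i++)
        on_interrupt(ic, ic->arrivals[i]);
}

void on_interrupt(interceptor *ic, int vector)
{
    size_t i;
    if (ic->pending) {
        /* QueueInterrupts(): defer everything else until the run is done.
         * A nested critical interrupt is dropped in this sketch. */
        if (vector != CRITICAL_VECTOR && ic->queued < MAX_EVENTS)
            ic->queue[ic->queued++] = vector;
        return;
    }
    if (vector != CRITICAL_VECTOR) {
        forward_to_hal(ic, vector);       /* path B: normal XP handling */
        return;
    }
    ic->pending = 1;                      /* MarkAsPending() */
    time_critical_system(ic);             /* path A */
    ic->pending = 0;                      /* UnmarkPending() */
    for (i = 0; i < ic->queued; i++)      /* ProcessQueue() */
        forward_to_hal(ic, ic->queue[i]);
    ic->queued = 0;
}
```

In this model, the latency added to ordinary interrupts is bounded by the execution time of the time critical system, which is the trade-off the approach makes in exchange for running the critical code without interruption.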

12.3 Run the Tests on XPE

Because of the limited time available for this Master's Thesis, the tests were conducted on Windows XP Professional instead of XPE. Although the kernel, thread priorities, scheduling algorithms, and inter-process communication of XP and XPE are identical, further testing on XPE would be interesting to see if additional system services not needed for the ABB business unit could be disabled to improve temporal predictability.

12.4 Evaluate Extensions

This report only had time for a brief overview of the third-party real-time extensions. Although [32] shows promising results for the evaluated extensions, the number of tests conducted is too small to draw any real conclusions about the temporal predictability of the extensions. Further analysis of the available real-time extensions would be a subject for future work in this area.

A Oscilloscope Test Results

User-thread measurements

All user-thread tests measured the full event cycle as well as selected individual events described in Figure 6. The test results use the same event names as Figure 7. All test results are presented in µs.

UserIdle

           Min      Mean     WCET     Std. dev.
t0–tGf     106.72   110.89   186.76   1.68
t0–tAs     8.69     10.21    21.30    0.69
tAf–tBs    1.96     2.96     62.96    0.34
t0–tDf     81.67    85.38    160.55   1.51
tDf–tGf    24.59    25.52    94.88    0.67

Number of samples: 994 210

UserCPU

           Min      Mean     WCET     Std. dev.
t0–tGf     103.57   109.28   192.55   2.46
t0–tAs     8.60     10.19    25.43    0.69
tAf–tBs    2.08     2.15     66.02    0.39
t0–tDf     80.41    85.04    165.71   2.18
tDf–tGf    22.27    24.24    100.28   0.83

Number of samples: 1 064 400

UserGraphics

           Min      Mean     WCET     Std. dev.
t0–tGf     104.52   118.25   256.30   14.70
t0–tAs     8.69     10.39    22.85    0.78
tAf–tBs    2.08     2.99     85.17    0.65
t0–tDf     81.74    93.60    226.59   13.47
tDf–tGf    22.16    24.66    105.60   1.50

Number of samples: 441 060

UserHDD

           Min      Mean     WCET     Std. dev.
t0–tGf     103.05   145.05   327.89   17.79
t0–tAs     8.69     11.80    42.67    2.03
tAf–tBs    2.09     3.78     163.61   1.60
t0–tDf     81.16    119.14   298.27   16.41
tDf–tGf    21.84    25.87    123.26   2.34

Number of samples: 1 339 300

UserNetwork

           Min      Mean     WCET     Std. dev.
t0–tGf     102.99   130.04   402.52   11.31
t0–tAs     8.69     11.27    42.43    1.48
tAf–tBs    2.09     3.52     214.81   2.26
t0–tDf     81.05    104.58   358.28   10.06
tDf–tGf    21.77    25.46    130.16   3.98

Number of samples: 1 099 900

UserStress

           Min      Mean     WCET     Std. dev.
t0–tGf     102.68   141.32   450.89   20.04
t0–tAs     8.74     12.14    51.34    2.72
tAf–tBs    2.08     3.54     243.74   4.10
t0–tDf     80.13    114.55   416.50   18.14
tDf–tGf    21.98    26.77    147.96   5.60

Number of samples: 1 267 800

Driver Measurements

All device driver tests measured the full event cycle as well as selected individual events described in Figure 8. The test results use the same event names as Figure 9. All test results are presented in µs.

DriverIdle

           Min      Mean     WCET     Std. dev.
t0–tBf     56.93    58.58    119.63   0.84
t0–tAs     6.50     7.98     18.82    0.68
tAf–tBs    2.06     3.05     60.54    0.34
tAs–tAf    2.77     2.95     13.37    0.06
tBs–tBf    44.49    44.61    59.38    0.35

Number of samples: 1 294 000

DriverCPU

           Min      Mean     WCET     Std. dev.
t0–tBf     57.08    65.53    125.81   0.84
t0–tAs     6.44     7.96     18.80    0.68
tAf–tBs    2.07     2.12     59.73    0.33
tAs–tAf    2.78     2.81     13.03    0.05
tBs–tBf    44.49    52.62    67.75    0.82

Number of samples: 1 314 900

DriverGraphics

           Min      Mean     WCET     Std. dev.
t0–tBf     56.61    60.60    146.52   3.60
t0–tAs     6.48     8.19     21.91    0.81
tAf–tBs    2.06     2.92     90.39    0.64
tAs–tAf    2.77     3.00     16.85    0.16
tBs–tBf    44.49    46.49    71.61    3.43

Number of samples: 1 315 300

DriverHDD

           Min      Mean     WCET     Std. dev.
t0–tBf     56.68    65.05    190.39   5.92
t0–tAs     6.53     9.32     26.01    1.33
tAf–tBs    2.07     3.51     123.43   1.50
tAs–tAf    2.78     3.61     19.78    0.89
tBs–tBf    44.51    48.61    84.33    3.46

Number of samples: 963 450

DriverNetwork

           Min      Mean     WCET     Std. dev.
t0–tBf     57.05    65.21    259.72   5.67
t0–tAs     6.67     9.08     25.86    1.05
tAf–tBs    2.09     3.36     182.26   2.26
tAs–tAf    2.94     3.45     27.14    0.64
tBs–tBf    44.71    49.33    89.71    4.15

Number of samples: 1 353 900

DriverStress

           Min      Mean     WCET     Std. dev.
t0–tBf     64.01    72.31    328.30   7.20
t0–tAs     6.49     9.49     31.37    1.55
tAf–tBs    2.07     3.09     249.14   3.65
tAs–tAf    2.89     3.67     26.81    1.20
tBs–tBf    51.23    56.06    93.01    3.05

Number of samples: 1 312 300

DriverPrioIdle

           Min      Mean     WCET     Std. dev.
t0–tBf     56.56    58.63    110.85   0.83
t0–tAs     6.53     8.03     17.32    0.68
tAf–tBs    2.08     3.05     56.18    0.28
tAs–tAf    2.79     2.94     13.41    0.07
tBs–tBf    44.51    44.62    59.41    0.36

Number of samples: 991 920

DriverPrioCPU

           Min      Mean     WCET     Std. dev.
t0–tBf     64.00    65.57    121.72   0.84
t0–tAs     6.44     7.99     13.98    0.68
tAf–tBs    2.07     2.12     57.59    0.32
tAs–tAf    2.78     2.84     13.33    0.05
tBs–tBf    52.54    52.62    67.80    0.35

Number of samples: 906 930

DriverPrioGraphics

           Min      Mean     WCET     Std. dev.
t0–tBf     56.58    60.22    135.74   3.37
t0–tAs     6.47     8.13     17.74    0.77
tAf–tBs    2.06     2.91     80.38    0.58
tAs–tAf    2.78     3.01     12.76    0.22
tBs–tBf    44.50    46.17    71.19    3.23

Number of samples: 809 390

DriverPrioHDD

           Min      Mean     WCET     Std. dev.
t0–tBf     56.68    65.44    223.53   6.12
t0–tAs     6.56     9.38     39.20    1.35
tAf–tBs    2.07     3.55     138.00   1.49
tAs–tAf    2.79     4.25     31.30    1.02
tBs–tBf    44.51    48.26    88.63    3.50

Number of samples: 3 776 400

DriverPrioNetwork

           Min      Mean     WCET     Std. dev.
t0–tBf     56.93    68.44    297.21   7.25
t0–tAs     6.73     9.21     25.15    1.29
tAf–tBs    2.07     3.63     218.36   3.20
tAs–tAf    2.82     3.95     24.53    0.97
tBs–tBf    44.73    51.65    84.95    4.97

Number of samples: 990 750

DriverPrioStress

           Min      Mean     WCET     Std. dev.
t0–tBf     64.09    72.73    356.06   7.61
t0–tAs     6.52     9.42     33.63    1.59
tAf–tBs    2.07     3.28     278.43   3.92
tAs–tAf    2.80     3.28     28.39    1.37
tBs–tBf    52.51    55.82    96.11    3.16

Number of samples: 998 320

Algorithm Execution Time

The following test was conducted to measure the difference in execution time of floating-point operations. The results are presented in µs.

                 Min     Mean    WCET     Std. dev.   Samples
User-thread      39.34   39.87   105.28   0.79        153 910
Device driver    41.67   41.76   56.40    0.32        116 370

References

[1] Ardence RTX Real-time Extension for Control of Windows. Ardence. http://www.ardence.com/assets/5f940542924c4a42b30fc5584872d798.pdf.

[2] A. Baker and J. Lozano. The Windows 2000 Device Driver Book. Prentice Hall PTR, 2001.

[3] A. Baril. Using Windows NT in Real-Time Systems. In Proceedings of the Fifth IEEE Real-Time Technology and Applications Symposium (RTAS '99), pages 132–141, Washington - Brussels - Tokyo, 1999. IEEE Computer Society.

[4] L. Budin and L. Jelenkovic. Time-Constrained Programming in Windows NT Environment. In Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE '99), pages 90–94, Bled, 1999. IEEE Computer Society.

[5] J. Cinkelj et al. Soft Real-Time Acquisition in Windows XP. In Intelligent Solutions in Embedded Systems, 2005. Third International Workshop, pages 110–116, Bled, 2005. Intelligent Solutions in Embedded Systems.

[6] Comparisons with the .NET Framework. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_evtuv/html/etconcomparisonswithnetframework.asp.

[7] I. Crnkovic and M. Larsson. Building Reliable Component-Based Software Systems. Artech House, Inc., 2002.

[8] S. Daily. Introducing Windows NT 4.0. 29th Street Press, February 1997.

[9] E. Dekker and J. Newcomer. Developing Windows NT Device Drivers. Addison-Wesley, 1999.

[10] Hard Real-Time with Venturcom RTX on Microsoft Windows XP and Windows XP Embedded. Venturcom, Inc., September 2003. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxpesp1/html/tchHardReal-TimeWithVenturcomRTXOnMicrosoftWindowsXPWindowsXPEmbedded.asp.

[11] Hyper-Threading Technology Overview. Intel Corporation. http://www.intel.com/business/bss/products/hyperthreading/overview.htm.

[12] HyperKernel - Real-time Extensions for Windows NT/2000. Nematron. http://www.nematron.com/HyperKernel/.

[13] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual Volume 3A: System Programming Guide, Part 1, January 2006. Order Number: 253668-018.

[14] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual Volume 3B: System Programming Guide, Part 2, January 2006. Order Number: 253669-018.

[15] INtime. TenAsys. http://www.tenasys.com/intime.html.

[16] INtime 3.0 Real-time Operating System (RTOS) Extension for Windows. TenAsys. http://www.tenasys.com/resources/getFile.php?fileid=6.

[17] D. Kresta. Getting Real with NT Approaches to Real-Time Windows NT. Real-Time Magazine, 2:32–35, 1997.

[18] KUKA Controls GmbH - Hard Real-Time Windows XP. KUKA Controls GmbH. http://www.kuka-control.com/product/.

[19] P. N. Leroux. RTOS versus GPOS: What is best for embedded development? Embedded Computing Design, January 2005.

[20] C. Liu and J. Layland. Scheduling Algorithms for Multiprogramming in Hard Real-Time Environment. Journal of the ACM, 20(1), 1973.

[21] M. Lutz and P. Laplante. C# and the .NET Framework: Ready for Real-Time? IEEE Software, 20(1):74–80, 2003.

[22] Microsoft Windows CE 5.0. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wceintro5/html/wce50oriWelcomeToWindowsCE.asp.

[23] C. Nordström et al. Robusta realtidssystem. Mälardalen Real-Time Research Centre, Västerås, August 2000.

[24] K. Obenland, J. Kowalik, T. Frazier, and J. Kim. Comparing the Real-Time Performance of Windows NT to an NT Real-Time Extension. In Proceedings of the Fifth IEEE Real-Time Technology and Applications Symposium (RTAS '99), pages 142–153, Washington - Brussels - Tokyo, 1999. IEEE Computer Society.

[25] W. Oney. Programming the Microsoft Windows Driver Model. Microsoft Press, 1999.

[26] C. Peacock. PortTalk - A Windows NT I/O Port Device Driver. http://www.beyondlogic.org/porttalk/porttalk.htm.

[27] K. Ramamritham et al. Using Windows NT for Real-Time Applications: Experimental Observations and Recommendations. In Proceedings of the Fourth IEEE Real-Time Technology and Applications Symposium (RTAS '98), pages 132–141, Washington - Brussels - Tokyo, June 1998. IEEE Computer Society.

[28] Real-Time Operating Systems: INtime Architecture. TenAsys Corporation, September 2003. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxpesp1/html/tchReal-TimeOperatingSystemsINtimeArchitecture.asp.

[29] RTX. Ardence. http://www.ardence.com/embedded/products.aspx?ID=70.

[30] M. E. Russinovich and D. A. Solomon. Microsoft Windows Internals Fourth Edition: Microsoft Windows Server 2003, Windows XP, and Windows 2000. Microsoft Press, 2005.

[31] A. Tanenbaum. Modern Operating Systems, Second Edition. Prentice Hall International, 2001.

[32] M. Timmerman et al. Designing for Worst Case: The Impact of Real-Time OS Performance on Real-World Embedded Design. Real-Time Magazine, 3:11–19, 1998.

[33] M. Timmerman and J-C. Monfret. Designing for Worst Case: The Impact of Real-Time OS Performance on Real-World Embedded Design. Real-Time Magazine, 3:52–56, 1997.

[34] M. Timmerman and J-C. Monfret. Windows NT as Real-Time OS? Real-Time Magazine, 2:6–13, 1997.

[35] M. Timmerman and J-C. Monfret. Windows NT Real-Time Extensions: an Overview. Real-Time Magazine, 2:14–24, 1997.

[36] Windows Driver Model (WDM). Microsoft Corporation, April 2002. http://www.microsoft.com/whdc/archive/wdmoverview.mspx.

[37] Windows XP Embedded Home Page. Microsoft Corporation, November 2005. http://msdn.microsoft.com/embedded/windowsxpembedded/.

[38] P. Work and K. Nguyen. Measure Code Sections Using The Enhanced Timer. Intel Corporation. http://www.intel.com/cd/ids/developer/asmo-na/eng/209859.htm.
