Hardware Scheduling and Placement in Operating Systems
for Reconfigurable SoC
Yuan-Hsiu Chen
A Thesis Submitted to
Institute of Computer Science and Information Engineering
College of Engineering
National Chung Cheng University
for the Degree of Master
in Computer Science and Information Engineering
2005
Abstract
Existing operating systems can manage the execution of software tasks; however, their
support for managing hardware tasks is very limited. The major goal of this research is
to design and implement an embedded operating system that manages both software
and hardware tasks in the same framework. In this research, we demonstrate
how to add the management of hardware tasks to a traditional operating system.
To manage hardware tasks, several issues need to be solved, three of which are as
follows. The first is the dynamic scheduling of hardware tasks: a good scheduling
method for a dynamically reconfigurable architecture not only improves execution
efficiency but can also reduce power consumption. The second issue is allocation:
dynamic partial reconfiguration usually results in many fragmented spaces, so the
remaining space must be managed efficiently to fully utilize the platform resources.
The last issue is how to place a hardware task in the reconfigurable logic space:
finding the best-fitting space reduces wasted free space and allows more tasks to be
placed in the reconfigurable platform at the same time.
Keywords: Dynamic Reconfigurable SoC, FPGA, Scheduling, Placement
Contents
1 Introduction 6
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Motivation and Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 The Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Our approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Previous work 15
2.1 Scheduling Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Allocation and Placement Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Hardware Task Scheduling in FPGA 22
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Task Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Hardware Scheduling Without Deadline . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Hardware Scheduling With Deadline . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Placement Methodology for Hardware Task 27
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Task Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3 Recording Placement Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 Placement Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5 EXPERIMENT RESULTS 43
5.1 Experiment with Random Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Experiment with Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6 CONCLUSIONS 50
List of Figures
1.1 The configurable logic for hardware tasks. . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 The architecture of an operating system for reconfigurable systems. . . . . . . . . . . 12
1.3 A Placement Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1 Scheduling methodology without deadline (SRPT). . . . . . . . . . . . . . . . . . . . 24
3.2 Comparing the SRPT and LLF Scheduling Methodologies. . . . . . . . . . . . . . . . 26
4.1 The overall architecture of FPGA with configurable logic blocks . . . . . . . . . . . . 28
4.2 The difference between our method and stuffing. . . . . . . . . . . . . . . . . . . . . 29
4.3 Structure of space list and task list. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 The example of tasks in FPGA to illustrate space list and task list. . . . . . . . . . . 32
4.5 Placement steps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.6 Example of Conflict. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.7 Example of two types of tasks placed into free space. . . . . . . . . . . . . . . . . . 35
4.8 Example of Placement: Total task information. . . . . . . . . . . . . . . . . . . . . . 37
4.9 Example of Placement: (a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.10 Example of Placement: (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.11 Example of Placement: (c). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.12 Example of Placement: (d). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.13 Example of Placement: (e). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.14 Example of Placement: (f). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.15 Example of Placement: (g). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.16 Example of Placement: (h). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.17 Example of Placement: Final information of the two lists. . . . . . . . . . . . . . . 42
5.1 The placement example with ratio 4:1. . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 The fragmentation with ratio 1:4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Result of our method and stuffing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
List of Tables
4.1 Finding the Best Classification Ratio. . . . . . . . . . . . . . . . . . . . . . . . 30
5.1 Experimental Results and Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 The Attributes of Six Real Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3 Three Cases with 50 Real Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 Experimental Results and Comparison with Real Tasks . . . . . . . . . . . . . . . . 49
Chapter 1
Introduction
Nowadays, a hand-held product can provide a variety of applications such as digital
video, 3D games, MP3 playback, and so on. ASICs can be used to make these
applications feasible. However, this approach may be costly and highly complex,
because if someone wants to add a new application in the future, one must add an
ASIC to the platform and modify the entire circuit. Furthermore, it is highly probable that the
ASICs will not all run at the same time, because users may require only some of
the applications simultaneously. In general, applications implemented using ASICs
will increase the time to market as well as the design and production costs. An
alternative is to let a microprocessor run all of these applications. Although adding
a new application then becomes simpler than with ASICs, a microprocessor alone
may not provide the required performance. Hence, some hardware implementations
are required. The characteristics of dynamically reconfigurable platforms, such as
those using FPGAs, lie between those of ASICs and microprocessors. FPGAs can be
configured at run time according to the dynamically required applications, so they
are more flexible than ASICs. FPGAs are also faster than microprocessors in task
execution. Therefore, dynamically reconfigurable platforms are gradually being used
more extensively. Specifically, in partially reconfigurable platforms, one can change
the configuration of some CLBs (Configurable Logic Blocks) while other CLBs are
still running. However, partial reconfiguration is more complex.
We must manage the FPGA space and know when and where the hardware tasks
are to be placed. Controlling partially reconfigurable platforms is quite similar to
virtual memory management in operating systems. In this work, our objective is to
implement an embedded operating system to manage both the hardware and software
tasks simultaneously.
1.1 Background
Generally, EDA tools are used to setup and configure dynamic partially recon-
figurable platforms. The FPGA chips are configured through the JTAG interface.
However in this work, we would like to dynamically control all the resources in a
dynamic partially reconfigurable platform. To achieve this objective, a controller
that is small and fast must be designed into the platform. A possible
tool that can be used instead of the conventional FPGA synthesis tools is the Xil-
inx JBits, which is a set of Java classes providing an API into the Xilinx Virtex II
FPGA family bitstream. However, the proposed controller is responsible not only for
configuration, as in JBits, but also for managing the logic resources. The jobs that
the controller must perform include hardware task scheduling, space allocation, and
task placement. In the future, many products will make use of the
reconfigurable embedded systems, such as wearable computing and mobile systems.
Therefore, we intend to create an efficient controller to manage the resources of
reconfigurable platforms. This controller will be implemented in the context of an
operating system for reconfigurable systems.
1.2 Motivation and Purpose
Since a dynamically reconfigurable platform usually consists of a microprocessor and
FPGA, a hardware-software co-design methodology is required. Existing operating
systems already have the capability to manage software processes running on a micro-
processor. If we extend an existing operating system, we can avoid the time to design
the software related functions for memory management, inter-process communica-
tion, and process scheduling. We also believe that hardware-software co-scheduling
and communication are also easier to design when extending an existing operating
system. Uniformly treating all system tasks, including both hardware and software
tasks, would be beneficial for hardware-software co-scheduling, because hardware
and software tasks may have data dependencies that constrain their execution order.
If one of them waits for the other for a long time due to such a dependency, the whole
system will be very inefficient. Hence, we can let an operating system coordinate
their execution sequence so as to avoid complex asynchronous control and obtain the
best efficiency. Given these dependencies, data must also be communicated between
hardware and software.
Our implementation of the proposed operating system will provide channels be-
tween the FPGA and the processor to allow communication. The operating system
will also provide a configuration channel to convey configuration data.
The proposed operating system will eventually support the ability for switching
between hardware and software tasks. The most suitable hardware-software switch
time is determined by the operating system which will check the workloads at both
the FPGA and the processor, or check the task urgency. Furthermore, it is also
desirable that hardware tasks can be preempted, just like software tasks; hardware
preemption is also needed to satisfy real-time requirements. However, preemption will
result in the problem of configuration overhead. If the frequency of preemption is high,
high power consumption could be another side effect. The decision on preemption
will be implemented into the operating system.
Many people may still wonder why an operating system should be used at all.
Besides the above-mentioned advantages about hardware-software co-design, below
are some reasons why we decided to implement an operating system for reconfigurable
systems.
• Increased portability: a reconfigurable system can be easily ported to another
platform by modifying only the architecture-related part of the operating system
for reconfigurable systems, without the need to start from scratch.

• Shorter time to market: when a new product resembles a previous one, we can
reuse some modules in the operating system to reduce the design time. As with
portability, only certain parts of the system need to be modified.

• Easy partitioning: if the architecture has both an FPGA and a CPU, and the
operating system supports hardware and software task switching, then a software
task running on the CPU can be switched and dispatched to run on the FPGA
by the operating system for improved performance. Operating systems can be
made to perform dynamic partitioning easily and efficiently.
1.3 The Target
The target platform on which we will conduct our experiments is the Xilinx ML310,
which has a Virtex-II Pro XC2VP30-FF896 chip with two PowerPC 405 cores, 256MB
DDR RAM, a 512MB CompactFlash card, and a 10/100 Mb/s Ethernet interface. The
XC2VP30-FF896 chip has 3680 CLBs, and it is a partially dynamically reconfigurable
chip. The FPGA configurable space is considered to be a set of columns, where a
column spans the height of a chip and the set of columns spread across the width of
the chip. Fig. 1.1 shows the overall architecture of the FPGA chip. The right side
rectangle in Fig. 1.1 shows the structure of a basic unit of configuration. The left side
of Fig. 1.1 shows a placement example: there are two tasks, A and B, on the FPGA;
task A occupies one column and task B occupies two. In this platform, the basic
placement unit of configuration is one column of CLBs. Hence, configuration in
this chip is a one-dimensional problem, which means that we need only know
how wide a task is to place it into the chip. Thus, space allocation is easier than in
two-dimensional placement.
Figure 1.1: The configurable logic for hardware tasks.
1.4 Our approach
In the design of an operating system for reconfigurable systems, the focus here will
be on the hardware task scheduling, placement, and space allocation. The criteria
for finding solutions to the above research issues include the usage of reconfigurable
resources such as the utilization of the FPGA space, real-time performance for tasks,
power consumption, and so on. Fig. 1.2 illustrates the prototype of an operating sys-
tem for reconfigurable systems. The left side of Fig. 1.2 depicts the original existing
software task management. The right side shows the hardware counterpart for man-
aging the FPGA resources. The details of the scheduler, placer, and allocator will be
described later. The loader component in Fig. 1.2 is in charge of configuration; it
transports the configuration data to the FPGA. The communicator is a bridge between
the processor and the FPGA, through which signals and messages are transferred.
Finally, the arrow between the software scheduler and the hardware scheduler represents
the process of co-scheduling.
Figure 1.2: The architecture of an operating system for reconfigurable systems.
In the following, we introduce the proposed method of scheduling, placement, and
allocation. We choose the Shortest Remaining Processing Time (SRPT) [1] or Least
Laxity First (LLF) [3] to decide the scheduling priority. The task with the shortest
remaining time or the least laxity has the highest priority. The system will thus choose
the task with the shortest remaining time or least laxity in the ready queue to process
first. For the placement and allocation methods, we can refer to Fig. 1.3. In Fig. 1.3,
there are two coordinate axes, representing the width of the FPGA in columns and
the time in cycles. Basically, the proposed placer in this thesis separates the arriving
tasks into two types: one type has longer processing times but uses less space, and
the other has shorter processing times but uses more space. For example, tasks 1 and
3 in Fig. 1.3 are of type A, and tasks 2, 4, and 5 are of type B. Graphically, tasks of
type A are flat and lying down, while tasks of type B are thin and standing. The
placer places the two types of tasks towards the two ends of the width axis; that is,
low-SUR tasks are placed at one end and high-SUR tasks at the other. We adopt
this classification method to reduce the amount of wasted space, because it reduces
fragmentation.
Details of the placement algorithm will be given in Chapter 4.
Chapter 2
Previous work
The major issues in the design and implementation of an operating system for
reconfigurable systems include task scheduling, allocation, and placement. These
issues are intimately related, because we must consider not only the execution
sequence but also the space in the reconfigurable logic device, in order to reduce
reconfiguration overhead. Scheduling methods can be classified as static or dynamic.
Allocation and placement can be treated as one-dimensional or two-dimensional
management problems in the FPGA space. This chapter gives an overview of previous
work on these methodologies.
Section 2.1 gives a review on some approaches for hardware task scheduling. Sec-
tion 2.2 surveys different approaches of placement and allocation.
2.1 Scheduling Methods
There are many similarities between the scheduling of hardware and software tasks.
For example, irrespective of hardware or software, the scheduler needs some a priori
information about tasks, such as priority, execution time, size, and deadline. Hence,
some methods for software task scheduling can also be applied to hardware task
scheduling.
Scheduling methods can be classified into static and dynamic scheduling, according
to when the ready tasks are scheduled. A static scheduler runs before the execution
of an application; static scheduling can be used for an application whose data
dependencies, task arrival times, and execution times are statically known. However,
in a dynamically reconfigurable system it is impossible to anticipate the behavior of
all tasks before execution, so we must schedule the tasks dynamically at run time.
Although dynamic scheduling achieves a better execution sequence than static
scheduling when some information is unknown until run time, it spends more time
than static scheduling deciding the processing sequence at run time.
Most schedulers are based on task priority. For example, in list scheduling [4], there
is a list of ready tasks sorted by their priorities. If multiple tasks are ready, the
scheduler selects the task with the highest priority to run. The difference among
these schedulers lies in the method for calculating the task priority. There are
some algorithms to compute the priority as follows.
• Static Scheduling
– As Soon As Possible (ASAP) and As Late As Possible (ALAP): These two
algorithms are static, meaning the priority is calculated before scheduling.
If the processing sequence and task dependences are known before scheduling,
ASAP places each task into the FPGA as soon as possible without violating
the processing sequence, while ALAP places each task into the FPGA as late
as possible while still obeying the sequence.
• Dynamic Scheduling
– First In First Out (FIFO): Tasks in queue will be sorted by their arrival
time. The earliest arriving task is at the head of the queue. The FIFO
scheduler selects tasks from the queue head.
– Shortest Job First (SJF): This algorithm sorts the queue according to the
execution time of the tasks. The head of the queue is the task with the shortest
execution time.
– Earliest Deadline First (EDF): The scheduler sorts tasks by their deadline.
A task with the earliest deadline is executed first.
– Shortest Remaining Processing Time (SRPT): This method supports task
preemption. The scheduler sorts the queue in ascending order of the remain-
ing execution time of tasks.
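These dynamic policies differ only in the key used to order the ready queue. A minimal sketch in Python illustrates this; the task fields and values below are hypothetical examples, not data from this thesis:

```python
import heapq

def make_scheduler(key):
    """Build a ready queue ordered by the given priority key (smaller = higher priority)."""
    queue = []
    seq = [0]  # tie-breaker so heapq never compares the task dicts themselves

    def add(task):
        heapq.heappush(queue, (key(task), seq[0], task))
        seq[0] += 1

    def pop():
        return heapq.heappop(queue)[2]

    return add, pop

# Hypothetical ready tasks: arrival time, execution time, absolute deadline.
tasks = [
    {"name": "T1", "arrival": 0, "exec": 14, "deadline": 40},
    {"name": "T2", "arrival": 1, "exec": 5,  "deadline": 20},
    {"name": "T3", "arrival": 2, "exec": 9,  "deadline": 15},
]

fifo_key = lambda t: t["arrival"]   # FIFO: earliest arrival first
sjf_key  = lambda t: t["exec"]      # SJF: shortest execution time first
edf_key  = lambda t: t["deadline"]  # EDF: earliest deadline first

add, pop = make_scheduler(sjf_key)
for t in tasks:
    add(t)
sjf_order = [pop()["name"] for _ in range(len(tasks))]  # T2, T3, T1
```

Swapping in `fifo_key` or `edf_key` changes only the ordering function, not the queue machinery, which is the sense in which these schedulers differ only in how priority is computed.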
A new metric for computing priority was proposed in [4]. This metric combines
ALAP and dynamic ASAP to compute task priority dynamically. Unlike static
ASAP, which computes the priority before scheduling, dynamic ASAP computes the
priority at run time: whenever a task is scheduled, all dynamic ASAP values are
recalculated. The ALAP time does not change at run time during dynamic
scheduling, but dynamic ASAP can observe the run-time situation and choose a
better execution sequence. In [8], the dynamic scheduling algorithm is based on
event windows and active-set information. The event-window scheduling algorithm
considers the task type and the task priority: the priority decides the execution
sequence, while the type indicates which kind of dynamic reconfigurable logic circuit
the task uses. In the next section, we will describe the configuration types along
with the dynamic reconfigurable logic. A scheduling approach that minimizes
reconfiguration overhead is proposed in [10]. It adopts an existing run-time/design-time
scheduling methodology called Task Concurrency Management (TCM) [18], since
TCM incurs only a small run-time penalty, as most of the computation is done at
design time. Although TCM can reduce power consumption, it is not aware of the
reconfiguration overhead. Hence, three modules in [10] update the output of the
TCM scheduler at run time.
2.2 Allocation and Placement Methods
This section introduces some management methods for the FPGA logic space,
namely space allocation and task placement. These methods are intimately tied to
the dynamically reconfigurable device, and they attempt to obtain better utilization
of the reconfigurable space.
Two well-known online algorithms for the 1D problem are first fit (FF) and best
fit (BF). BF results in less fragmentation, but FF is much faster than BF.
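The two policies can be sketched as follows; this is a simplified 1D model of our own, not the cited implementations, with free space kept as hypothetical (start_column, width) intervals:

```python
def first_fit(free, w):
    """Return the start column of the first free interval at least w columns wide, else None."""
    for start, width in free:
        if width >= w:
            return start
    return None

def best_fit(free, w):
    """Return the start column of the tightest interval that still fits, else None."""
    fits = [(width - w, start) for start, width in free if width >= w]
    return min(fits)[1] if fits else None

# Hypothetical free intervals: columns 0-5, 10-12, and 20-23 are unused.
free_spaces = [(0, 6), (10, 3), (20, 4)]
```

For a 3-column task, first fit returns column 0 (the first interval wide enough, after one comparison), while best fit scans every interval and returns column 10, leaving no leftover columns in that interval; this is the speed-versus-fragmentation trade-off described above.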
Next, we introduce some methodologies for FPGA area utilization. The reconfigurable
logic can be partitioned into many blocks of different sizes [14]. All of these blocks
have a fixed length in the vertical dimension, so block size depends only on width.
Because the block sizes differ, the placer can select the most suitable block size for
a task; choosing the best-fit block size results in less fragmentation. In [8], the
FPGA space is divided into many Dynamic Reconfigurable Logic (DRL) blocks, and
each DRL is classified into a type. If a task is ready to execute and a DRL of the
same type exists, the task can execute immediately without reconfiguring the DRL.
There are two event windows, recording execution and configuration events
respectively. When a new task becomes ready for execution and a DRL of a suitable
type can be found in the FPGA, an event for this task is inserted into the execution
event window. If no DRL type is suitable for the new task, an event is inserted into
the reconfiguration event window instead. The scheduler selects one event from the
reconfiguration event window and starts to reconfigure the DRL when the execution
event window is empty. This methodology tends to decrease the number of
reconfigurations, and thus the overall reconfiguration overhead, because a DRL is
not reconfigured as long as events in the execution event window keep using the
same DRL type.
Two techniques for scheduling and placement, called Horizon and Stuffing, are
proposed in [14]. The Horizon technique uses two lists: the scheduling horizon list
and the reservation list. The scheduling horizon list is a set of intervals, each written
as [X,Y]@T, where [X,Y] denotes the width from X to Y in one dimension and T is
the last release time. The reservation list stores all tasks that are scheduled but not
yet executed. When a new task arrives, the scheduler finds a suitable width for it
from the scheduling horizon list. The Stuffing technique also uses two lists, but a
free space list replaces the scheduling horizon list of the Horizon technique. The free
space list is a set of intervals [X,Y] denoting currently unused resource rectangles of
height H, where X and Y are the left-end and right-end positions in the FPGA and
H is the execution time. When a task arrives, a space is sought for it from the free
space list. Even when a free space is wide enough for the task, we must still check
whether the task conflicts, during its execution time, with other tasks in the
reservation list. The main difference between Horizon and Stuffing is that Stuffing
uses the FPGA space more efficiently: Stuffing looks for suitable slices starting from
the task's ready time, whereas Horizon only considers slices from the time after
which they are never used again. With Horizon, even if the interval between a
slice's free time and its next use is long enough to process the task, that interval is
not used.
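The conflict check that the stuffing technique requires, namely that a candidate placement must not overlap any reserved task in both space and time, can be sketched as follows; the interval representation is our simplification of the description above, and the reservation entry is hypothetical:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end

def conflicts(x, w, t, e, reserved):
    """True if a task of width w at column x, executing during [t, t+e),
    overlaps some reserved entry (rx, rw, rt, re) in both space and time."""
    return any(
        overlaps(x, x + w, rx, rx + rw) and overlaps(t, t + e, rt, rt + re)
        for rx, rw, rt, re in reserved
    )

# Hypothetical reservation: columns 0-3 are busy during cycles 0-9.
reserved = [(0, 4, 0, 10)]
```

A placement only has to be rejected when both tests succeed: a task may share columns with a reserved task if their execution intervals are disjoint, and may share an execution interval if it uses different columns.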
Chapter 3
Hardware Task Scheduling in
FPGA
3.1 Introduction
This chapter gives a detailed description of the hardware scheduler. We
propose two scheduling methodologies in this work. For tasks without deadlines,
we use the Shortest Remaining Processing Time (SRPT) methodology for task
scheduling. For tasks with deadlines, we change the method of deciding hardware
task priority and use the Least Laxity First (LLF) algorithm.
3.2 Task Modeling
The hardware to be configured on an FPGA is modeled as a set of hardware
tasks, where each task t has a set of attributes including arrival time A(t), execution
time E(t), deadline D(t), and area C(t) (in terms of the number of FPGA columns
required). These numbers can be obtained by synthesizing a hardware function using
a synthesis tool such as Synplify or XST.
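Under this model, a task can be represented as a simple record; the field names and the sample task below are ours, chosen for illustration, not output of the synthesis tools:

```python
from dataclasses import dataclass

@dataclass
class HwTask:
    name: str
    arrival: int    # A(t): cycle at which the task becomes ready
    exec_time: int  # E(t): execution time in cycles
    deadline: int   # D(t): absolute deadline in cycles
    columns: int    # C(t): FPGA columns required

# A hypothetical synthesized task.
t = HwTask("fir_filter", arrival=0, exec_time=15, deadline=40, columns=3)
```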
3.3 Hardware Scheduling Without Deadline
How to decide a task's priority is the key decision in any scheduling method.
Suppose that, at scheduling time, we only have the execution time E(t) and the task
size C(t). Task size is related to placement, so it is not used for scheduling. SRPT
is very suitable when only the execution times of tasks are known: SRPT [1] is an
optimal algorithm for minimizing mean response time, and analysis has also shown
that the common misconception that SRPT is unfair to large tasks is unfounded.
The priority of each task is defined by its remaining processing time. The
task that needs the least execution time is assigned the highest priority. In
the scheduling method there is a queue to place hardware tasks that are ready for
execution. Every ready task must go through the scheduler before it is put in the
ready queue. The scheduler will insert the tasks into the ready queue according to
their priorities. Hence, the task at the front of the queue is scheduled first.
In the example of Fig. 3.1, the queue holds four scheduled tasks, each with its own
execution time, from which the scheduler determines the priorities: from high to
low, the order is A, B, C, and D. The next ready task E has an execution time of
15, so it is inserted between task C and task D.
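The insertion in Fig. 3.1 can be sketched with a sorted list keyed on remaining execution time; the task names and times are taken from the figure:

```python
import bisect

# Ready queue from Fig. 3.1, kept in ascending order of remaining time.
queue = [(5, "A"), (8, "B"), (14, "C"), (20, "D")]

def srpt_insert(queue, remaining, name):
    """Insert a newly ready task while preserving SRPT order."""
    bisect.insort(queue, (remaining, name))

srpt_insert(queue, 15, "E")  # E lands between C and D
```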
Figure 3.1: Scheduling methodology without deadline (SRPT).
3.4 Hardware Scheduling With Deadline
If we have information about task deadlines, we may be able to use it to get better
performance, because when a task misses its deadline, it affects other tasks that are
waiting for it. If we reduce the number of deadline misses, we can make the
application run more efficiently.
For scheduling real-time hardware tasks with deadlines, we can use either EDF
or LLF. However, because hardware tasks are non-preemptive and truly parallel, we
decided to use LLF, which degrades a little more gracefully under overload than EDF,
extends relatively well to scheduling multiple processors, which are similar to parallel
hardware tasks, and approaches the schedulability of EDF for non-preemptive tasks.
Now we describe the LLF methodology. In LLF, the priority of a task t is
assigned as in Equation 3.1, where D(t) is the task deadline, E(t) is the task
execution time, now is the current time, and P(t) is the priority. Intuitively, the
priority is the slack time of the task.
P(t) = D(t) − E(t) − now (3.1)
An LLF example is shown in Fig. 3.2. There are three ready tasks A, B, and C
with execution times 10, 5, and 15, and deadlines 25, 30, and 20, respectively.
Assuming they share the same current time (now = 0), the priorities of A, B, and C
are 15, 25, and 5. Hence, the order in the ready queue is C, A, B. Using LLF, all
three tasks can complete execution before their deadlines.
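The laxity computation of Equation 3.1, applied to the example above with now = 0, can be written directly:

```python
def llf_priority(deadline, exec_time, now=0):
    """P(t) = D(t) - E(t) - now; a smaller laxity means a higher priority."""
    return deadline - exec_time - now

# (deadline, execution time) for the three ready tasks of Fig. 3.2.
tasks = {"A": (25, 10), "B": (30, 5), "C": (20, 15)}
queue_order = sorted(tasks, key=lambda n: llf_priority(*tasks[n]))  # C, A, B
```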
During scheduling we have not considered the task size and the FPGA space size,
hence some tasks may miss their deadlines if there is not enough space for these tasks.
The placement methodology will be described in Chapter 4.
Figure 3.2: Comparing the SRPT and LLF Scheduling Methodologies.
Chapter 4
Placement Methodology for
Hardware Task
4.1 Introduction
The XC2VP30 chip we use in our experiments has 80 rows and 46 columns. As
discussed in Chapter 1, the smallest placement unit is one column of CLBs, so
there are 46 columns for hardware task placement. Fig. 4.1 illustrates the
architecture of an FPGA with configurable logic blocks; the column of CLBs with a
gray background is the smallest placement unit. Hence, we need to know how wide
a task is, but not how much height it occupies. If we consider only the FPGA space,
placement is a one-dimensional problem. To utilize the FPGA space more
effectively, we also consider the task execution time; hence, in our method,
hardware task placement is a two-dimensional problem.
Figure 4.1: The overall architecture of FPGA with configurable logic blocks
The main difference between our classified stuffing and the original stuffing method
[11] is the classification of hardware tasks. In our method, we classify all hardware
tasks into two types, and the placement locations of the two types differ, while the
stuffing method makes no such distinction. An example is shown in Fig. 4.2.
Suppose tasks A, B, and C are already placed as shown in Fig. 4.2, and the
placements are the same in our method and in stuffing. Next, task D is to be placed,
but its location differs between our method and stuffing. In stuffing, it is placed
adjacent to task C in the center columns of the FPGA space. In contrast, in our
method task D is placed from the rightmost columns. As a consequence, in our
method tasks E and F can be placed earlier than in stuffing, resulting in a shorter
schedule. Finally, the fragmentation in our classified stuffing is also less than in
stuffing, because the space is used more compactly.

Figure 4.2: The difference between our method and stuffing.
4.2 Task Classification
For a task t, we call the ratio C(t)/E(t) the Space Utilization Rate (SUR) of t.
Given a set of hardware tasks, they are classified into two types, namely high SUR
tasks with SUR(t) > 1 and low SUR tasks with SUR(t) ≤ 1. The classification of a
task determines where it will be placed in the FPGA as follows. High SUR tasks are
placed starting from the leftmost available columns of the FPGA space, while low
Table 4.1: Finding the Best Classification Ratio.

SR \ CR      1:4     1:3     1:1     3:2     3:1     4:1     5:1
4:1   TF     667     667     667     665     654     638     640
      TT   192.1   192.1   192.1   192.1   191.6   191.1   191.1
1:1   TF     471     471     435     441     461     456     459
      TT   172.7   172.7   171.4   171.9   172.8   172.6   172.7
1:4   TF     427     429     431     435     435     440     443
      TT   162.3   162.6   162.6   162.7   162.7   162.8   163.0

SR: set ratio, CR: classification ratio, TF: total fragmentation, TT: total time
SUR tasks are placed starting from the rightmost available columns. This choice of
segregating the tasks and their placement is to reduce the conflicts between the high
and low SUR tasks. A conflict often results in unused FPGA space, which in turn
increases the total execution time. Because the classification scheme does not take
the number of tasks of each type into account, one may wonder whether the classifi-
cation gains much when there is an imbalance between the numbers of high and low
SUR tasks. However, after thorough experiments, we found that the best results (least
fragmentation, shortest execution time) are always obtained when we divide the task
set considering only the SUR values.
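The classification rule itself is a one-line threshold test. A minimal sketch, with function names of our own choosing:

```python
def sur(columns: int, exec_time: int) -> float:
    """Space Utilization Rate: occupied columns over execution time."""
    return columns / exec_time

def classify(columns: int, exec_time: int) -> str:
    """High SUR tasks (SUR > 1) are placed from the left edge of the
    FPGA; low SUR tasks (SUR <= 1) from the right edge."""
    return "high" if sur(columns, exec_time) > 1 else "low"

# Tasks from the example set of Fig. 4.8:
print(classify(9, 3))    # task A: SUR = 3.0  -> 'high'
print(classify(2, 14))   # task B: SUR < 1    -> 'low'
print(classify(3, 16))   # task E: SUR < 1    -> 'low'
```

A wide, short-lived task is "space-hungry" (high SUR); a narrow, long-running task is "time-hungry" (low SUR), which is why the two grow from opposite edges of the FPGA.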
To support our choice of classifying tasks by their SUR values alone, without con-
sidering the number of tasks of each type, we performed several experiments using
different classification ratios (CR) for a task set. The results are tabulated in Table 4.1, where
we can observe that the best results are usually obtained when the classification ratio
is the same as the set ratio, that is, the threshold for classification can depend only
on the SUR values.
4.3 Recording Placement Information
In our placement methodology we need two lists: a space list for recording the free
spaces, and a task list for recording the task locations in the FPGA after placement.
Besides these two lists, each task has four attributes, namely width, execution time,
arrival time, and deadline.
Figure 4.3: Structure of the space list and task list (each space entry records free time, left end, and right end; each task entry records start time, left end, right end, and execution time).
Fig. 4.3 illustrates the structure of the space list and the task list. For every free
space in the space list we record three pieces of information: the release time of its
columns of CLBs, and the leftmost and rightmost FPGA columns of the free space.
When the placer places a task into the FPGA, the task list stores four pieces of
information about the task: the starting time of its execution on the FPGA, the
leftmost and rightmost FPGA columns it occupies, and its execution time. The
space list supplies the information the placer needs to find a fitting free space for
a task, and the task list lets the placer detect whether there is any conflict among
the tasks. How to find a fitting space in the space list and how to detect conflicts
will be explained in Section 4.4. We now explain how information is stored in the
space list and the task list.
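The two record layouts of Fig. 4.3 map directly onto small data classes; keeping the space list sorted by release time is what makes the earliest-released space easy to find. This is a sketch under our own naming, not the thesis code:

```python
import bisect
from dataclasses import dataclass

@dataclass(order=True)
class FreeSpace:
    free_time: int   # cycle at which this column range becomes free
    left: int        # leftmost FPGA column of the range
    right: int       # rightmost FPGA column of the range

@dataclass
class PlacedTask:
    start_time: int  # cycle at which execution begins
    left: int        # leftmost occupied column
    right: int       # rightmost occupied column
    exec_time: int   # number of cycles the task runs

# A fresh 10-column FPGA: one free space covering everything from cycle 0.
space_list = [FreeSpace(0, 1, 10)]
task_list = []

def insert_space(sp):
    """Insert a free space so the list stays sorted by release time."""
    bisect.insort(space_list, sp)

insert_space(FreeSpace(8, 1, 10))
insert_space(FreeSpace(5, 1, 8))
print([(s.free_time, s.left, s.right) for s in space_list])
# [(0, 1, 10), (5, 1, 8), (8, 1, 10)]
```

Because `FreeSpace` is ordered on `free_time` first, `bisect.insort` places a newly released space in the middle of the list rather than at the end, exactly as required in Section 4.3.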
Figure 4.4: Example of tasks in an FPGA (10 columns of CLBs) to illustrate the space list and task list. After placing task A: task list [0,9,10,8]; space list [0,1,8], [8,1,10]. After adding task B: task list [0,9,10,8], [0,1,6,5]; space list [0,7,8], [5,1,8], [8,1,10].
In Fig. 4.4 we assume that the FPGA has 10 columns of CLBs. First, the placer
dispatches task A to the FPGA: the task list stores [0,9,10,8] for this task, and the
space list changes from one free space [0,1,10] to two free spaces [0,1,8] and [8,1,10].
Next, task B is placed: the task list adds [0,1,6,5] after [0,9,10,8], and the space
list changes considerably after the placement. First, the free space [0,1,8] changes
into [0,7,8]. Second, a new free space [5,1,8] must be inserted between [0,7,8]
Figure 4.5: Placement steps (take the highest priority task, recognize its type, find a fit space, decide the placement location, check for conflicts, and, if none, place the task and modify the space list; on a conflict, go back and find another space).
and [8,1,10]. Because the free spaces in the space list are sorted by their release
times, the new space is inserted in order rather than appended to the end of the list.
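The list updates of Fig. 4.4 can be mirrored in code. This simplified splitter (our own sketch) only handles the one space a task is placed into; the full step 5 of the placement method must also trim other overlapping spaces:

```python
import bisect

# Entries follow Fig. 4.3: space = (free_time, left, right),
# task = (start_time, left, right, exec_time).
space_list = [(0, 1, 10)]   # the whole 10-column FPGA, free at cycle 0
task_list = []

def place(start, left, right, exec_time):
    """Record a task and split the free space it is placed into."""
    task_list.append((start, left, right, exec_time))
    # find the space that contains the task in both time and columns
    ft, sl, sr = next(s for s in space_list
                      if s[0] <= start and s[1] <= left and right <= s[2])
    space_list.remove((ft, sl, sr))
    if left > sl:                    # columns left of the task stay free
        bisect.insort(space_list, (ft, sl, left - 1))
    if right < sr:                   # columns right of the task stay free
        bisect.insort(space_list, (ft, right + 1, sr))
    # the consumed range is released again when the task finishes
    bisect.insort(space_list, (start + exec_time, sl, sr))

place(0, 9, 10, 8)   # task A -> spaces [(0,1,8), (8,1,10)]
place(0, 1, 6, 5)    # task B -> spaces [(0,7,8), (5,1,8), (8,1,10)]
print(space_list)
```

Running the two placements of Fig. 4.4 reproduces the space lists given in the text, with `[5,1,8]` inserted between `[0,7,8]` and `[8,1,10]` by the sorted insertion.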
4.4 Placement Method
We provide different placement methods for the different types of tasks. Section 4.2
explained how to recognize the task type; this section gives a detailed description of
the placement method for the two types. Fig. 4.5 illustrates
the main steps of placement, which we describe in the following.
Figure 4.6: Example of a conflict: task E, tentatively placed in columns 10–12, would overlap task D, which is placed in columns 1–10 at the 6th cycle.
1. Classify the task: At this step the placer chooses the highest priority task from
the ready queue produced by the scheduler and then classifies it. Section 4.2
described the method for task classification. Knowing the task type, we can find
a fit space at the next step.
2. Find a fit space: Based on the width of the task, the placer finds the best fit free
space in the space list. The best fit space here is the earliest-released space among
all free spaces that have enough columns of CLBs for the task. If more than one
such space is released at the same time, a high SUR task selects the leftmost of
them, and a low SUR task selects the rightmost.
3. Decide the placement location in a free space: At the second step the placer found
a best fit space for the task, but this space may be wider than the task requires.
In this case, the task type determines which columns of
CLBs of the best fit space are used. For example, in Fig. 4.7 task A is high
Figure 4.7: Example of placing the two types of tasks into free spaces (high SUR task A occupies the leftmost columns of its space; low SUR task B the rightmost columns of its space).
SUR type and B is low SUR type. When the placer finds a fit space for task A
spanning columns 3 to 14, task A is placed in columns 3 to 9. If there is a free
space from column 10 to column 17 for task B, it is placed in columns 15 to 17.
This is because a high SUR task is placed from the leftmost column and a low
SUR task from the rightmost column.
4. Check for conflicts: At this step we already know where the task is to be placed,
but we need to check whether any other task has been placed into the same
columns during this task's execution time. Fig. 4.6 shows a conflict example.
The placer first finds the space spanning columns 7 to 12 and decides to place
task E in columns 10 to 12. However, the placer has already decided to place
task D in columns 1 to 10 at the 6th cycle, so task E would conflict with task D.
If a conflict exists, we must choose another fit space for the task by going back
to Step 2. In Fig. 4.6 the placer chooses a new free space spanning columns 1
to 12 at the 8th cycle for placing task E.
5. Modify the space list: After a task is placed into a best fit space, not only does
the information of this space need to be modified, but some other free spaces
in the space list may also be affected. An example of this step is given in
Section 4.5.
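Steps 2–4 above can be sketched for a single task as follows. This is our own simplified rendering (candidate spaces are tried in release order, and the space-list update of step 5 is left out), not the thesis implementation:

```python
def overlaps(a0, a1, b0, b1):
    """Closed-interval overlap test (used for both columns and cycles)."""
    return a0 <= b1 and b0 <= a1

def try_place(space_list, task_list, width, exec_time, high_sur):
    """Place one task; entries follow Fig. 4.3's layout:
    space = (free_time, left, right), task = (start, left, right, exec)."""
    fits = [s for s in space_list if s[2] - s[1] + 1 >= width]
    # Ties on release time: a high SUR task prefers the leftmost space,
    # a low SUR task the rightmost (Section 4.2).
    fits.sort(key=lambda s: (s[0], s[1] if high_sur else -s[1]))
    for free_time, left, right in fits:                     # step 2
        if high_sur:                                        # step 3
            l, r = left, left + width - 1
        else:
            l, r = right - width + 1, right
        start, end = free_time, free_time + exec_time - 1
        conflict = any(                                     # step 4
            overlaps(l, r, tl, tr) and overlaps(start, end, ts, ts + te - 1)
            for ts, tl, tr, te in task_list)
        if not conflict:
            task_list.append((start, l, r, exec_time))
            return (start, l, r)
        # on a conflict, fall through and try the next candidate space
    return None                                             # no feasible space

spaces = [(0, 1, 15)]
tasks = []
print(try_place(spaces, tasks, width=9, exec_time=3, high_sur=True))    # (0, 1, 9)
print(try_place(spaces, tasks, width=2, exec_time=14, high_sur=False))  # (0, 14, 15)
```

The two calls mirror tasks A and B of the example in Section 4.5: the high SUR task grows from the left edge, the low SUR task from the right, and the conflict test rules out column ranges already promised to earlier tasks.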
4.5 Example
In this section we use figures to illustrate a placement example with the eight
tasks shown in Fig. 4.8. Note that the space list always starts with one free space
covering the whole FPGA, here [1,1,15]. We must also explain how the space list
is modified in our example. For example, in Fig. 4.10, before task B is placed there
are two free spaces, [1,10,15] and [4,1,15]. After placing task B as in Fig. 4.11, the
space [1,10,15] is modified to
Task   Arrival time (cycle)   Width (columns)   Exec. time (cycle)
A      0                      9                 3
B      0                      2                 14
C      1                      7                 2
D      1                      10                5
E      2                      3                 16
F      4                      12                3
G      4                      8                 4
H      5                      2                 11

Initial state: space list = [1,1,15]; task list = empty.

Figure 4.8: Example of Placement: Total task information.
[1,10,13], the space [4,1,15] becomes [4,1,13], and the new space [15,1,15] is added.
Thus it can be seen that placing a task affects several free spaces, and all free spaces
in the space list are sorted by their release times. Below is a detailed illustration of
the placement, where each step of our placement methodology can be followed.
Figure 4.9: Example of Placement: (a). Space list: [1,1,15]; task list: empty; task A is placed.
Figure 4.10: Example of Placement: (b). Space list: [1,10,15], [4,1,15]; task list: [1,1,9,3].
38
Figure 4.11: Example of Placement: (c). Space list: [1,10,13], [4,1,13], [15,1,15]; task list: [1,1,9,3], [1,14,15,14].
Figure 4.12: Example of Placement: (d). Space list: [1,10,13], [4,8,13], [6,1,13], [15,1,15]; task list: [1,1,9,3], [1,14,15,14], [4,1,7,2].
39
Figure 4.13: Example of Placement: (e). Space list: [1,10,13], [4,8,13], [6,11,13], [11,1,13], [15,1,15]; task list: [1,1,9,3], [1,14,15,14], [4,1,7,2], [6,1,10,5].
Figure 4.14: Example of Placement: (f). Space list: [1,10,13], [4,8,10], [11,1,10], [15,14,15], [20,1,15]; task list: [1,1,9,3], [1,14,15,14], [4,1,7,2], [4,11,13,16], [6,1,10,5].
40
Figure 4.15: Example of Placement: (g). Space list: [1,10,13], [4,8,10], [11,1,10], [15,14,15], [20,13,15], [23,1,15]; task list: [1,1,9,3], [1,14,15,14], [4,1,7,2], [4,11,13,16], [6,1,10,5], [20,1,12,3].
Figure 4.16: Example of Placement: (h). Space list: [1,10,13], [4,8,10], [11,9,10], [15,1,10], [15,14,15], [20,13,15], [23,1,15]; task list: [1,1,9,3], [1,14,15,14], [4,1,7,2], [4,11,13,16], [6,1,10,5], [11,1,8,4], [20,1,12,3].
41
Figure 4.17: Example of Placement: final contents of the two lists. Space list: [1,10,13], [4,8,10], [11,9,10], [15,1,10], [20,13,13], [23,1,13], [26,1,15]; task list: [1,1,9,3], [1,14,15,14], [4,1,7,2], [4,11,13,16], [6,1,10,5], [11,1,8,4], [15,14,15,11], [20,1,12,3].
Chapter 5
EXPERIMENTAL RESULTS
5.1 Experiment with Random Data
To evaluate the advantages of the classified stuffing method compared with the
original stuffing technique [11], we experimented with 150 sets of randomly generated
hardware tasks: fifty sets of 50 tasks each, fifty sets of 200 tasks, and fifty sets of
500 tasks. The input task sets were also characterized by different set ratios, where
the set ratio of a task set is defined as |S>1| : |S≤1|, with S>1 = {t | SUR(t) > 1}
and S≤1 = {t | SUR(t) ≤ 1}, so as to find out for which kinds of task sets our
classified stuffing works better than stuffing. We experimented with set ratios 4:1,
1:1, and 1:4. The experiments were conducted on a P4 1.6 GHz PC with 512 MB RAM.
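A task set with a prescribed set ratio can be synthesized by generating the right proportion of high and low SUR tasks. The width and execution-time bounds below are illustrative assumptions of ours, not the values used in the thesis experiments:

```python
import random

def random_task_set(n, set_ratio, seed=0):
    """Generate n (columns, exec_time) tasks whose high:low SUR counts
    follow set_ratio = (|S>1|, |S<=1|)."""
    hi, lo = set_ratio
    n_high = n * hi // (hi + lo)
    rng = random.Random(seed)
    tasks = []
    for i in range(n):
        if i < n_high:   # high SUR: wider than it is long (SUR > 1)
            e = rng.randint(1, 5)
            c = rng.randint(e + 1, 15)
        else:            # low SUR: runs at least as long as it is wide
            c = rng.randint(1, 5)
            e = rng.randint(c, 20)
        tasks.append((c, e))
    return tasks

ts = random_task_set(50, (1, 4))
n_high = sum(1 for c, e in ts if c / e > 1)
print(n_high, len(ts) - n_high)  # 10 40
```

By construction a 50-task set with set ratio 1:4 contains exactly 10 high SUR and 40 low SUR tasks, regardless of the random seed.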
In the two-dimensional area of FPGA space versus execution time, as illustrated in
Table 5.1: Experimental Results and Comparison

                       Original Stuffing          Classified Stuffing (reduction)                        S&P
|S|  SR   TC        TF    TT   AF    TR      TF            TT           AF             TR     Time (ms)
50   4:1  1,711    505   105   4.8   3      477 (−5.5%)   104 (−1%)    4.6  (−4.2%)   3      3
     1:1  1,707    405   100   4.0   5      380 (−6.2%)    99 (−1%)    3.8  (−5%)     4      2
     1:4  1,550    418    93   4.5   2      392 (−6.2%)    92 (−1%)    4.3  (−4.4%)   2      2
200  4:1  6,780   1237   381   3.25  3     1119 (−9.5%)   376 (−1.3%)  2.97 (−8.6%)   3      13
     1:1  6,478    887   350   2.53  1      760 (−14.3%)  344 (−1.7%)  2.21 (−12.6%)  1      20
     1:4  6,133    823   330   2.49  0      739 (−10.2%)  326 (−1.2%)  2.27 (−8.8%)   0      27
500  4:1  17,145  2246   922   2.43  5     2007 (−10.6%)  911 (−1.2%)  2.20 (−9.5%)   5      47
     1:1  16,080  1619   842   1.92  9     1242 (−23.3%)  825 (−2.0%)  1.50 (−21.9%)  9      77
     1:4  15,077  1341   782   1.71  5     1185 (−11.6%)  775 (−0.9%)  1.53 (−10.5%)  5      98

|S|: number of tasks, SR: set ratio (|S>1| : |S≤1|, where S>1 = {t | SUR(t) > 1} and S≤1 = {t | SUR(t) ≤ 1}),
TC: total column-cycles, TF: total fragmentation, TT: total time, AF: average fragmentation (TF/TT),
TR: number of tasks rejected, S&P: scheduling and placement
Fig. 4.2, each unit of placement is a column-cycle, where the space unit is a column
and the time unit is a cycle. Table 5.1 shows the results of applying our method and
the stuffing method to the same sets of tasks, where the total column-cycles is the
number of column-cycles used by all tasks, the total fragmentation is the number of
column-cycles not utilized, the total time is the total number of cycles for executing
all tasks, the average fragmentation is the quotient of the total fragmentation by the
total time, and the number of rejected tasks is the number of tasks whose deadlines
could not be satisfied. For each task set, the data in Table 5.1 are the overall averages of
Figure 5.1: The placement example with set ratio 4:1.
the results obtained from experimenting with 50 different sets of randomly generated
tasks.
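The column-cycle accounting of Table 5.1 can be computed directly from a placement record. In this sketch (our own, assuming the placement is valid, i.e. tasks do not overlap) TT is taken as the cycle at which the last task finishes:

```python
def metrics(task_list, fpga_width):
    """Compute TC, TT, TF, and AF from (start, left, right, exec) records."""
    tc = sum((r - l + 1) * e for _, l, r, e in task_list)  # total column-cycles
    tt = max(s + e for s, _, _, e in task_list)            # total time (makespan)
    tf = fpga_width * tt - tc                              # total fragmentation
    af = tf / tt                                           # average fragmentation
    return tc, tt, tf, af

# Two tasks on a 15-column FPGA: A (9 cols, 3 cycles) and B (2 cols, 14 cycles).
print(metrics([(0, 1, 9, 3), (0, 14, 15, 14)], 15))  # TC=55, TT=14, TF=155
```

Every column-cycle of the FPGA within the makespan that no task occupies counts as fragmentation, so a more compact placement shows up as both a smaller TF and a smaller TT.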
From Table 5.1 we can observe that, for both methods and for all the task sets,
the set ratio of 1:4 results in a shorter schedule and a more compact placement, as
evidenced by the smaller fragmentation. This means it is desirable to have hardware
tasks synthesized into tasks with low SUR. The intuition here is that a greater
number of high SUR tasks requires wider FPGA spaces, resulting in larger
fragmentation and thus longer schedules. Fig. 5.1 depicts an example of a task set
with a set ratio of 4:1, where we observe more tasks occupying wide FPGA spaces
than with other set ratios. Because the width of the FPGA is fixed, when a wide
task is placed there remain only limited spaces, which are hard to
Figure 5.2: The fragmentation with set ratio 1:4.
utilize unless there is a narrow task like task B in Fig. 5.1. The light grey area
from the 1st cycle to the 20th cycle in Fig. 5.1 is the fragmentation. Fig. 5.2 shows
a task set with a set ratio of 1:4. Here it is easier to utilize the remaining space
after placing some narrow tasks, so more space can be utilized when the set ratio
is 1:4.
From Table 5.1 we can also observe that in all cases our method performs better
than stuffing, in terms of both less fragmentation and fewer execution cycles. The
total fragmentation is reduced by 5.5% to 23.3% and the total execution time by
0.9% to 2.0%. The reduction is most prominent for the set ratio 1:1, because there
the interference between the placements of the low SUR and the high SUR tasks is
greatest, resulting in larger fragmentation
Figure 5.3: Result of our method and stuffing.
and longer schedules in the original stuffing method. These interferences are
diminished in classified stuffing. Comparing the task sets with different cardinalities,
namely 50, 200, and 500 tasks, one can observe that the larger the task set, the
better the performance of our method. This shows that our method scales better
than stuffing to more complex reconfigurable hardware systems. Fig. 5.3 gives the
total fragmentation results for the different set ratios and numbers of tasks.

As far as the performance of the scheduler and placer is concerned, the times taken
by the classified stuffing method and the original stuffing method are almost the
same. This is because the added classification technique does not expend any
significant amount of time, while it does improve the FPGA space utilization and
the total task execution time.
Table 5.2: The Attributes of Six Real Tasks.

                             DES    3-DES   Huffman JPEG   Lossless JPEG   Lossless JPEG   MPEG4
Attribute \ Task Type                       Decoder        Decoder         Encoder         Encoder
Columns                      1      3       4              13              10              12
Execution time (10^-8 s)     7      8       4              3               1               2
SUR                          0.14   0.38    1              4.33            10              6
Table 5.3: Three Cases with 50 Real Tasks.

                           DES    3-DES   Huffman JPEG   Lossless JPEG   Lossless JPEG   MPEG4
Task Type                                 Decoder        Decoder         Encoder         Encoder
SUR                        0.14   0.38    1              4.33            10              6
Case 1 (SR 14:11)          4      9       15             10              5               7
Case 2 (SR 1:1)            13     7       5              4               9               12
Case 3 (SR 3:2)            11     3       16             5               5               10
5.2 Experiment with Real Data
We use six types of real tasks to experiment in this section. Table 5.2 shows
the six types of tasks and their required columns and execution time when they
are configured on an FPGA. There are fifty tasks to schedule and placement. The
columns and execution times of tasks are fixed, the arrival time and deadline will
be randomly generated for each task. We experiment with three cases where the
numbers of each kind of tasks are different as shown in Table 5.3. Table 5.4 shows
the results of applying our method and the stuffing method to these three cases.
The total fragmentation is reduced by 1.5% to 7.4% and the total execution time is
reduced by 0% to 1.8%.
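The SUR values in Table 5.2 follow directly from the columns and execution times; a quick check (dictionary keys are our reading of the garbled table header):

```python
# (columns, execution time in 1e-8 s) for the six task types of Table 5.2
real_tasks = {
    "DES":                   (1, 7),
    "3-DES":                 (3, 8),
    "Huffman JPEG Decoder":  (4, 4),
    "Lossless JPEG Decoder": (13, 3),
    "Lossless JPEG Encoder": (10, 1),
    "MPEG4 Encoder":         (12, 2),
}
for name, (c, e) in real_tasks.items():
    print(f"{name}: SUR = {c / e:.2f}")
```

The first three types are low SUR (narrow, long-running) and the last three high SUR (wide, short-lived), which is what makes the case mixes of Table 5.3 meaningful.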
Table 5.4: Experimental Results and Comparison with Real Tasks

              Original Stuffing        Classified Stuffing (reduction)                  S&P
Case   TC     TF    TT   AF    TR     TF           TT          AF            TR    Time (ms)
1      1,092  298   66   4.5   0     276 (−7.4%)  65 (−1.5%)  4.2 (−6.7%)   0     2
2      873    334   57   5.9   0     329 (−1.5%)  57 (0%)     5.8 (−1.7%)   0     3
3      890    291   56   5.2   0     279 (−4.1%)  55 (−1.8%)  5.1 (−1.9%)   0     2
Chapter 6
CONCLUSIONS
The improved classified stuffing technique proposed in this work shows significant
benefits, yielding both a shorter schedule and a more compact placement, as demon-
strated on a large number of task sets. The current method focuses only on hardware
task scheduling. In the future we plan to investigate hardware-software task
co-scheduling.
Bibliography
[1] N. Bansal and M. Harchol-Balter. Analysis of SRPT scheduling: investigating unfairness. In
Proc. of the ACM SIGMETRICS International Conference on Measurement and Modeling of
Computer Systems, pages 279-290. ACM Press, 2001.
[2] J. S. Jean, K. Tomko, V. Yavagal, J. Shah, and R. Cook. Dynamic reconfiguration to support
concurrent applications. IEEE Transactions on Computers, 48(6):591–602, June 1999.
[3] J. Leung. A new algorithm for scheduling periodic, real-time tasks. Algorithmica, 4:209–219,
1989.
[4] B. Mei, P. Schaumont, and S. Vernalde. A hardware-software partitioning and scheduling
algorithm for dynamically reconfigurable embedded systems. In Proc. of the 11th ProRISC
Workshop on Circuits, Systems and Signal Processing Veldhoven, November 2000.
[5] J.-Y. Mignolet, S. Vernalde, D. Verkest, and R. Lauwereins. Enabling hardware-software multi-
tasking on a reconfigurable computing platform for networked portable multimedia appliances.
In Proc. of the International Conference on Engineering of Reconfigurable Systems and Algo-
rithms (ERSA), pages 116-122, 2002.
[6] J. Noguera and R. M. Badia. A hw/sw partitioning algorithm for dynamically reconfigurable
architectures. In Proc. of Design, Automation and Test in Europe (DATE), pages 729-734,
2001.
[7] J. Noguera and R. M. Badia. Dynamic run-time hw/sw scheduling techniques for reconfigurable
architectures. In Proc. of the 17th International Conference on Hardware Software Codesign
(CODES), pages 205-210, May 2002.
[8] J. Noguera and R. M. Badia. System-level power-performance trade-offs in task scheduling
for dynamically reconfigurable architectures. In Proc. of the 2003 International Conference
on Compilers, Architectures and Synthesis for Embedded Systems, ACM Press, pages 73-83,
October 2003.
[9] V. Nollet, P. Coene, D. Verkest, S. Vernalde, and R. Lauwereins. Designing an operating
system for a heterogeneous reconfigurable SoC. In Proc. of the 17th International Symposium
on Parallel and Distributed Processing, page 174, April 2003.
[10] J. Resano, D. Mozos, D. Verkest, F. Catthoor, and S. Vernalde. Specific scheduling support to
minimize the reconfiguration overhead of dynamically reconfigurable hardware. In Proc. of the
41st Annual Design Automation Conference, pages 119-124, San Diego, California, USA, June
2004.
[11] C. Steiger, H. Walder, and M. Platzner. Operating systems for reconfigurable embedded plat-
forms: Online scheduling of real-time tasks. IEEE Transactions on Computers, 53(11):1392–
1407, November 2004.
[12] H. Walder and M. Platzner. Non-preemptive multitasking on FPGAs: Task placement and
footprint transform. In Proc. of the International Conference on Engineering of Reconfigurable
Systems and Algorithms (ERSA), pages 24-30, 2002.
[13] H. Walder and M. Platzner. Reconfigurable hardware operating systems: From design concepts
to realizations. In Proc. of the 3rd International Conference on Engineering of Reconfigurable
Systems and Algorithms (ERSA'03), pages 284-287, Las Vegas (NV), USA, June 2003.
[14] H. Walder and M. Platzner. Online scheduling for block-partitioned reconfigurable devices.
In Proc. of Design, Automation and Test in Europe (DATE), Volume 1, pages 10290-10295,
March 2003.
[15] H. Walder and M. Platzner. Reconfigurable hardware OS prototype. TIK Report Nr. 168,
Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology Zurich
(ETH), Zurich, Switzerland, April 2003.
[16] G. Wigley and D. Kearney. The development of an operating system for reconfigurable com-
puting. In Proc. of the IEEE Symposium on Field-Programmable Custom Computing Machines
(FCCM’01), April 2001.
[17] G. Wigley and D. Kearney. Research issues in operating systems for reconfigurable computing.
In Proc. of the 2nd International Conference on Engineering of Reconfigurable Systems and
Algorithms (ERSA’02), pages 10-16, June 2002.
[18] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins. Energy-
aware runtime scheduling for embedded-multiprocessor SoCs. IEEE Design & Test of
Computers, pages 46-58, 2001.