



ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, California, June 1995.

Software Synthesis for Real-Time Information Processing Systems*

Filip Thoen, Marco Cornero†, Gert Goossens and Hugo De Man‡
IMEC, Kapeldreef 75, B-3001 Leuven, Belgium
[email protected]
† currently with SGS-Thomson Microelectronics, Crolles, France
‡ Professor at the Katholieke Universiteit Leuven

Abstract

Software synthesis is a new approach which focuses on the support of embedded systems without the use of operating systems. Compared to traditional design practices, a better utilization of the available time and hardware resources can be achieved with software synthesis, because the static information provided by the system specification is fully exploited and an application-specific solution is generated automatically.

This paper presents on-going research on a software synthesis approach for real-time information processing systems, which starts from a concurrent process system specification and tries to automate the mapping of this description to a single processor. An internal representation model well suited for the support of concurrency and timing constraints is proposed, together with flexible execution models for multi-tasking with real-time constraints. The method is illustrated on a personal terminal receiver demodulator for mobile satellite communication.

1 Introduction

The target application domain of our approach is advanced real-time information processing systems, such as consumer electronics and personal communication systems [6]. The distinctive characteristic of these systems is the coexistence of two different types of functionality, namely digital signal processing and control functions, which require support for different kinds of timing constraints.
Specifically, signal processing functions operate on sampled data streams, and are subject to the real-time constraint derived from the required sample frequency or throughput, as determined by the environment. Control procedures vary in nature from having to be executed as soon as possible (e.g. a man-machine interface), where an eventual delay in their execution does not usually compromise the integrity of the entire system (soft deadline), to having very stringent timing constraints, e.g. a critical feedback control loop (hard deadline).

(* This work was supported by the European Commission, under contract Esprit-9138 (Chips).)

Traditionally, real-time kernels, i.e. specialized operating systems, are used for software support in the design of embedded systems [16][17][19]. Most of these kernels are stripped-down versions of traditional time-sharing operating systems, made appropriate for the real-time domain by reducing run-time overhead. These small kernels, often with limited functionality, are in the first place designed to be fast, by ensuring a fast context switch, quick response to interrupts and a minimal interrupt disable time.

Above all, real-time kernels provide the run-time support for real-time multi-tasking to perform software scheduling, the required primitives for inter-process communication and synchronization, and access to the hardware resources. Since processes are considered black boxes, most kernels apply a coarse-grain, black-box model for process scheduling. Most kernels tend to use a fixed-priority preemptive scheduling mechanism, where process priorities have to be used to mimic the timing constraints. Alternatively, traditional scheduling approaches use timing constraints specified as process period, release time and deadline [20]. From the designer's viewpoint, however, these constraints are more naturally specified with respect to the occurrence of observable events. Moreover, the scheduler has no knowledge of the time stamps at which the events are generated by the processes, and consequently cannot exploit them. Assignment of the process priorities, as in the case of the fixed-priority scheduling scheme, is a manual task to be performed without any tool support. Typically, an iterative and error-prone design cycle, with a lot of code and priority tuning, is required. Not only is

this approach inflexible and time-consuming, but it also restricts the proof of correctness to the selected stimuli. Additionally, the behavior of the scheduler under peak-load conditions is hard to predict, often resulting in under-utilized systems to stay on the safe side. It is safer to guarantee timeliness pre-run-time, as a new family of kernels tends to do [5][18]. Moreover, kernels trade optimality for generality, which burdens them with run-time and memory overhead.

Software synthesis [1][2][9, 10] is an alternative approach to real-time kernels: starting from a system specification, typically composed of concurrent communicating processes with timing constraints, the aim of software synthesis is the automatic generation of the source code which realizes 1) the specified functionality while satisfying the timing constraints, and 2) the typical run-time support required for real-time systems, such as multi-tasking and the primitives for process communication and synchronization. Compared to real-time kernels, a better utilization of the available time and hardware resources can be achieved with software synthesis, because the static information provided by the system specification is fully exploited; as a consequence, the automatically generated run-time support is customized for and dedicated to each particular application, and does not need to be general, as in the case of real-time kernels. Moreover, an accurate static analysis provides early feedback to the designer on the feasibility of the input specifications. In this way the trial-and-error design cycle with a lot of manual code tuning, typical for real-time kernels, is avoided, and satisfaction of the timing constraints can be guaranteed automatically. Besides, the transformations and optimizations envisioned in the software synthesis approach try to automate this code tuning.
Finally, since the output of software synthesis is source code, portability can easily be achieved by means of a retargetable compiler [7].

The software synthesis approach in the VULCAN framework [9, 10] allows the specification of latency and rate timing constraints. Program threads are extracted from the system specification in order to isolate operations with an unknown timing delay. A simple non-preemptive, control-FIFO based run-time scheduler alternates their execution, but provides only restricted support for satisfying these constraints, since threads in the FIFO are executed in the order they are put in and are not reordered. Moreover, interrupts are not supported, due to the choice of the non-preemptive scheduler.

The approach taken in the CHINOOK system suffers from a similar restriction: although preemption is allowed based on the watchdog paradigm, resuming at the preemption point is difficult, and hence interrupts are not supported. The system is targeted towards reactive

control systems, and only supports timing constraints on state transitions and on latency between operations. No rate constraints are supported, as are typical for DSP applications.

The rest of this paper is structured as follows. Section 2 introduces the system representation and the concepts used. In section 3, two different execution models and the different steps of a possible implementation script are discussed. A real-life illustration of the approach is the subject of section 4. Finally, section 5 draws some conclusions.

2 System Representation - Concepts and Model

We assume that the target application can be modeled as a concurrent process description, which captures operation behavior, data dependencies between operations, concurrency and communication [11][12][13]. The precise semantics of such a specification are beyond the scope of this paper. From this specification, a constraint graph can be derived that contains sufficient information for the software synthesis problem, as will be introduced below.

We define a program thread as a linearized set of operations which may or may not start with a non-deterministic (ND) time delay operation [10]. Examples of ND-operations are synchronization with internal and external events, waits for communication, and unbounded loops. The purpose of extracting program threads from the concurrent process input specification is to isolate all the uncertainties related to the execution delay of a given program at the beginning of the program threads. Program threads, which can be executed using a single thread of control (as present in most contemporary processors), have the property that their execution latency can be computed statically. Besides being delimited by the ND-operations, program threads can also capture concurrency and multi-rate transitions, which often occur in DSP applications. A new representation model, based on constraint graphs [14], is then built up from the extracted threads.
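Because all non-determinism is pushed to the head of a thread, the latency of the remaining body is a plain compile-time sum. A minimal sketch (names and the flat delay array are our illustration, not the paper's data structures):

```c
#include <stddef.h>

/* Illustrative sketch: once the ND-operations are isolated at the head of
 * a program thread, the remaining operations have fixed delays, so the
 * thread's execution latency is a simple sum over its linearized body. */
static int thread_latency(const int *op_delay, size_t n_ops) {
    int latency = 0;
    for (size_t i = 0; i < n_ops; i++)
        latency += op_delay[i];   /* each operation has a static delay */
    return latency;
}
```

In a real flow these per-operation delays would come from the code generator's timing model for the target processor.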
This model allows a static analysis, both of the imposed timing constraints and of the thread scheduling. The vertices represent program threads, and the edges capture the data dependencies, control precedences and timing constraints between threads. Specifically, let λ(vi) be the execution delay of the thread represented by vertex vi; a forward edge ei,j with weight wi,j = λ(vi) represents a minimum timing constraint between vi and vj, i.e. the requirement that the start time of vj must occur at least wi,j units of time later than that of vi. Similarly, a maximum timing constraint between two threads vi and vj is indicated as a backward edge with negative weight wi,j, representing the requirement that the end time of vi must occur no later than |wi,j| units of time after the end time of vj. Finally, ND-operations are represented by separate event nodes. An example is given in figure 1 (a).

Figure 1: Example of a Constraint Graph. (a) Threads v1-v12 with event nodes event1-event4, artificial events (art. event1 = end(v2) && end(v3), art. event2 = end(v5) && end(v6)) and minimum/maximum timing constraints; (b) the same graph after frame clustering into frame1-frame6.

Our model differs from [14] in the abstraction level of a CG node: in our approach a CG node isolates a group of operations corresponding to static program parts, while in [14] individual operations are the CG entities. Moreover, in [14] a CG is restricted to a single connected graph, and is not able to capture process concurrency. This restriction is lifted in our approach, and internal events are introduced to synchronize between the concurrent graphs that capture process concurrency. Additionally, multi-rate behavior, typical for DSP applications, is supported in our model by placing relative execution rate numbers on the control edges.

By definition, all the uncertainties related to the timing behavior of a system specification are captured by event nodes. Since the arrival time of an event is unknown at compile time, event nodes limit the extent of analysis and synthesis which can be performed statically.

In a second step, threads are clustered into so-called thread frames (figure 1 (b)). The purpose of identifying thread frames is to partition the initial constraint graphs into disjoint clusters of threads triggered by a single event, so that static analysis and synthesis (e.g. scheduling) can be performed for each cluster relative to the associated event. Remark that sequence edge(s) can exist between frames according to the original system specification (e.g. in figure 1 (b) a control edge between frame3 and frame4 enforces sequence in the execution of the frames).
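The two edge semantics can be made concrete with a small sketch that checks candidate start times against a constraint graph. This is our illustration (writing λ(vi) as a `delta` array); the paper does not prescribe this data structure:

```c
#include <stddef.h>

typedef struct { int from, to, weight; } Edge;   /* weight < 0 => backward edge */

/* end time of thread v, given start times and thread latencies (delta) */
static int end_time(const int *start, const int *delta, int v) {
    return start[v] + delta[v];
}

/* Returns 1 if the start times satisfy every edge of the constraint graph:
 *  - forward edge (w >= 0): start[to] >= start[from] + w   (minimum constraint)
 *  - backward edge (w < 0): end[from] <= end[to] + |w|     (maximum constraint) */
static int schedule_ok(const int *start, const int *delta,
                       const Edge *edge, size_t n_edges) {
    for (size_t i = 0; i < n_edges; i++) {
        const Edge *e = &edge[i];
        if (e->weight >= 0) {
            if (start[e->to] < start[e->from] + e->weight) return 0;
        } else {
            if (end_time(start, delta, e->from) >
                end_time(start, delta, e->to) - e->weight) return 0;
        }
    }
    return 1;
}
```

A static scheduler would search for start times for which such a check holds for every edge of the frame.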

Similarly to the anchor sets defined for constraint graphs in [14], the event set E(vi) of a node vi is defined as the set of event nodes which are predecessors of vi. Artificial events are introduced for threads whose event set contains at least two elements between which there does not exist a path in the graph. This introduction must be done without changing the original timing semantics. These introduced events are in fact internal events, and must be observed and taken care of by the execution model.

The execution model (see next section) takes care of the run-time activation of the different thread frames according to the original specification, while taking into account the sequence of occurred events and the imposed timing constraints. In this way the unknown delay in executing a program thread appears as a delay in scheduling the program thread, and is not considered part of the thread latency.

3 Implementation - Execution Models and Script

In this section the implementation, i.e. the mapping of the representation model to the single thread of control of the target processor, is described. First, the execution models are explained.

3.1 Execution Models

Blocking Model - Cyclic Executive Combined with Interrupt Routines   A simple but cost-effective solution for the run-time thread frame activation consists of a simple event loop in the background, combined with tying different thread frames to processor interrupts. The event loop in the background polls, in a round-robin fashion, for the occurrence of the events triggering the different thread frames, and accordingly starts executing the appropriate frames sequentially. The processor interrupts provide a cheap, hardware-supported way to asynchronously start up thread frames, which suspend the currently executing frame in the event loop or another frame started by an interrupt. The processor interrupt masking and priority levels can be used to selectively allow interruption of time-critical thread frame sections and to favor high-priority frames.

Only frames which are triggered by events corresponding to interrupts (either the external hardware or the internal peripheral interrupts) can be started asynchronously, while the other frames have to be placed in the background event loop. Moreover, once a background frame is started by the event loop, it blocks the processor until the end of its execution, not allowing other frames in the event loop to be started; hence the name "blocking execution model". The execution length of the frames limits the response time to events in the event loop, and therefore limits the scope of this model.

The execution model presented next is more general and not subject to these limitations, but is less cost-effective in terms of overhead.

Non-blocking Model using a Run-time Scheduler   Figure 2 (a) outlines this execution model, which takes a two-level scheduling approach. Static scheduling (i.e. at compile time) is performed after thread clustering to determine a relative ordering of the threads within each thread frame; by assumption this ordering is not changed anymore in the dynamic scheduling phase. At run-time, a small preemptive and time-driven scheduler takes care of the composition and interleaving of the different thread frames according to the system evolution and the timing constraints.
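The blocking model's background loop can be sketched in a few lines. This is a hedged illustration (flag and handler names are ours); interrupt-assigned frames, which would preempt this loop through the processor hardware, are not modeled:

```c
#include <stdbool.h>

#define N_FRAMES 2

/* flags set by interrupt routines or memory-mapped polling (illustrative) */
static volatile bool event_flag[N_FRAMES];
static void (*frame[N_FRAMES])(void);    /* one handler per thread frame */

/* Background cyclic executive: poll the event flags round-robin and run
 * the matching frame to completion.  While a frame runs, the loop is
 * blocked -- the defining property of the blocking execution model.
 * `keep_running` stands in for the embedded system's forever-loop. */
static void cyclic_executive(volatile bool *keep_running) {
    while (*keep_running) {
        for (int i = 0; i < N_FRAMES; i++) {
            if (event_flag[i]) {
                event_flag[i] = false;
                frame[i]();              /* blocks until the frame finishes */
            }
        }
    }
}
```

The response time of a polled frame is bounded by the worst-case execution length of the other frames in the loop, which is exactly the limitation noted above.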
Additionally, the run-time scheduler tries to avoid active waiting by scheduling other frames when the next frame is not ready to be executed, in this way maximizing processor utilization (see e.g. figure 1 (b): when frame3 has finished execution and ev4 has not occurred yet, instead of actively waiting for ev4, other frames can be executed in the meantime).

This run-time behavior is illustrated in the lower part of figure 2 (a): starting from an idle state, suppose that event1 occurs; this event activates the run-time scheduler, and since no other frames are currently active, the threads of the first frame are executed in the order determined previously by static scheduling (order 1-3-2). Occurrence of event2, while executing thread3 of the first frame, causes the following actions: 1) thread3 is interrupted; 2) the run-time scheduler is invoked for

determining the subsequent execution order (in the example: A, rest of thread3, B, 2); 3) execution proceeds with the newly determined thread ordering. It is important to note that the relative ordering between the threads of the same frame, determined at compile time by the static scheduler, is not changed by the run-time scheduler; this allows an efficient implementation of the run-time scheduler, which must necessarily be very fast.

The scheduling metric used by the run-time scheduler is the frame slack time. This information is derived statically, based on the imposed timing constraints and on the relative thread ordering within each frame. The frame slack indicates the amount of time the end of an individual thread in a thread frame can be postponed, relative to its static schedule, before violating a timing constraint. As illustrated in figure 2 (b), the frame slack is defined as the minimum of the thread slacks, i.e. the remaining time between the end of a thread and its timing constraint, over all the succeeding threads in the static schedule. The frame slack derived at compile time is used and updated at run-time. The static scheduling algorithm can be driven by optimizing this frame slack, because a larger slack allows more frame interleaving at run-time. For a more formal description of this model, we refer to [3].

3.2 Script

Figure 3 gives an overview of the proposed approach.

Figure 3: Possible software synthesis script: from the concurrent process specification, through thread extraction and latency estimation, execution model selection, thread transformations, thread frame clustering, static frame scheduling, buffer allocation and timing analysis, down to code generation and linking of the scheduled thread frames with the run-time executive (interrupt assignment of external events is provided by the user).

From the concurrent process specification, the different program threads are extracted and the non-deterministic timing delays are isolated in event nodes. During this step, a code generator can provide a static estimate of the execution time of the different threads,

possible because the threads themselves no longer contain any non-determinism. These execution times are placed, together with the timing constraints, in a constraint graph, the abstraction model used in the sequel of the approach.

Figure 2: The run-time execution model (a) and the frame slack scheduling metric (b). At compile time, static scheduling orders the threads within each frame (1-3-2 and A-B); at run-time the scheduler interleaves the frames on event occurrences. The frame slack is the minimum of the thread slacks, e.g. frame slack A = min { Tslack(A), Tslack(B) }, with Tslack(B) = dB - λB.

After selection of one of the two execution models explained above (alternatively, both models could be visited in order to evaluate the overhead penalty), five tasks, which are phase-coupled, have to be performed. The outcome of software synthesis is a set of scheduled thread frames and possibly (depending on the execution model chosen) a small run-time executive, which activates the appropriate frames at run-time. The assignment of external processor interrupts to event nodes in the constraint graph, which is determined by the context of the system, is to be provided by the user. Thread frame clustering tries to cluster the constraint graph into disjoint groups of threads which are triggered by the same event set. These thread frames are to be activated at run-time by the selected execution model. Since events introduce an overhead during frame scheduling, we also want to minimize the number of clusters, without violating timing constraints. Static frame scheduling determines at compile time the relative order of the threads inside each of the identified thread frames.

Occasionally, the timing constraints cannot be met by the identified frames. In this case, a transformation step on the threads or the frames can resolve the problem, or can provide a more optimal solution in general. The illustration in section 4 will give more details on these transformations. One transformation that will be illustrated is the frame cutting transformation: in the case of the cyclic executive based execution model, frames can accumulate a total execution time which, when placed in the background event loop, blocks the processor for too long a time, conflicting with the response times of events triggering other frames in the event loop.

Buffer allocation inserts the required buffers between communicating thread frames, deriving the buffer sizes from the execution rates of the thread frames. Timing analysis is used in different phases: once upon entry of the tool, to check the consistency of the user-specified timing constraints, and subsequently during the execution of the tool, to verify whether the result of a synthesis task still satisfies all constraints.

The output of the method is the set of scheduled frames, which, after compilation with the code generator, have to be linked and downloaded together with the compiled run-time executive according to the chosen execution model.

4 Illustration of the Approach

System Description - Concurrent Communicating Process Specification   Figure 4 outlines graphically the process specification of a mobile terminal receiver demodulator to be used in the MSBN satellite communication network [4][15]. This network allows bi-directional data and voice communication in a star network consisting of a fixed earth station and multiple mobile stations. Two different data channels, called the pilot and traffic channel, are sent over the same transmission carrier using the CDMA technique [8], i.e. correlating the channels with orthogonal pseudo-noise codes, enabling them to use the same frequency spectrum without interference. The former channel carries network system information (e.g. average channel bit error rate), the latter carries the actual user data. Acquisition and tracking of the transmission carrier is performed by the demodulator on the pilot channel, in cooperation with an intelligent antenna. A prototype of

Figure 4: Concurrent Process Specification of the MSBN demodulator (processes such as read_decorr, track_pilot+demodulate, traffic_demodulate, pilot_manage_data, traffic_manage_data, send_vocoder, the pilot DSP functions, setup DMA, read_sys_cmd and read_antenna, with their control flow, data flow, I/O ports, rate conversions and peripheral processes).

Figure 5: The constraint graph of the MSBN demodulator before (a) and after frame clustering (b), showing the periodic (3.4 kHz) decorrelator event, the sporadic MMI interface, the high-priority antenna interface, and the frame cutting and rate matching transformations.

this system was built using a mixture of FPGAs, dedicated logic and multiple DSP processors. Figure 4 restricts itself to the functionality to be mapped on one specific DSP processor.

Triggered by an external interrupt, the read_decorr process periodically (at a rate of 3.4 kHz) reads the decorrelator FPGA, which is connected to the processor in a memory-mapped way. This process sends data to the track_pilot&demodulate and traffic_demodulate processes, which perform the tracking of the transmission carrier and the demodulation (i.e. gain correction and carrier and bit phase correction) of both the pilot and the traffic channel. The demodulated traffic data is, after a 1:3 rate conversion (i.e. the first process must be executed three times before the second one can be executed; the production and consumption rates are indicated by numbers on the data flow edges), formatted by the traffic_manage_data process and transmitted via the send_vocoder process to a second, memory-mapped processor. In contrast, the demodulated pilot data is further processed on the same processor.

The track_pilot&demodulate process not only delivers its demodulated data to the pilot_manage_data process, it also steers the frequency of the NCO (numerically controlled oscillator) in the preceding analog demodulation part through the on-chip serial peripheral. Moreover, together with the traffic_demodulate process, it sends information concerning transmission carrier synchronization to the display_LEDs and write_antenna processes. The channel decoding of the demodulated pilot data is carried out by the pilot_DSP_functions process, which operates on a 1024-element frame basis, so a multi-rate transition is present between pilot_manage_data and this latter process. The output data of the pilot channel decoding is sent to a PC using the on-chip DMA (direct memory access) engine.
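Multi-rate transitions like the 1:3 and 1:1024 conversions above are what buffer allocation sizes buffers for. The following sketch shows one simple sizing rule, our illustration rather than the paper's exact algorithm: for a producer:consumer rate pair p:c, the producer fires p/gcd(p,c) times per consumer cycle, and the buffer must hold that many productions.

```c
/* Illustrative buffer sizing between two communicating frames (a
 * simplified rule, not necessarily the tool's exact one). */
static int gcd(int a, int b) {
    while (b) { int t = a % b; a = b; b = t; }
    return a;
}

/* producer runs p_rate times per c_rate consumer runs, emitting
 * tokens_per_firing items each time it fires */
static int buffer_size(int p_rate, int c_rate, int tokens_per_firing) {
    return (p_rate / gcd(p_rate, c_rate)) * tokens_per_firing;
}
```

For the demodulator, the 1:3 transition needs a 3-deep buffer and the 1:1024 transition a 1024-element frame buffer, matching the rates annotated on the data flow edges.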
The setup_DMA process is triggered when output data is available from the pilot_DSP_functions process, and sets up and starts the DMA process.

Asynchronously with this chain of periodic processes, the read_sys_cmd and read_antenna processes control the internal parameters of the demodulation processes. They respectively implement the man-machine interface, connected to the system using a memory-mapped flag and allowing the user to alter the system operating parameters, and the interface with the antenna controller, which is connected via an external interrupt. The former is a sporadic process, since a user will adapt the parameters only once in a while, and is allowed to have a large response time. The latter is a time-critical process: when the antenna controller loses the beam, it signals this immediately to the demodulator, which must take special re-tracking actions.

Most of the processor time is spent on the demodulation and pilot DSP function processes.

Constraint Graph Representation   Figure 5 (a) outlines the constraint graph (CG) for the demodulator, capturing the threads, their dependencies and their timing constraints. For reasons of clarity, the thread execution times are not indicated.

Three event nodes are introduced to capture the timing uncertainty of the periodic interrupt from the decorrelator, the interrupt from the antenna controller, and the setting of the polling flag of the man-machine interface. Remark that the event nodes abstract from whether they are implemented as an interrupt or as a polling loop. In this application, program threads are not only defined by the presence of ND-operations, but also to represent concurrency (e.g. between the demod_pilot+format_data and demod_traffic threads) and rate conversion (e.g. between the demod_traffic and manage_data+send_vocoder threads). Some processes in the original user specification have been combined into one program thread (e.g. manage_data+send_vocoder).

Timing constraints are added as backward edges; e.g. the edge from manage_data+send_vocoder to the (periodic) event node expresses that the end of that thread must occur before the occurrence of the next event, and thus before the next period.

Thread Frame Clustering and Transformation   At first, the cyclic executive based execution model was tried, which proved satisfactory for this application. The CG after thread frame clustering and transformation is depicted in figure 5 (b). Four different thread frames are identified, three according to the original events and one by an artificial event introduced by the frame cutting transformation. Although the event set of the pilot_DSP_functions+setup_DMA thread is the same as that of e.g. the
demod_pilot+format_data thread, it had to be placed in a separate frame because of the timing constraints: the execution time of the frame triggered by the decorrelator event in its 1024th execution (according to the relative rate of 1:1024 of the last thread) would become longer than the period of the periodic event, and would thus conflict with the timing constraints. Cutting the pilot_DSP_functions+setup_DMA frame off, and introducing an artificial event which checks for the 1024th execution of the preceding frame, allows the execution model to overlap the 1024 executions of FRAME_demodulate with FRAME_pilot_DSP_functions.

Another transformation, called rate matching, is applied on the manage_data+send_vocoder thread: by inserting a rate counter which checks for the relative event occurrence and, based on this, causes a conditional execution, the rate of this thread is matched to its frame rate. The short execution time of the thread allows this, so that the thread did not have to be cut off like the demod_pilot+format_data thread.

Implementation   The final implementation, after static thread frame scheduling, introduction of the communication buffers, and inclusion of the execution model, is shown in figure 6 (a).

Figure 6: The Final Implementation of the MSBN demodulator: (a) the scheduled thread frames (FRAME_demodulate, FRAME_pilot_DSP_functions, FRAME_read_sys_cmd, FRAME_read_antenna) with their interrupt assignments, added source code, and the round-robin event loop; (b) the behavior of the CPU and the peripherals on a time axis.

Both the FRAME_read_sys_cmd and FRAME_pilot_DSP_functions frames are placed in the background event loop of the cyclic executive based execution model, and thus in a round-robin schedule. For the man-machine frame this is possible because of its non-stringent timing constraint. Its response time to a user setting new system commands is limited by the execution time of FRAME_pilot_DSP_functions. The two other frames are triggered and activated by the environment using the corresponding hardware interrupts. Remark that in the frame FRAME_demodulate the interrupts are unmasked again (the target processor by default did not allow interrupt nesting) to allow interruption by FRAME_read_antenna, in order to reduce the response time to the antenna losing the carrier.

Figure 6 (b) outlines the behavior of both the CPU and the processor peripherals on a time axis. It can be seen clearly that while the FRAME_pilot_DSP_functions frame is processing the previous data frame in the background, the FRAME_demodulate frame (activated by a hardware interrupt) is consecutively processing the 1024 data samples of the next frame. This can be considered a kind of process time-folding. After 1024 activations of the FRAME_demodulate frame, the FRAME_pilot_DSP_functions frame can be restarted, which is accomplished by the polling in the event loop. When the FRAME_pilot_DSP_functions frame has finished and is waiting for the FRAME_demodulate frame in order to be able to start on the next data frame, the FRAME_read_sys_cmd frame is executed as the second part of the event loop.
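The rate-matching transformation described above amounts to a small counter guarding the thread body. A minimal sketch (function and variable names are ours):

```c
/* Illustrative rate-matching guard: instead of cutting the thread into
 * its own frame, a counter tracks the relative event occurrences and the
 * guarded thread body executes only on every n-th activation (n = 3 for
 * manage_data+send_vocoder in the example). */
static int rate_count;

/* Call once per triggering event; returns 1 when the thread body may run. */
static int rate_matched_tick(int n) {
    if (++rate_count < n)
        return 0;          /* skip: not yet the n-th occurrence */
    rate_count = 0;
    return 1;              /* run the thread body, restart the count */
}
```

The added cost per event is one increment and one compare, which is why the transformation is only attractive for threads with a short execution time.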
It can also be seen that interrupt nesting of both the FRAME_read_antenna frame and the DMA interrupt routine (which disables the DMA engine after completion of its transfer) can occur in the FRAME_demodulate frame (and of course in the FRAME_pilot_DSP_functions frame).

Results. The overhead implied by the use of the blocking execution model is minimal: it only requires an infinite loop and two conditional tests for the background event loop, and a (register) context save/restore for each of the interrupt routines. The latter is, moreover, supported by the processor hardware.

This overhead has to be compared with the situation where a real-time kernel is used to implement the run-time behavior of the concurrent communicating process specification of figure 4. The original specification consists of twelve concurrent user processes. When this hierarchy is implemented straightforwardly, using the kernel's semaphore primitives to signal between two tasks when the former has data ready, the result is a considerable (run-time) overhead compared with the solution proposed above. Additionally, it requires extra program memory to hold the program code of the kernel. With careful manual tuning of the original specification and by collapsing a number of user processes, the kernel solution could approach ours. This tuning is, however, a manual task to be performed by the user. In contrast, in our approach this tuning and transformation can be automated, and works across the user's division of the specification into processes.

5 Conclusions

In this paper we have presented an approach for software synthesis targeted at real-time information processing systems, which tackles the software support problem at the source level, in contrast to contemporary coarse-grain, black-box approaches. The proposed approach exploits the knowledge of the application at hand, applies transformations and optimizations to the specification, and generates an application-specific solution with automatic support for timeliness.
The proposed approach, based on a representation model composed of program threads and constraint graphs, decouples the timing characteristics of any particular functionality from the specification style used. The method features a selectable execution model which combines a detailed static analysis of the input specifications, resulting in a static partitioning of the input specifications and a static schedule for each partition, with a run-time activation mechanism for the dynamic composition of the specification partitions.

Starting from the general framework presented in this paper, future work will focus on the different optimization issues involved in the translation of the input specifications into program threads and on the definition of efficient static and dynamic scheduling algorithms.

Acknowledgment

The authors wish to thank the members of IMEC's ADT group, and SAIT Systems for the information on the mobile terminal example.

References

[1] M. Chiodo, P. Giusto, A. Jurecska, H. C. Hsieh, A. Sangiovanni-Vincentelli, L. Lavagno, "Hardware-Software Co-design of Embedded Systems," IEEE Micro, Vol. 14, No. 4, Aug. 1994.

[2] P. Chou, G. Borriello, "Software Scheduling in the Co-Synthesis of Reactive Real-Time Systems," Proceedings of the 31st Design Automation Conference, Jun. 1994.

[3] M. Cornero, F. Thoen, G. Goossens, "Software Synthesis for Real-Time Information Processing Systems," to appear in Proceedings of the 1st Workshop on Code Generation for Embedded Processors, Dagstuhl (Germany), Aug. 31 - Sept. 2, 1994, Kluwer.

[4] European Space Agency (ESA), "Mobile Satellite Business Network (MSBN) - System Requirement Specification," Issue 3.1, ESA - Estec, Keplerlaan 1, Noordwijk, The Netherlands, Nov. 17, 1992.

[5] K. Ghosh, B. Mukherjee, K. Schwan, "A Survey of Real-Time Operating Systems," report GIT-CC-93/18, College of Computing, Georgia Institute of Technology, Atlanta, Georgia, Feb. 15, 1994.

[6] G. Goossens, D. Lanneer, M. Pauwels, F. Depuydt, K. Schoofs, A. Kifli, M. Cornero, P. Petroni, F. Catthoor, H. De Man, "Integration of Medium-throughput Signal Processing Algorithms on Flexible Instruction-set Architectures," Journal of VLSI Signal Processing, Kluwer Academic Publishers, 1995.

[7] D. Lanneer, J. Van Praet, A. Kifli, K. Schoofs, W. Geurts, F. Thoen, G. Goossens, "Retargetable Code Generation for Embedded DSP Processors," to appear in Proceedings of the 1st Workshop on Code Generation for Embedded Processors, Dagstuhl (Germany), Aug. 31 - Sept. 2, 1994, Kluwer.

[8] R. de Gaudenzi, R. Viola, C. Elia, "Band Limited Quasi-synchronous CDMA: A Novel Satellite Access Technique for Mobile and Personal Communication Systems," IEEE Journal on Selected Areas in Communications, 10(2), Feb. 1992.

[9] R. K. Gupta, G. De Micheli, "Hardware-Software Cosynthesis for Digital Systems," IEEE Design & Test of Computers, pp. 29-40, Sept. 1993.

[10] R. K. Gupta, "Co-Synthesis of Hardware and Software for Digital Embedded Systems," PhD Dissertation, Stanford University, Dec. 1993.

[11] D. Harel, "Statecharts: A Visual Formalism for Complex Systems," Science of Computer Programming, 8, 1987.

[12] N. Gehani, W. D. Roome, "The Concurrent C Programming Language," Prentice Hall, 1989.

[13] IEEE Inc., "IEEE Standard VHDL Language Reference Manual," IEEE Inc., 345 East 47th Street, New York, NY, 10017, IEEE Standard 1076-1987, Mar. 1982.

[14] D. Ku, G. De Micheli, "Relative Scheduling Under Timing Constraints," Proceedings of the 27th Design Automation Conference, Orlando, FL, Jun. 1990.

[15] L. Philips, I. Bolsens, A. Rabaeijs, B. Vanhoof, J. Vanhoof, H. De Man, "Silicon Synthesis of a Flexible CDMA/QPSK Mobile Communication Modem," DSP Applications Magazine, Jan. 1994.

[16] J. F. Ready, "VRTX: A Real-Time Operating System for Embedded Microprocessor Applications," IEEE Micro, pp. 8-17, Aug. 1986.

[17] SPECTRON Microsystems, "SPOX - the DSP Operating System," 5266 Hollister Av., Santa Barbara, CA 93111 (U.S.A.), Jun. 1992.

[18] H. Tokuda, C. Mercer, "ARTS: A Distributed Real-Time Kernel," Operating Systems Review, Vol. 23, No. 3, pp. 29-53, Jul. 1989.

[19] E. Verhulst, "Virtuoso: Providing Sub-microsecond Context Switching on DSPs with a Dedicated Nano Kernel," International Conference on Signal Processing Applications and Technology, Santa Clara, Sept. 1993.

[20] J. Xu, D. L. Parnas, "Scheduling Processes with Release Times, Deadlines, Precedence and Exclusion Relations," IEEE Transactions on Software Engineering, Vol. 16, No. 3, Mar. 1990.