A mixed-level virtual prototyping environment for SystemC-based design methodology

ARTICLE IN PRESS

Microelectronics Journal 40 (2009) 1082–1093

Sanggyu Park, Sangyong Yoon, Soo-Ik Chae *

School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Republic of Korea

* Corresponding author. Tel.: +82 2 880 5457; fax: +82 2 888 1691.

E-mail addresses: [email protected] (S. Park), syyoon@sdgroup.snu.ac.com (S. Yoon), [email protected], [email protected] (S.-I. Chae).

Article info

Article history: Received 17 March 2007; received in revised form 7 January 2008; accepted 19 May 2008; available online 17 July 2008.

Keywords: SystemC; Transaction level model; Channel; Architecture template; Virtual prototype

0026-2692/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.

doi:10.1016/j.mejo.2008.05.010

Abstract

We propose a flexible mixed-level virtual prototyping environment in which models at different abstraction levels, such as the transaction level, register-transfer level, and software level, can be co-simulated. In the proposed environment, designers first capture a transaction level system model before hardware–software partitioning; mixed-level virtual prototyping models are then refined from it with pre-defined and pre-verified communication primitives. We explain several techniques employed in the environment, such as ID ports for software template efficiency, abstraction adapters in SystemC for mixed-level simulation, and trace-driven simulation for faster performance evaluation. Moreover, transaction level descriptions in SystemC can be compiled and executed as software together with DEOS, an operating system that provides SystemC APIs. We compared the simulation speeds of several mixed-level virtual prototypes of an H.264 decoder to show the effectiveness of the proposed environment.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

In designing a complex system that contains both hardware and software, it is getting more difficult to meet the time-to-market requirement. It is even worse for designers who start developing the software part of the system only after its hardware part is completely implemented, because simulation is very slow at the register-transfer level. This limitation can be alleviated by trading simulation accuracy for simulation speed. Therefore, transaction level modeling has been employed to enhance the simulation speed of virtual prototypes, because transaction level models are much faster than register-transfer level models and are available at an earlier stage of the design flow [1–5].

In the conventional bus-based transaction level modeling adopted in commercial EDA tools such as ConvergenSC™ and MaxSim™, transaction level virtual prototyping can be performed only when all hardware components modeled at the transaction level and all software components described with software APIs are available. Note that these components can be developed only after making decisions about hardware–software partitioning and memory architecture, which require accurate performance and area evaluation. However, designers cannot accurately estimate performance and complexity because the hardware and software components are not completely implemented yet. Therefore, some of the architectural decisions made at the transaction level are highly likely to change, which results in manual modifications of components in the later design stages. Furthermore, modifying a component at the bus-based transaction level often incurs modifications of other components. Therefore, bus-based transaction level modeling is still expensive and risky.

We adopted three approaches to alleviate this problem. First, we adopted an un-timed transaction level, whose abstraction level is a little higher than that of bus-based transaction level modeling, to capture both hardware and software with SystemC for function modeling before hardware–software partitioning. Second, we developed a library of reusable communication primitives to reduce the designer's effort in transaction level function modeling, architecture exploration, and virtual prototyping. Third, we developed a mixed-level simulation environment that can simulate heterogeneous systems. We adopted SystemC to combine several simulation tools, such as logic simulators and instruction-set simulators, because it can model various abstraction levels including the transaction, software, and register-transfer levels. By using the mixed-level simulation environment, we can verify and evaluate partially refined systems, which contain component models at different abstraction levels and are obtained by progressively refining a system from the transaction level to the register-transfer level while reusing the communication primitives. Consequently, we developed a channel architecture templates tree (CATtree)-based design environment, in which we adopted all three approaches [7]. For this environment, we defined channels as behavioral patterns related to communication, memory, and synchronization, and developed various CATs for each channel. Each CAT is a parameterized implementation of the channel with a unique micro-architecture.

This paper is an extended version of our previous publication [6], which explained a mixed-level virtual prototyping environment for the CATtree-based design environment. The proposed virtual prototyping environment provides co-simulation links (CSLs), which connect SystemC with HDL simulators as well as GNU debuggers, and abstraction adapters written in SystemC, which bridge two components at different abstraction levels. The proposed environment also provides DEOS, a C++-based POSIX-style operating system, which enables us to re-use transaction level descriptions in SystemC as software code. Furthermore, the environment provides trace-driven simulation and verification features to accelerate performance evaluation and functional verification.

This paper is organized as follows. In Section 2, we give an overview of the communication library and the design flow. In Section 3, we explain the proposed mixed-level simulation environment. In Section 4, we describe our experience of designing an H.264 decoder with the virtual prototyping environment to show how it can be used in real design projects. Section 5 concludes the paper.

2. CATtree-based design environment

In this section, we first explain the motivation for our work with a JPEG decoder example. Then, we briefly describe the communication library. Finally, we explain which features are required in a virtual prototyping environment for our design flow.

2.1. JPEG decoder example

In designing the components of a system, the micro-architecture and interface of each component depend on the system-level architecture. Fig. 1 shows two different implementations of a JPEG decoder whose behaviors are as follows. The HufDec block reads compressed packets from the input buffer (IBUF) and outputs transform coefficients to the inverse discrete cosine transform (IDCT) block. The IDCT block transforms the received coefficients into transformed pixels and writes those pixels to the output buffer (OBUF).

Fig. 1. Two different JPEG decoder design examples.

The first implementation, shown in Fig. 1a, is a low-cost version, where the HufDec block is software and the IDCT block is hardware. Both the IBUF and OBUF blocks are mapped to an external SDRAM. Because the HufDec block is implemented on a RISC processor, which has a bus master interface, the IDCT block contains a bus slave interface and address-mapped registers to receive transform coefficients from the HufDec block. The IDCT block contains an interrupt request (IRQ) interface to notify the HufDec block of an event via an interrupt controller. The IDCT block also has a bus master interface to access the OBUF block in the SDRAM. The software stack contains a device driver that receives interrupt requests and sends the transform coefficients to the IDCT block.

The second implementation, shown in Fig. 1b, is a high-performance version, in which the HufDec and IDCT blocks are both hardware. In this implementation, the IBUF and OBUF blocks are mapped to an on-chip SRAM and an external SDRAM, respectively. The HufDec block has an SRAM interface to access the IBUF block, and it also has a FIFO interface to send the transform coefficients to the IDCT block, which has a FIFO interface to receive them. The IDCT block has a bus master interface to access the OBUF block. To reduce SDRAM access latencies, the bus master interface is backed by buffers.

Suppose that the JPEG decoder has been designed as the first implementation, shown in Fig. 1a, but needs to be changed to the second implementation, shown in Fig. 1b. In this case, the HufDec and IDCT blocks must be re-described, although their core functions are unchanged, because their micro-architectures and interfaces change. Note that re-describing components is cumbersome and error-prone work that requires the designer's effort.

From the above example, we concluded that device drivers, interrupt controllers, memory controllers, and bus interfaces should not be included in the component descriptions. Instead, they should be inserted automatically into the design according to the architectural decisions, to ease architecture exploration and virtual prototyping of a system.

2.2. The CATtree library

We defined communication patterns related to data transfer, memory, and synchronization, which are highly likely to recur in specific application domains such as multimedia applications. In this paper, these communication primitives are called channels, in accordance with SystemC terminology. For each of the pre-defined channels, we constructed various reusable implementations that differ in performance and area. We refer to these reusable implementations as CATs because each CAT for a channel is a parameterized implementation of a unique micro-architecture of the channel [6,7]. We can provide more complex and optimized templates in the library by capturing data transfer circuits together with their associated memory in a CAT. A channel and its CATs are collectively called a CATtree, and a collection of CATtrees is called the CATtree library.

To construct the CATtree library, we defined eight interfaces, as listed in Table 1. For each interface, we defined three different abstraction levels: transaction level, register-transfer level, and software level. We defined the software level interface to be identical to the transaction level one so that transaction level descriptions of computation modules can be re-used as software code without any modifications.

Table 1. Interfaces

Interface name | Abbreviation | SystemC interface
Single-put | SPI | spi<T>
Single-get | SGI | sgi<T>
Single write | SWI | swi<T>
Indexed write | IWI | iwi<T, N>
Single read | SRI | sri<T>
Indexed read | IRI | iri<T, N>
Event notify | ENI | eni
Event accept | EAI | eai

For easier understanding of the rest of this paper, we explain the single-put interface of a FIFO channel in detail. This interface defines three methods: put, sync, and clear. The method put writes a message to the channel if room is available in the channel; otherwise, it delays the producer module until room is available [10]. The method clear immediately removes all messages stored in the channel. The method sync suspends the module until all messages stored in the channel are consumed. Fig. 2a shows the SystemC declaration of this interface for transaction level modeling.

template<class T>
class spi : virtual public sc_interface
{
public:
    virtual void put(T data) = 0;
    virtual void clear(void) = 0;
    virtual void sync(void) = 0;
};

Fig. 2. Single-put interfaces: (a) transaction level interface and (b) register-transfer level interface.
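To make the blocking semantics of put, clear, and sync concrete, the following sketch mimics them in plain C++, with standard thread primitives standing in for SystemC processes and events. The class name BlockingFifo, the capacity parameter, and the get and size members are assumptions made for illustration; this is not the CATtree library code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical plain-C++ stand-in for a FIFO channel (not the CATtree library).
template <class T>
class BlockingFifo {
public:
    explicit BlockingFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);  // a zero-capacity FIFO could never accept a put
    }

    // put: suspend the producer until room is available, then store the message.
    void put(T data) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(data));
        changed_.notify_all();
    }

    // get: suspend the consumer until a message is available, then remove it.
    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !queue_.empty(); });
        T front = std::move(queue_.front());
        queue_.pop_front();
        changed_.notify_all();
        return front;
    }

    // clear: drop every stored message immediately.
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.clear();
        changed_.notify_all();
    }

    // sync: suspend the caller until all stored messages have been consumed.
    void sync() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return queue_.empty(); });
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable changed_;
    std::deque<T> queue_;
};
```

In this sketch, a producer thread that calls put on a full BlockingFifo stays suspended until a consumer calls get or another thread calls clear, which corresponds to the behavior described above for the single-put interface.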

The register-transfer level version of the single-put interface consists of three module-driven signals (WD, WEN, and CLR) and two channel-driven signals (RDY and SYNC), as shown in Fig. 2b. The RDY signal indicates that the channel is ready to receive a new request, and the SYNC signal indicates that the channel is empty. To initiate a transaction, a module has to wait until RDY is high. To put a message into the channel, the module should drive WD with a valid message and set WEN to high. To initialize the channel, the module should drive CLR to high. To implement the method sync, the module should wait until SYNC is high.
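The handshake can be pictured with a small cycle-level model of the channel side, sketched below in plain C++. Several details are assumptions that the text does not fix: RDY is modeled as "the store is not full", CLR takes priority over WEN in the same cycle, and the names PutInputs, RtlPutPort, depth, and pop are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Module-driven signals, sampled by the channel at a rising clock edge.
struct PutInputs {
    unsigned wd = 0;    // WD: message word
    bool wen = false;   // WEN: write enable
    bool clr = false;   // CLR: clear request
};

// Hypothetical cycle-level model of the channel side of the single-put interface.
class RtlPutPort {
public:
    explicit RtlPutPort(std::size_t depth) : depth_(depth) { assert(depth_ > 0); }

    bool rdy() const { return store_.size() < depth_; }  // RDY: a request can be accepted
    bool sync() const { return store_.empty(); }         // SYNC: the channel is empty

    // One rising clock edge. CLR wins over WEN (an assumption of this sketch).
    void clock(const PutInputs& in) {
        if (in.clr) {
            store_.clear();
        } else if (in.wen && rdy()) {
            store_.push_back(in.wd);
        }
    }

    // Consumer side, reduced to a single call for the sketch.
    unsigned pop() {
        assert(!store_.empty());
        unsigned front = store_.front();
        store_.pop_front();
        return front;
    }

private:
    std::size_t depth_;
    std::deque<unsigned> store_;
};
```

A write attempted while RDY is low is simply ignored here, which is why a module has to wait for RDY before it raises WEN.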

In the library, we defined six types of channels, as shown in Table 2. For example, the FIFO-type channels have one producer interface and one consumer interface. In contrast, the broadcast-type channels have one producer interface and multiple consumer interfaces, while the conflux-type channels have multiple producer interfaces and one consumer interface.

A CAT is a reusable implementation of a channel. Some CATs are compositions of several sub-components such as on-chip SRAMs, buses, interrupt controllers, caches, and adapters. Each of these sub-components is represented with three templates: a transaction level template for transaction level virtual prototyping, a register-transfer level template for hardware implementation, and a software template for device driver generation. Although each sub-component has a transaction level template, not all sub-components have both register-transfer level and software level templates. For example, a sub-component with a bus slave interface has only a register-transfer level hardware template. Each CAT is implemented with its specific architectural assumptions. Therefore, it should be used in a design only if its assumptions are satisfied.

Table 1 (continued). Blocking methods of each interface

Interface | Blocking methods
SPI | void put(T data), void clear(void), void sync(void)
SGI | T get(void), T peek(void), void clear(void)
SWI | void write(T data), void sync(void)
IWI | void write(int index, T data), void sync(void)
SRI | T read(void)
IRI | T read(int index)
ENI | void notify(void), void wait_ack(void)
EAI | void wait_req(void), void ack(void)

Table 2. Channels

Channel | Description | Producer interface | Consumer interface
FIFO | Point-to-point ordered and synchronized data transmission | 1 SPI | 1 SGI
Array | Addressable data storage with one writer and one reader | 1 IWI | 1 IRI
Variable | Single data storage with one writer and one or more readers | 1 SWI | 1 or more SRI
Event | Point-to-point event notification and acknowledgement | 1 ENI | 1 EAI
Broadcast | Copies a message from a producer to multiple consumers | 1 SPI | 2 or more SGI
Conflux | Collects messages from multiple producers for one consumer | 2 or more SPI | 1 SGI

Fig. 3. FIFO CATtrees [7]: the FIFO channel (function) and its CATs (architecture), namely (b) register FIFO, (c) software FIFO, (d) SRAM FIFO, (e) SDRAM FIFO, (f) cached SDRAM FIFO, and (g)–(i) bus-based FIFOs.

Here, we explain several CATs of a FIFO channel, illustrated in Fig. 3. A CAT for register FIFOs (Fig. 3b) is a simple hardware implementation with registers, which can be used when the producer and consumer modules are hardware. A CAT for SRAM FIFOs (Fig. 3d) and a CAT for SDRAM FIFOs (Fig. 3e) are hardware implementations that store messages in an on-chip SRAM and an external SDRAM, respectively. A CAT for cached SDRAM FIFOs (Fig. 3f) is an enhanced version of the CAT for SDRAM FIFOs that contains a cache to reduce SDRAM access latencies [6]. A CAT for software FIFOs (Fig. 3c) is implemented using operating system APIs and should be used to connect two software modules that are assigned to the same processor. CATs for bus-based FIFOs (Fig. 3g–i), which consist of several bus adapters and on-chip buses, are prepared for implementing communication between hardware and software modules. For example, a CAT for a master-put-slave-get (mSP-sSG) FIFO (Fig. 3g) can be used if the producer module is software and the consumer module is hardware. This CAT includes an mSP adapter (mSP) and an sSG adapter (sSG), where the adapter mSP is a device driver that reads status registers, sends messages, and handles interrupt requests from the adapter sSG.
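The architectural assumptions attached to each FIFO CAT can be read as a simple selection rule. The sketch below encodes only the three cases stated explicitly above and returns "unspecified" for every other mapping; the type and function names are hypothetical, and for two hardware modules it picks the register FIFO merely as the simplest of the hardware alternatives.

```cpp
#include <cassert>
#include <string>

enum class Impl { Hardware, Software };

struct Endpoint {
    Impl impl;
    int processor;  // processor index; ignored for hardware modules (hypothetical field)
};

// Returns the name of one FIFO CAT whose stated assumption matches the mapping,
// or "unspecified" when the text above does not name a CAT for that case.
std::string pick_fifo_cat(const Endpoint& producer, const Endpoint& consumer) {
    const bool producer_sw = producer.impl == Impl::Software;
    const bool consumer_sw = consumer.impl == Impl::Software;

    if (!producer_sw && !consumer_sw) {
        return "RegFifo";    // both modules are hardware (Fig. 3b)
    }
    if (producer_sw && consumer_sw && producer.processor == consumer.processor) {
        return "SWFifo";     // two software modules on the same processor (Fig. 3c)
    }
    if (producer_sw && !consumer_sw) {
        return "mSP-sSG";    // software producer, hardware consumer (Fig. 3g)
    }
    return "unspecified";
}
```

A real exploration would also weigh performance and area, for example by choosing an SRAM or SDRAM FIFO instead of a register FIFO for large messages; the sketch leaves that decision out.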

2.3. CATtree-based design flow

In this paper, we assume the CATtree-based design flow, which is based on the SystemC design methodology [8,9]. This design flow consists of four major design steps: transaction level modeling, computation architecture exploration, channel architecture exploration, and backend implementation. We briefly explain the design flow except for the backend implementation, which is outside the scope of this paper.

In the transaction level modeling step, a designer should partition the system functions into modules and channels. Transaction level models of modules are described manually in SystemC, whereas those of channels are selected and re-used from the library. In this step, selecting the right channels is important because it determines the communication architecture space provided by the library. During architecture exploration, designers often have to modify and re-verify several modules at various abstraction levels. For easier verification, we provide trace-driven verification, which compares the traces of a modified model with those of the golden-reference model at the transaction level. The trace-driven verification is explained in detail in Section 3.4.2.
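The core of such a comparison can be sketched as follows. The trace format is an assumption made for illustration (one entry per channel transaction, with hypothetical fields channel, method, and value), and the strict entry-by-entry ordering used here may differ from the policy described in Section 3.4.2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One recorded channel transaction; the field names are hypothetical.
struct TraceEntry {
    std::string channel;  // channel instance name
    std::string method;   // for example "put" or "get"
    long value;           // transferred message, reduced to an integer
};

inline bool same_entry(const TraceEntry& a, const TraceEntry& b) {
    return a.channel == b.channel && a.method == b.method && a.value == b.value;
}

// Returns the index of the first entry at which the candidate trace departs from
// the golden-reference trace, or -1 when the two traces are identical.
long first_mismatch(const std::vector<TraceEntry>& golden,
                    const std::vector<TraceEntry>& candidate) {
    const std::size_t common = std::min(golden.size(), candidate.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (!same_entry(golden[i], candidate[i])) {
            return static_cast<long>(i);
        }
    }
    if (golden.size() != candidate.size()) {
        return static_cast<long>(common);  // one trace ends early
    }
    return -1;
}
```

Reporting the first mismatching index points the designer at the earliest transaction where a modified module diverges from the reference.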

In the computation architecture exploration step, a designer should decide, for each module, whether it will be implemented as hardware or software. To do so, it is necessary to evaluate the performance and area requirements of the modules. In the proposed environment, the software performance of a module is easily estimated from its transaction level description using the module-only software simulation feature, which will be explained in Section 3.3. However, the hardware performance of a module cannot be evaluated if its synthesizable register-transfer level model is not available. To minimize manual HDL descriptions, designers should implement modules as software where possible and progressively change software modules into hardware modules only if the system does not meet its design constraints. After hardware–software partitioning, the designers should write an HDL description for each hardware module. To verify the HDL descriptions, we co-simulate transaction level and register-transfer level models, as explained in Section 3.2.

In the channel architecture exploration step, we replace each channel with a suitable CAT, one by one, to implement it either in hardware at the register-transfer level or in software running on an embedded processor. For this step, we provide several features in the proposed virtual prototyping environment. First, we support mixed-level simulation because we need to simulate partially refined systems that contain heterogeneous components at different abstraction levels. Second, we support trace-driven simulation because hardware simulation with logic simulators and software simulation with instruction-set simulators are too slow to be employed in the verification and evaluation of complex systems. Third, we provide bus adapters, device drivers, and memory controllers, which are inserted into the virtual prototype as sub-components of CATs, because the virtual prototypes should be constructed efficiently according to architectural decisions.

However, it is not easy to model time-dependent behaviors with this design flow because we employ un-timed transaction level models as its starting point. To alleviate this problem, we separate the time-dependent part of the system from its time-independent function and implement the time-independent part with the proposed design flow. Then, we insert the time-dependent behaviors into the software part of the implemented time-independent system. This approach is especially useful for designing video codec systems, in which time-dependent behaviors are not dominant. For example, in H.264 decoders, the parts for NAL decoding and sequence-level syntax parsing are time dependent.

3. Mixed-level virtual prototyping environment

We developed a mixed-level virtual prototyping environment that provides all the required features explained in Section 2. It is a SystemC-based virtual prototyping environment that supports seven simulation modes, as shown in Fig. 4a. In mode I, a transaction level module is connected to a transaction level channel, and in mode II, a register-transfer level module is connected to a register-transfer level channel. In mode III, a register-transfer level module is connected to a transaction level channel, and in mode IV, a transaction level module is connected to a register-transfer level channel. A software level module is connected to a transaction level channel in mode V and to a software level channel in mode VI. In mode VII, a software module is connected to a device driver that communicates with a transaction level bus slave component. In the proposed environment, virtual prototypes of any partially refined system can be constructed by combining different simulation modes, as shown in Fig. 4b, where modules at the transaction level, register-transfer level, and software level are connected through transaction level channels.
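The first six modes are determined by the abstraction levels of the module and of the channel it is attached to, which the following sketch tabulates. The enum and function names are hypothetical. Mode VII is not returned because it depends on a device driver and a transaction level bus slave rather than on the two levels alone, and combinations that the text does not list yield 0.

```cpp
#include <cassert>

enum class Level { TL, RTL, SW };

// Returns 1..6 for simulation modes I..VI, or 0 for a module/channel pairing
// that is not among the listed modes.
int simulation_mode(Level module, Level channel) {
    if (module == Level::TL  && channel == Level::TL)  return 1;  // mode I
    if (module == Level::RTL && channel == Level::RTL) return 2;  // mode II
    if (module == Level::RTL && channel == Level::TL)  return 3;  // mode III
    if (module == Level::TL  && channel == Level::RTL) return 4;  // mode IV
    if (module == Level::SW  && channel == Level::TL)  return 5;  // mode V
    if (module == Level::SW  && channel == Level::SW)  return 6;  // mode VI
    return 0;
}
```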

3.1. Transaction level simulation

For transaction level modeling and simulation of a system, we provide a library that includes interfaces and channels at the transaction level. The Open SystemC Initiative (OSCI) released a TLM standard, which is useful for developing such a library [3]. This standard recommends a layered approach that employs two transaction level layers: a transport layer and a user (convenience) layer. It defines several transport layer interfaces, such as tlm_blocking_get_if and tlm_blocking_put_if, and several transport layer channels, such as tlm_fifo, tlm_req_rsp_channel, and tlm_transport_channel. With these transport layer interfaces and channels, we can define user-layer interfaces and channels, which are more convenient for modeling application-specific behaviors.

Fig. 4. (a) Seven simulation modes supported in the mixed-level virtual prototyping environment and (b) an example virtual prototype.

The CATtree interfaces and channels correspond to the user-layer interfaces and channels. For example, Fig. 5a shows a conflux channel that connects a consumer module C to two producer modules P1 and P2. This CATtree channel can be modeled using the transport layer interfaces and channels of the OSCI TLM standard, as shown in Fig. 5b. In this model, the conflux channel is a hierarchical SystemC module that contains an internal component M, which receives messages from either P1 or P2, stores the messages in the internal memory, and provides the messages to C. All modules and the internal component M are connected by three tlm_transport_channels.

If the CATtree library were implemented with the OSCI TLM standard, we would have to use initiator ports, which translate method calls into request/response packets transferred through the transport layer interfaces and channels [3]. Such translation overheads are acceptable for transaction level simulation but not for software implementation. Therefore, transport layer interfaces and channels are not adequate for describing software implementations of CATtree channels. To avoid the overheads, we described software models of the channels and CATs with only the SystemC language primitives, following the modeling style presented in [1]. Furthermore, we described the transaction level channels and CATs in the same modeling style to simplify the development and maintenance of the library.

We can represent CATtree channels that have one producer interface and one consumer interface as a SystemC primitive channel derived from the sc_prim_channel class. However, we cannot represent channels with multiple producers or consumers, such as broadcast and conflux channels, as a SystemC primitive channel. In the arbiter example with multiple master interfaces in the reference on the OSCI TLM standard [3], the master interface that requested a transaction is identified by attaching a dedicated transport layer channel to each master interface. On a transaction method call, the corresponding initiator port of a master module prepares a request packet and puts the packet into the transport layer channel, which stores the packet in its local memory and provides it to the arbiter. However, this approach requires two copy operations to transfer the request packet: (1) from the initiator port to the transport layer channel and (2) from the transport layer channel to the arbiter. Therefore, we implemented an ID port with only SystemC language primitives, which saves one copy operation by unifying the initiator port and the transport layer channel. Although the ID port is functionally equivalent to the solution in Ref. [3], the proposed solution is more efficient, especially for software implementations.

Fig. 5. (a) A conflux channel and (b) its implementation with the OSCI TLM standard.

enum FIFO_CMD { IDLE, PUT, SYNC, CLEAR };

template<class T>
class id_spi : public sc_prim_channel,
               public spi<T>
{
public:
    id_spi(void) { }

    // Called by the producer module: record the request, wake the channel,
    // and wait for its acknowledgement.
    virtual void put(T data) {
        m_req_cmd = PUT;
        m_req_data = data;
        m_req.notify(SC_ZERO_TIME);
        wait(m_ack);
    }

    // Called by the channel after it has served the request.
    void ack(void) {
        m_req_cmd = IDLE;
        m_ack.notify();
    }

    FIFO_CMD m_req_cmd;
    T m_req_data;
    sc_event m_req, m_ack;
};

Fig. 6. An ID port for a single-put interface.

template<class T>
class conflux : public sc_module,
                public sgi<T>
{
public:
    id_spi<T> p0, p1;   // one ID port per producer

    SC_HAS_PROCESS(conflux);

    conflux(sc_module_name name) : sc_module(name) {
        SC_THREAD(do_put);
    }

    // Waits for a request on either ID port and serves it.
    void do_put(void) {
        id_spi<T>* p[2] = {&p0, &p1};
        while(true) {
            wait(p[0]->m_req | p[1]->m_req);
            for(int id=0; id<2; id++) {
                switch(p[id]->m_req_cmd) {
                case PUT:
                    ...
                    p[id]->ack();
                    break;
                case SYNC:
                    ...
                }
            }
        }
    }

    T get(void) {
        ...
    }
};

Fig. 7. A transaction level model of a conflux channel.

The ID port receives a transaction request from a module, saves the request in local variables, and notifies the channel of an event. Each ID port receives transaction requests from one specific module, so the channel can identify the module that called the method. Fig. 6 shows a description of an ID port for the single-put interface. If a module calls the method put, the ID port saves the transaction type and the message in the member variables m_req_cmd and m_req_data, respectively. Then, it notifies the channel of the arrival of a new transaction request through the sc_event object m_req and awaits an acknowledgement from the channel using the sc_event object m_ack. The module is suspended until the channel acknowledges the request by calling the ack method of the ID port.
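The identification idea can be separated from the SystemC machinery. In the sketch below, which is a simplification and not the library code, each producer owns one request slot, so the index of the slot tells the channel who issued the request. Event notification and process suspension are replaced by an explicit service() call, and all names (IdPort, ConfluxSketch, request_put, received) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum class Cmd { Idle, Put };

// One request slot per producer, standing in for an ID port.
template <class T>
struct IdPort {
    Cmd cmd = Cmd::Idle;
    T data{};

    void request_put(T value) {
        assert(cmd == Cmd::Idle);  // the previous request must have been acknowledged
        cmd = Cmd::Put;
        data = std::move(value);
    }
    void ack() { cmd = Cmd::Idle; }
};

// Channel with N producers; it scans its ports instead of waiting on events.
template <class T, std::size_t N>
class ConfluxSketch {
public:
    IdPort<T>& port(std::size_t id) {
        assert(id < N);
        return ports_[id];
    }

    // Serves every pending request and records which producer issued it.
    // Returns the number of requests served.
    std::size_t service() {
        std::size_t served = 0;
        for (std::size_t id = 0; id < N; ++id) {
            if (ports_[id].cmd == Cmd::Put) {
                received_.emplace_back(id, ports_[id].data);
                ports_[id].ack();
                ++served;
            }
        }
        return served;
    }

    const std::vector<std::pair<std::size_t, T>>& received() const { return received_; }

private:
    std::array<IdPort<T>, N> ports_{};
    std::vector<std::pair<std::size_t, T>> received_;
};
```

The message is written once into the slot and read once by the channel, which reflects the single copy that the ID port is meant to achieve.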

Fig. 7 shows a SystemC description of a conflux channel with two single-put interfaces and one single-get interface. The channel has two ID ports for the two single-put interfaces, which will be connected to two producer modules. The conflux channel creates a thread that runs the method do_put, which awaits new transaction requests from either of the two ID ports and services those requests. Since there is only one single-get interface, the SystemC description of the conflux channel inherits the single-get interface class and implements the methods defined in the interface.

3.2. TL–RTL co-simulation

In the proposed environment, two different simulation tools are connected by a CSL, which consists of a server, a client, and two unidirectional inter-process communication channels such as named pipes or TCP/IP sockets. A CSL can be implemented either at the register-transfer level or at the transaction level. We call a CSL implemented at the register-transfer level an R-CSL and a CSL implemented at the transaction level a transaction level co-simulation link (T-CSL). Through an R-CSL, as shown in Fig. 8b, a server and a client communicate by exchanging register-transfer level packets for the TL–RTL co-simulation, where abstraction adapters should be inserted in the SystemC environment. Through a T-CSL, as shown in Fig. 8c, a server and a client communicate by exchanging transaction level packets for the TL–RTL co-simulation, where abstraction adapters described with a C-language interface of a logic simulator should be inserted in the logic simulator. For example, the foreign language interface (FLI) is the C-language interface of Mentor Graphics ModelSim®. The simulation speed of the T-CSL is superior to that of the R-CSL because a transaction level packet is smaller than a register-transfer level packet. For the TL–RTL co-simulation, however, we adopted the R-CSL because describing abstraction adapters with a C-language interface of a logic simulator is more difficult.

Fig. 8. The TL–RTL co-simulation: (a) a system example; (b) and (c) two TL–RTL co-simulations with R-CSL and T-CSL, respectively, after refining MC into hardware.

template<class T>
class tl2rtl_spi : public sc_module,
                   public spi<T>
{
public:
    sc_in<bool>   RESETn, CLK, RDY, SYNC;   // reset, clock, and channel-driven signals
    sc_out<bool>  WEN, CLR;                 // module-driven control signals
    sc_out<u_int> WD;                       // module-driven message word

    SC_CTOR(tl2rtl_spi) { }

    void put(T value) {
        // Wait until reset is released and the channel is ready.
        while(RESETn == false || RDY == false) {
            wait(CLK->posedge_event());
        }
        WEN = true;
        CLR = false;
        WD = (u_int) value;
        wait(CLK->posedge_event());   // hold the request for one clock cycle
        WEN = false;
    }
};

Fig. 9. A TL-to-RTL adapter for a single-put interface.

For the TL–RTL co-simulation, abstraction adapters, which are also called transactors [3], should be attached to the co-simulation link. For each interface, we provide two abstraction adapters. One is a TL-to-RTL adapter for connecting a transaction level module to a register-transfer level channel, and the other is an RTL-to-TL adapter for connecting a register-transfer level module to a transaction level channel. Let us consider a system that contains three modules (MA, MB, and MC), which are connected by two channels (CA and CB), as shown in Fig. 8a. If MC and CB are chosen to be hardware, they should be simulated at the register-transfer level, while the others are simulated at the transaction level. In this example, a TL-to-RTL adapter is used to connect a transaction level module (MB) to a register-transfer level channel (CB), and an RTL-to-TL adapter is used to connect a register-transfer level module (MC) to a transaction level channel (CA), as shown in Fig. 8b. If MB at the transaction level has a single-put interface, a TL-to-RTL adapter for the single-put interface is used. Similarly, if MC at the register-transfer level has a single-get interface, an RTL-to-TL adapter for the single-get interface is used.

A TL-to-RTL adapter is an abstraction adapter that converts transaction level method calls into register-transfer level signal activities. This adapter is used when a channel is at the register-transfer level and a module is at the transaction level. As already explained in [1], an abstraction adapter of this type inherits from the SystemC interface class and its methods are written to drive register-transfer level signals. Fig. 9 shows the method put of a TL-to-RTL adapter of a single-put interface as an example. This method waits until the signal RESETn and the signal RDY are high (true). Then, it sets the signal WEN to high, and drives the signal WD with the value to be sent. Then, it waits for one clock cycle before resetting the signal WEN to low.

An RTL-to-TL adapter is an abstraction adapter that monitors signal-level activities and calls appropriate transaction level methods. This adapter is used when a module is at the register-transfer level while a channel is at the transaction level. An abstraction adapter of this type is described as a SystemC module that has a method-process created by the macro SC_METHOD [2]. This method-process executes the method do_adaptation, which performs just a one-cycle behavior according to the signal inputs. This method calls non-blocking methods of the transaction level channel because it should terminate without blocking its execution. Fig. 10 shows a SystemC description of an RTL-to-TL adapter of a single-put interface. In this description, we assume that an R-CSL latches signals to reduce synchronization overheads between a SystemC environment and a logic simulator. Therefore, the values in the input signals are those driven in the previous cycle. When RESETn, RDY and WEN are all high, the method do_adaptation calls the non-blocking method nb_put.

Although they are not included in Table 1, all interfaces also define non-blocking methods. However, some sub-components of CATs do not support non-blocking methods if they are too complex. For such components, an RTL-to-TL adapter has a special feature that uses blocking methods instead of non-blocking methods. The feature is activated only if the template parameter NBLEMU is set to true. If this feature is activated, the adapter creates a new thread nbl_emulation and an sc_event object m_nbl_event. The thread nbl_emulation awaits an event from m_nbl_event. If a register-transfer level module drives WEN to


template<class T, int TN, bool NBLEMU, bool PR>
class rtl2tl_spi : public sc_module {
public:
    sc_port<spi<T> > tlm;                  // port to the transaction level channel
    sc_in<bool> RESETn, CLK, WEN, CLR;
    sc_in<u_int> WD;
    sc_out<bool> RDY, SYNC;

    SC_CTOR(rtl2tl_spi) {
        prev_ready = false;                // not ready until the first evaluation
        m_nbl_ready = true;                // no emulated blocking call in flight
        SC_METHOD(do_adaptation);
        sensitive_pos << CLK;
        if(NBLEMU == true)                 // emulation thread exists only when NBLEMU is set
            SC_THREAD(nbl_emulation);
    }

    // One-cycle behavior: forward a write that was accepted in the previous cycle.
    void do_adaptation(void) {
        if(RESETn == false) return;
        if(prev_ready == true && WEN == true)
            nb_put((T) WD);
        RDY = prev_ready = get_ready();
    }

    bool nb_put(T data) {
        if(m_nbl_ready == false) return false;
        if(NBLEMU == false) return tlm->nb_put(data);
        else {
            // Hand the write over to the thread that may call the blocking method.
            m_nbl_ready = false;
            m_nbl_data = data;
            m_nbl_cmd = WRITE;
            m_nbl_event.notify();
            return false;
        }
    }

    bool get_ready(void) {
        if(m_nbl_ready == false) return false;
        if(PR == true && rand_wait() == true) return false;   // randomized wait states
        return tlm->pready();
    }

    void nbl_emulation(void) {
        while(true) {
            wait(m_nbl_event);
            if(m_nbl_cmd == WRITE) {
                tlm->put(m_nbl_data);      // blocking call
                m_nbl_ready = true;
            }
        }
    }

private:
    sc_event m_nbl_event;
    bool prev_ready, m_nbl_ready;
    FIFO_CMD m_nbl_cmd;
    T m_nbl_data;
};

Fig. 10. An RTL-to-TL adapter for a single-put interface.


true, the method do_adaptation calls the non-blocking method if NBLEMU is false; otherwise, it sets the m_nbl_ready flag to false so that the signal RDY will be driven low, and it wakes up the thread nbl_emulation by notifying an event through m_nbl_event. Then, the thread nbl_emulation calls the corresponding blocking method, and sets the m_nbl_ready flag to true.

In verifying a register-transfer level model of a module, RTL-to-TL adapters are necessary for re-using the other transaction level modules and channels as testbenches. Since transaction level descriptions of modules, which were described manually by designers, are un-timed models, RTL-to-TL adapters provide a randomization feature [11]. If this feature is enabled, the adapters randomly drive the signal RDY to high or low to achieve higher verification coverage. With this simple extension, the interface-related logic of a register-transfer level module can be verified more easily. This feature is activated only if the template parameter PR is true, as shown in Fig. 10.
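The randomization of RDY can be pictured with the following minimal sketch in plain C++, without any SystemC dependency. The paper does not show how rand_wait is implemented, so the class name ReadyRandomizer, the parameter stall_percent, and the seeded generator are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not from the paper): decides, cycle by cycle, whether
// an adapter should hold RDY low although the channel could accept data.
class ReadyRandomizer {
public:
    // stall_percent: chance (0..100) of forcing a wait state in a given cycle.
    ReadyRandomizer(unsigned stall_percent, std::uint32_t seed)
        : stall_percent_(stall_percent), rng_(seed) {
        assert(stall_percent <= 100u);
    }

    // Returns true when a random wait state should be inserted this cycle.
    bool rand_wait() { return (rng_() % 100u) < stall_percent_; }

    // Value to drive on RDY: high only if no wait state was injected and the
    // transaction level channel reports that it is ready.
    bool ready(bool channel_ready) {
        if (rand_wait()) return false;
        return channel_ready;
    }

private:
    unsigned stall_percent_;
    std::mt19937 rng_;
};

// Counts how many of `cycles` cycles would see RDY high for an always-ready channel.
inline unsigned count_ready_cycles(ReadyRandomizer& r, unsigned cycles) {
    unsigned n = 0;
    for (unsigned i = 0; i < cycles; ++i)
        if (r.ready(true)) ++n;
    return n;
}
```

A fixed seed keeps a failing run reproducible, which matters when a randomized wait pattern exposes a bug in the interface logic.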

3.3. TL–SW co-simulation

In the proposed virtual prototyping environment, we use the GNU debugger as an instruction-set simulator. A SystemC environment and a GNU debugger are connected by a T-CSL. In a TL–SW co-simulation, multiple GNU debuggers can be employed by integrating more T-CSLs into the mixed-level virtual prototype. There are two software simulation modes. One is the refined-channel simulation mode (modes VI and VII in Fig. 4a) and the other is the module-only simulation mode (mode V in Fig. 4a).

In the refined-channel simulation mode, all software modules, device drivers and software channels are executed in instruction-set simulators. This simulation mode can report accurate performance of the system including communication overheads. For this simulation mode, all channels in a system should be replaced with their CATs.

Here, we demonstrate an example of a library-based software refinement approach, which is similar to the component-based approach [12]. In this example, we assume that the external interfaces of an embedded processor are restricted to one bus master interface and an interrupt request signal. In the example system, shown in Fig. 8a, MA and MC are decided to be software and hardware, respectively. If CA is a FIFO channel and MA is a producer, CA should be refined with an mSP-sSG FIFO CAT (Fig. 3g) or an mSP-mSG FIFO CAT (Fig. 3i). Because MC is hardware, the mSP-sSG FIFO CAT is selected in this case. In this CAT, the adapter mSP is assigned to the processor. This software adapter acts as a device driver that receives interrupt services, allocates memories, reads status registers and writes control registers of a hardware adapter sSG. A simplified software template of the adapter mSP is shown in Fig. 12a. In this template, the method put polls the status register of the sSG and awaits an interrupt until the sSG is ready to receive a new message, and then it writes a new message to the data register in the adapter sSG. Fig. 11a shows the refined system model and Fig. 11b shows its virtual prototype.

To use transaction level descriptions in SystemC as software code, we re-defined SystemC APIs such as SC_MODULE, SC_THREAD, and sc_event with POSIX API functions of the DEOS, which is an operating system employed for the embedded processors. For example, the macro SC_THREAD is re-defined with pthread_create, shown in Fig. 13a, which receives a pointer to its entry function as an argument, allocates a context memory for the new thread, and sets the program counter field in the context memory with the address of the entry function. However, although a SystemC thread created by SC_THREAD takes a member method as its entry as shown in Fig. 13b, the pointer to the entry method cannot be used as an entry function for pthread_create because we cannot invoke a member method of a class without specifying the class object explicitly in C++. To solve this problem, we re-defined a class sc_module, which is a base class of all SystemC modules, as shown in Fig. 13c. The class sc_module has a member method invoke, which receives the pointer to the entry method and indirectly jumps to the entry method. As shown in Fig. 13d, we defined a data structure SC_CONTEXT, which contains a pointer to the module and a pointer to the entry method. We also defined a common entry function sc_entry_func, which receives the pointer to the SC_CONTEXT as an argument and calls the method invoke of a sc_module object. The SC_THREAD macro is defined to prepare a


[Figure 11 block diagrams. Labeled regions: refined-channel simulation mode, module-only simulation mode. Labeled blocks: MA, MB, MC, CA, CB, mSP, sSG, RegFIFO, bus, ISS, GDB, DEOS, DEOSemu, sp2emu, SW-to-TL adapter, T-CSL, SystemC.]

Fig. 11. The TL–SW co-simulation: (a) an example of the software refinement and (b) the block diagram of the virtual prototype of the example.

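// (a) Adapter for the refined-channel simulation mode
template<class T>
class mSP_SW : public spi<T> {
public:
    mSP_SW(void) : m_avail(0) {
        deos_register_channel( … );
    }

    void put(T data) {
        // Poll the status register of the sSG; sleep until an interrupt
        // arrives while the sSG cannot accept a new message.
        while(m_avail == 0) {
            m_avail = read_hw(STATUS_ADDR);
            if(m_avail == 0) {
                deos_wait_interrupt( … );
            }
        }
        write_hw(DATA_ADDR, data);
        m_avail--;
    }

private:
    int m_avail;   // value last read from the status register (declaration assumed)
};

// (b) Adapter for the module-only simulation mode
template<class T, int ID>
class sp2emu : public spi<T> {
public:
    sp2emu(void) { }

    void put(T data) {
        // Co-processor instruction that passes the data and the port ID to DEOSemu.
        __asm__ ("mcr p0, 1, %0, c0, c1, %1\n" : : "r" (data), "i" (ID));
    }
};

Fig. 12. Software implementation of an mSP adapter: (a) an adapter for the refined-channel simulation mode and (b) an adapter for the module-only simulation mode.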


SC_CONTEXT object and call pthread_create by passing the pointer to sc_entry_func and the pointer to the SC_CONTEXT object as arguments to create a new thread, as shown in Fig. 13e. The new thread jumps to sc_entry_func and calls the invoke method of the module, which jumps to the entry method.

Another software simulation mode in the proposed environment is the module-only simulation mode, in which software modules are simulated on an instruction-set simulator while channels are simulated at the transaction level. For example, the module MB and the channel CB in Fig. 11b are simulated in this mode. This simulation mode is useful for (1) evaluating performance of software modules before channels are refined and (2) creating trace files of software modules. For this simulation mode, we attached DEOSemu, a DEOS emulation model, to the GNU debugger. DEOSemu is a co-processor that emulates part of the DEOS roles such as task scheduling and memory management. It also relays the transactions from software modules in an ISS to channels in SystemC through a T-CSL. In this simulation mode, the software modules can request transactions to the DEOSemu by using interface adapters, which are described with in-line assembly macros to minimize their overheads. For example, a simple adapter of the single-put interface for an ARM processor is shown in Fig. 12b.

3.4. Trace-driven verification

In this sub-section, we present two trace-based virtual prototyping features. One is the trace-driven simulation to accelerate performance evaluation of complex systems and the other is the automatic trace compare (ATC) for efficient functional verification.

3.4.1. Trace-driven simulation

All channels in the CATtree library can record traces of transactions in a file. If a system is simulated at the transaction level, un-timed traces without temporal information are saved. If some components are simulated at either register-transfer level or software level, timed traces with temporal information are saved. A trace model is a virtual component that mimics the input and output behaviors of the original model with their traces.
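The difference between un-timed and timed traces can be sketched as a record with an optional time stamp. The paper does not describe the CATtree trace file format, so the struct TraceRecord, its fields, and the one-line text encoding below are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical trace record; the real CATtree trace format is not given in the paper.
struct TraceRecord {
    bool timed;            // true for timed traces (RTL or software level runs)
    std::uint64_t time;    // time stamp in clock cycles, meaningful only if timed
    std::string method;    // interface method name, e.g. "put" or "get"
    std::uint32_t data;    // transaction payload
};

// Serializes one record as a text line: "@<time> <method> <data>" when timed,
// "<method> <data>" when un-timed.
inline std::string to_line(const TraceRecord& r) {
    assert(!r.method.empty());
    std::ostringstream os;
    if (r.timed) os << '@' << r.time << ' ';
    os << r.method << ' ' << r.data;
    return os.str();
}

// Parses a line produced by to_line(); returns false on malformed input.
inline bool from_line(const std::string& line, TraceRecord& out) {
    std::istringstream is(line);
    out.timed = (!line.empty() && line[0] == '@');
    out.time = 0;
    if (out.timed) {
        char at;
        if (!(is >> at >> out.time)) return false;
    }
    return static_cast<bool>(is >> out.method >> out.data);
}
```

With this encoding, a transaction level run would emit lines such as `put 42`, while a run involving register-transfer level or software components would emit `@120 put 42`.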

Because a hardware module can initiate concurrent transactions through multiple interfaces, a trace model should also be able to handle concurrent traces. A simple method to handle concurrent traces is to attach to each interface a dedicated thread that initiates transactions according to traces from a trace file. However, the simulation speed is slow because it incurs many context switches. Another method is a scheduling-based method, in which a single scheduler reads all trace files, sorts the traces by their time stamps, and distributes them to their associated interfaces. The scheduler has one thread that calls the


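// (a) POSIX thread creation function provided by the DEOS
int pthread_create (
    pthread_t* tid,
    const pthread_attr_t* attr,
    void *(* func)(void *),
    void* arg);

// (b) A SystemC module whose thread entry is a member method
SC_MODULE(example) {
    SC_CTOR(example) {
        SC_THREAD(run);
    }
    void run(void) { … };
};

// (c) Re-defined base class of all SystemC modules
class sc_module;
typedef void (sc_module::*sc_entry_t)(void);

class sc_module {
public:
    // Jumps indirectly to the entry method of this object.
    void invoke(sc_entry_t func) {
        (this->*func)();
    }
};

// (d) Thread context and common entry function
struct SC_CONTEXT {
    sc_module* module;
    sc_entry_t entry;
};

void* sc_entry_func(void* args) {
    SC_CONTEXT* data = (SC_CONTEXT *)args;
    data->module->invoke(data->entry);
    return NULL;
}

// (e) SystemC macros re-defined on top of the POSIX API.
// The context is allocated on the heap so that it is still valid when the
// new thread starts; a local object would go out of scope first.
#define SC_MODULE(NAME) class NAME : public sc_module

#define SC_CTOR(NAME) typedef NAME SC_CURRENT_USER_MODULE; \
    NAME()

#define SC_THREAD(FUNC) { pthread_t tid; \
    SC_CONTEXT* ctx = new SC_CONTEXT; \
    ctx->module = this; \
    ctx->entry = static_cast<sc_entry_t>(&SC_CURRENT_USER_MODULE::FUNC); \
    pthread_create(&tid, NULL, &sc_entry_func, ctx); }

Fig. 13. A SystemC API layer of DEOS.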


non-blocking methods in the interfaces. The scheduler behaves as follows.

Step 1: The scheduler collects transactions of the current time from the sorted traces and marks interfaces that have transactions as busy.

Step 2: The scheduler calls non-blocking methods for each busy interface one by one. If the non-blocking method returns true, which means the transaction is completed, it marks the interface as idle.

Step 3: The scheduler waits for one clock cycle.

Step 4: The scheduler repeats Step 2 and Step 3 until all interfaces are marked as idle.

Step 5: The scheduler collects the next transaction from the sorted traces. Let the current time be TC, and the next time be TN. Then, the scheduler waits for TN−TC and updates the current time to TN.

Step 6: The scheduler goes to Step 2.

In the trace-driven simulation, we employ the thread-based method as well as the scheduling-based method because some sub-components of CATs do not support non-blocking methods. If a module has N interfaces, its trace model contains one scheduler thread and at most N−1 dedicated threads. At the beginning of the simulation, the scheduler thread checks whether a channel supports the non-blocking methods or not. If the channel supports the non-blocking methods, transactions to the channel are handled by the scheduler thread. Otherwise, a dedicated thread handles the traces for the interface.
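The scheduling-based replay can be sketched in plain C++ as a single loop over the sorted traces. This is only an illustration under stated assumptions: the names TimedTrace, NbPut, and replay are hypothetical, simulated time is a plain cycle counter instead of a SystemC wait, and a trace whose time stamp has already passed because of earlier stalls is issued immediately, a case the steps above do not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical timed trace entry.
struct TimedTrace {
    std::uint64_t time;   // time stamp in clock cycles
    std::size_t iface;    // index of the interface that issues the transaction
    std::uint32_t data;   // payload
};

// Non-blocking method of one interface: returns true once the transaction completed.
using NbPut = std::function<bool(std::uint32_t)>;

// Replays the traces with a single scheduler loop and returns the simulated
// time (in cycles) after the last transaction completed. The loop does not
// terminate if an interface never accepts its transaction.
inline std::uint64_t replay(std::vector<TimedTrace> traces,
                            const std::vector<NbPut>& ifaces) {
    std::stable_sort(traces.begin(), traces.end(),
                     [](const TimedTrace& a, const TimedTrace& b) {
                         return a.time < b.time;
                     });
    std::vector<bool> busy(ifaces.size(), false);
    std::vector<std::uint32_t> pending(ifaces.size(), 0);
    std::uint64_t now = 0;
    std::size_t next = 0;
    while (next < traces.size()) {
        // Step 5: wait for TN - TC when the next trace lies in the future.
        if (traces[next].time > now) now = traces[next].time;
        // Step 1: collect the transactions that are due and mark interfaces busy.
        std::size_t active = 0;
        while (next < traces.size() && traces[next].time <= now) {
            assert(traces[next].iface < ifaces.size());
            if (busy[traces[next].iface]) break;  // one transaction per interface at a time
            busy[traces[next].iface] = true;
            pending[traces[next].iface] = traces[next].data;
            ++active;
            ++next;
        }
        // Steps 2 to 4: poll every busy interface once per cycle until all are idle.
        while (active > 0) {
            for (std::size_t i = 0; i < ifaces.size(); ++i) {
                if (busy[i] && ifaces[i](pending[i])) {
                    busy[i] = false;
                    --active;
                }
            }
            ++now;  // Step 3: one clock cycle elapses
        }
    }
    return now;
}
```

Because one loop polls all interfaces, no context switch is needed per transaction, which is the reason given above for preferring this method over one dedicated thread per interface.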

3.4.2. Automatic trace compare (ATC)

When a system is modified, it should be re-verified to ensure its functional correctness. For convenient verification, (1) the transaction level models should be re-used as testbenches without manual modifications, (2) the correctness of the system should be checked automatically, and (3) the location of bugs should be easily identified. In the proposed virtual prototyping environment, the testbench reuse is natural. However, a new feature is needed for the automatic correctness checking and for locating the bugs. The ATC feature serves this purpose.

The basic idea is simple and common [11]. All channels and channel sub-components can be configured to compare incoming transactions with traces, which were obtained from the transaction level system model. If the ATC feature is enabled, the simulation is stopped as soon as a mismatch is found, and the channel names, interface names, and module names are reported together with the mismatching transactions.
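The comparison itself can be sketched as a small checker attached to one interface of one channel. The class name TraceChecker, the message layout, and the use of an exception to stop the run are assumptions for illustration; the paper only states that the simulation stops at the first mismatch and reports the names and the transactions involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ATC checker for a single interface.
class TraceChecker {
public:
    TraceChecker(std::string channel, std::string iface, std::string module,
                 std::vector<std::uint32_t> reference)
        : channel_(std::move(channel)), iface_(std::move(iface)),
          module_(std::move(module)), reference_(std::move(reference)), pos_(0) {
        assert(!channel_.empty());
    }

    // Called for every incoming transaction. Throws at the first mismatch so
    // that the caller can stop the simulation and print the report.
    void check(std::uint32_t actual) {
        const bool exhausted = pos_ >= reference_.size();
        if (exhausted || reference_[pos_] != actual) {
            std::ostringstream msg;
            msg << "ATC mismatch: channel=" << channel_
                << " interface=" << iface_
                << " module=" << module_
                << " index=" << pos_
                << " actual=" << actual;
            if (exhausted) msg << " expected=<end of trace>";
            else msg << " expected=" << reference_[pos_];
            throw std::runtime_error(msg.str());
        }
        ++pos_;
    }

    // True once every reference transaction has been matched.
    bool done() const { return pos_ == reference_.size(); }

private:
    std::string channel_, iface_, module_;
    std::vector<std::uint32_t> reference_;  // traces from the transaction level run
    std::size_t pos_;
};
```

Reporting the index together with the three names points directly at the first transaction that diverges from the transaction level reference, which is the bug-locating property the text asks for.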

To support this feature, all implementations of channels should be able to read traces and compare the traces with incoming transactions. For register-transfer level channels, the ATC feature is implemented using the C-interface of logic simulators. For the module-only software simulation mode, no special care is needed because all channels are simulated at transaction level in SystemC. For the refined-channel software simulation mode, however, this feature should be implemented in the instruction-set simulator because the trace file is stored in the host system while a channel is simulated on an instruction-set simulator. Therefore, we modified the GNU debugger as follows [13]. We defined special software interrupts which are handled by the GNU debugger instead of software interrupt service routines. Then, we modified software templates in the CATtree library to generate the software interrupt if a transaction method is called so that the GNU debugger handles the interrupt, reads a trace from the associated trace file, and compares the trace with the arguments of the method. Note that the arguments of a method are always passed through specific registers. For example, if the processor is ARM7TDMI and the compiler is GNU gcc, the


Table 3
Simulation time for various mixed-level simulations

| Case no. | Purpose | Transaction level models | Register-transfer level models | SW models | Trace models | Simulation time per frame (s) |
|---|---|---|---|---|---|---|
| (a) | Functional verification | All | – | – | – | 3 |
| (b) | Software performance estimations | All but MC | – | MC | – | 50 |
| (c) | | All but VLD | | VLD | | 13 |
| (d) | | | | Parser | | 7 |
| (e) | Register-transfer level verification of hardware modules | All but MC | MC | – | – | 169 |
| (f) | | All but VLD | VLD | – | – | 29 |
| (g) | Channel architecture exploration | All (fully refined) | – | – | – | 5 |
| (h) | | All but MC, VLD | MC, VLD | – | – | 203 |
| (i) | | All channels, parser | All HW modules | – | – | 530 |
| (j) | | All channels | All HW modules | Parser | | 535 |
| (k) | | | All HW modules, all HW channels | Parser, all SW channels | – | 538 |
| (l) | Trace-driven simulation for the channel architecture exploration | All but MC | – | – | MC | 6 |
| (m) | | All but VLD | – | – | VLD | 4 |
| (n) | | All but MC, VLD | – | – | MC, VLD | 8 |
| (o) | | All channels, parser, DF | – | – | All HW modules but DF | 26 |
| (p) | | All channels, DF | – | Parser, all SW channels | All HW modules but DF | 33 |


argument of the put method for the single-put interface is stored in the register R0.

4. Experiments

In this section, we explain how the proposed virtual prototyping environment was used in designing an H.264 decoder. We assumed that an ARM7TDMI processor is available in the system. Table 3 compares the simulation speed of different mixed-level virtual prototypes with their purposes. The simulations were performed on a Linux workstation with a 3.2 GHz dual-core Pentium-IV and 4 GB of memory.

4.1. Transaction level function capture

We manually partitioned the functions of an H.264 decoder into 13 modules including the motion compensation (MC) block, the variable length decoder (VLD) block and the syntax parser (Parser). All transaction level descriptions of modules were written manually. To connect modules, we selected 61 channels from the CATtree library. We simulated and verified the transaction level system with the simulation mode I. Then, we configured the virtual prototype to generate un-timed traces so that they would be used by the ATC feature to check the functional correctness of hardware modules in the next step.

4.2. Computation architecture exploration

To estimate software performance, we refined each module into software and simulated the virtual prototype with the module-only simulation mode (mode V). This simulation mode was 4–20 times slower than the transaction level simulation as summarized in Table 3b–d. According to the reported performance, we decided that only the parser block is software and the others are hardware. We manually wrote HDL descriptions of hardware modules. To verify the HDL descriptions, we configured the virtual prototype applying the simulation modes I and III. We checked the code coverage of hardware modules with Mentor Graphics ModelSim. The achieved code coverage of the MC and the VLD blocks was 60% and 75%, respectively. The reason for the low code coverage was that interface-related control logic was not verified because the testbenches were un-timed models. To enhance the code coverage, we enabled the ATC feature, and re-checked the code coverage. The final code coverage became 85% and 87% for the MC and VLD blocks, respectively. The TL–RTL co-simulation was approximately 100 times slower than the transaction level simulation as shown in Table 3e–f.

4.3. Channel architecture exploration

We progressively refined channels with their CATs to meet the design constraints. Since we assumed that completely verified CATs are provided in the library and modules are fully verified in the previous design steps, functional verification was not an issue in this step. However, performance evaluation was important to validate the architectural decisions. In the proposed environment, a designer can construct various mixed-level virtual prototypes trading off the simulation speed and the accuracy as compared in Table 3g–k.

If the register-transfer level simulation is involved, the architecture exploration takes several hours to decode 30 frames of images. Therefore, we applied the trace-driven simulation to the virtual prototype. As shown in Table 3l–p, the trace-driven simulation is 13–30 times faster than the register-transfer level simulation, while it reports the performance results with reasonable accuracy.

5. Conclusion

Since a system model that contains transaction level, register-transfer level, and software level components should be simulated and evaluated in the architecture exploration steps, we developed a mixed-level virtual prototyping environment that provides the following features. First, it provides a set of design primitives for the function modeling and architecture exploration so that the designer can construct various mixed-level virtual prototypes easily. Second, it provides CSLs and abstraction adapters, which can connect different tools and abstraction level models. Third, it provides the DEOS, which is an operating system that supports SystemC APIs, so that transaction level descriptions can be compiled and executed as software code. Fourth, it provides trace-driven simulation, which can enhance the simulation speed of the mixed-level virtual prototypes for faster performance evaluation.

In the proposed virtual prototyping environment, we can generate heterogeneous virtual prototypes easily by reusing transaction descriptions as software, reusing CATs in the CATtree libraries, and attaching CSLs and abstraction adapters. Currently, however, architectural decisions have to be made manually by designers. We plan to augment the proposed virtual prototyping environment by providing automatic architecture exploration engines for some architectural decisions.

References

[1] Open SystemC Initiative, Functional specification for SystemC 2.0, http://www.systemc.org, April 2002.

[2] IEEE Computer Society, IEEE Std 1666-2005: IEEE Standard SystemC Language Reference Manual, March 2006.

[3] Adam Rose, Stuart Swan, John Pierce, Jean-Michel Fernandez, Transaction level modeling in SystemC, http://www.systemc.org, 2005.

[4] Adam Donlin, Transaction level modeling: flows and use models, in: Proceedings of CODES+ISSS'04.

[5] Sungjoo Yoo, Ahmed A. Jerraya, Hardware/software cosimulation from interface perspective, IEE Proc. Comput. Digit. Tech. 152 (3) (2005).

[6] Sanggyu Park, Sangyong Yoon, Soo-Ik Chae, A mixed-level virtual prototype environment for refinement-based design environment, in: Proceedings of RSP, June 2006.

[7] Sanggyu Park, Sangyong Yoon, Soo-Ik Chae, Reusable component IP design using refinement-based design environment, in: Proceedings of ASPDAC, 2006.

[8] Thorsten Grotker, Stan Liao, Grant Martin, Stuart Swan, System Design with SystemC, Kluwer Academic Publisher, 2002.

[9] Wolfgang Muller, Wolfgang Rosenstiel, Jurgen Ruf, SystemC Methodologies and Applications, Kluwer Academic Publisher, 2003.

[10] Peter van der Wolf, Erwin de Knock, Gerben Essink, et al., Design and programming of embedded multiprocessors: an interface-centric approach, Proc. CODES (2004) 206–217.

[11] Janick Bergeron, Writing Testbenches Using SystemVerilog, Springer, 2006.

[12] Wander O. Cesario, Damien Lyonnard, Gabriela Nicolescu, Yanick Paviot, Sungjoo Yoo, Ahmed A. Jerraya, Lovic Gauthier, Mario Diaz-Nava, Multiprocessor SoC platforms: a component based design approach, IEEE Des. Test (2002).

[13] ARM LTD, ARM Software Development Toolkit Reference Guide, http://www.arm.com, 1997.
