Platform-based SW/HW Synthesis

Zhong Chen, Ph.D. (Visiting Prof. from Peking University)
zchen@cs.ucla.edu

SOC Group, UCLA
Led by Jason Cong

ICSOC Workshop Taiwan 03/28/2002

Contents

Overview
HW/SW Co-design Flow
System Data Model
    Capability
    SIR
    MOC
    SDM-API
Jpeg Example
Further Research Topics

Overview

Platform-based Synthesis
    Start from a system-level design description
    Target the FPSoC platform
    Automate the process as much as possible
System Data Model
    MOC – Model of Computation
    System-level synthesis algorithms
    Incorporates models such as the FunState model
Internal Representation
    Covers the whole life cycle of the flow
    SDM-API supports interoperability of CAD tools

Proposed Platform-based HW/SW Synthesis System

[Figure: block diagram of the synthesis system. Box labels: Design Specification; Spec & Implementation; Simulation; Profiling; Hardware Estimation; Software Estimation; System Synthesis (Partitioning, Scheduling, Interface Synthesis); System Data Model; System P.E.; Platform Information; HW synthesis; SW synthesis; HW Code Gen; SW Code Gen; VHDL; C Code; Target PLD; Target SW.]

System Level Description to FPSoC Platform

SLD: support of the concepts needed in system design
    Structural and behavioral hierarchy
    Concurrency
    State transitions
    Communication
    Exception handling
    Timing
SpecC language selected as the input
    Superset of ANSI-C: ANSI-C plus extensions for HW design
    Leverages the large set of existing programs
    Software requirements are fully covered
SpecC model – PSM MOC
    Separation of communication and computation
    Hierarchical network of behaviors and channels
    Plug-and-play

Source: System Design: A Practical Guide with SpecC, Andreas Gerstlauer et al., Kluwer Academic Publishers

Our Sample Target FPSoC Platform – Excalibur™ Platform

[Figure: EP20K200E programmable logic device containing a Nios CPU (high-performance embedded processor), with up to 150K gates available for customization.]

Other optional versions of Excalibur

[Figure: alternative Excalibur configurations. Labels: EP20K 100E; Nios CPU, 75K gates available for customization; multi-processor micro-coded system built from four Nios + ESB pairs, 500K gates available for customization; Embedded System Blocks (ESBs).]

Components in our FPSoC Platform

PLD: APEX™ 20K200E (8320 LEs)
Processor: Nios, configurable as 16-bit or 32-bit
    5-stage pipeline architecture
    One instruction per cycle
    Optimized for APEX PLD efficiency
    20% of the APEX EP20K200E device in 32-bit configuration
    Up to 50 MIPS and 50 MHz
Memory: on-chip 256K
    Supports on-chip and off-chip memories
I/O: customizable on-chip peripherals; JTAG, PCI, user-definable

JPEG Encoder – An example

[Figure: encoder pipeline. BMP Image File → Image Fragmentation → DCT → Quantization → Entropy Coding → JPG Image File.]

JPEG: a standard for image compression
DCT: Discrete Cosine Transform (ChenDCT)
Four modes of operation in the JPEG standard
    Sequential DCT-based mode
    Progressive DCT-based mode
    Lossless mode
    Hierarchical mode
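As a rough illustration of the DCT and Quantization stages of the pipeline, the sketch below transforms one 8x8 block. It uses the direct textbook 2-D DCT-II formula, not the Chen fast DCT named on the slide, and a single uniform quantization step instead of a JPEG quantization table; the function names and the step value of 16 are assumptions made for this example only.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def dct_2d(block):
    """Direct (unoptimized) 2-D DCT-II of an 8x8 block of level-shifted samples."""
    def c(k):
        # Normalization factor for the zero-frequency term
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantization (assumption: real JPEG uses a per-coefficient table)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def encode_block(pixels, step=16):
    """Level-shift 8-bit samples by 128, then apply DCT and quantization."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return quantize(dct_2d(shifted), step)
```

For a flat block, all energy lands in the DC coefficient and every AC coefficient quantizes to zero, which is the property the entropy coder exploits.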

Jpeg in SpecC – Source Code Files

[Figure: SpecC source files of the Jpeg example, with the markers shown on the slide: Global; Chann; Dct ++#; Default +; Design +-; Encode +-; Huff +; Header; Handle +; Io; Jpeg +-; Quant +; Adapter +; Tb.]

Source: SpecC: Specification Language and Methodology, Daniel D. Gajski et al., CECS, UC Irvine

Jpeg in SpecC – Program Structure

SDM MoC: FunState-based MoC

[Figure: FunState example. Labels: F_SW, F_HW, F_S, F_R, c1, b; components M1 and M2; transitions CE(M1)/F_S, CE(M2)/F_HW, /F_R.]

Source: FunState – An Internal Design Representation for Codesign, Karsten Strehl et al., IEEE Transactions on VLSI Systems, Vol. 9, No. 4, August 2001

Jpeg: From SpecC to SDM representation

[Figure: Tb.sc with fixed control flow. Components: Input, Jpeg, Output. Data: Header, Pixel, Data.]

Jpeg: Its MoC Representation in SDM

[Figure: SDM representation of Jpeg.sc. Components: JpegInit, JpegEncode, JpegEnd. Data: ImageWidth, ImageHeight, DCEhuff, ACEhuff, Data.]

JpegInit and JpegEnd are functions; JpegEncode is an InnerComponent.

Jpeg: Its MoC Representation in SDM

[Figure: SDM representation of JpegEncode.sc. Components: ReceiveData, JpegEncodeStripe. Data: stripe, MDUWide, mduHigh, MDUHigh, ImageWidth, ImageHeight, DCEhuff, ACEhuff. Transitions: Cond/ReceiveData, JpegEncodeStripe; ~Cond.]

Initialization: mduHigh = 0, MDUHigh = (ImageHeight+7)>>3
Update: mduHigh++
Cond is (mduHigh < MDUHigh)

Jpeg: Its MoC Representation in SDM

[Figure: SDM representation of JpegEncodeStripe.sc. Components: HandleData, dct, Quantization, HuffmanEncode, connected through A, B, C. Data: stripe, mduHigh, mduWide, MDUWide, DCEhuff, ACEhuff. Transitions: Cond/HandleData, dct, quantization, huffmancode; ~Cond.]

Initialization: mduWide = 0
Update: mduWide++
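The loop structure carried by the JpegEncode.sc and JpegEncodeStripe.sc state machines can be sketched as two nested loops over 8x8 blocks. The outer bound MDUHigh = (ImageHeight+7)>>3 and its condition come from the slides; the inner bound MDUWide = (ImageWidth+7)>>3 and the inner condition mduWide < MDUWide are assumptions made by analogy, and the function names are hypothetical.

```python
def mdu_count(pixels):
    """Number of 8-pixel units needed to cover `pixels`, i.e. ceil(pixels / 8)."""
    return (pixels + 7) >> 3


def encode_image(image_width, image_height, handle_block):
    """Visit every 8x8 block in stripe order and return the number of blocks."""
    mdu_high_total = mdu_count(image_height)  # MDUHigh, as given on the slide
    mdu_wide_total = mdu_count(image_width)   # MDUWide: formula assumed by analogy
    blocks = 0
    mdu_high = 0
    while mdu_high < mdu_high_total:          # Cond in JpegEncode.sc
        # ReceiveData would fetch one stripe of the image here.
        mdu_wide = 0
        while mdu_wide < mdu_wide_total:      # assumed condition in JpegEncodeStripe.sc
            # HandleData, dct, quantization, huffman encode for one block
            handle_block(mdu_high, mdu_wide)
            blocks += 1
            mdu_wide += 1
        mdu_high += 1
    return blocks
```

Under these assumptions, a 116x96 image yields 12 stripes of 15 blocks, 180 block executions in total.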

Partitioning and Scheduling (now manually)

[Figure: manual HW/SW partition of the Jpeg example. Top level: Input, JPEG, Output, Data. SW side: ReceiveData, JpegEncodeStripe. HW side: DCT. The two sides communicate through Send/Recv pairs.]

Current flow – where are we today

1) A system designer writes a system-level design application in SpecC.
2) Rewrite it so that it can go through our flow, using a subset of SpecC with modified semantics.
3) Use scc to create the .sir file.
4) Use psm2fs to convert .sir to .sdm.
5) Use simgen to generate the .cc file for the simulator.
6) Compile the simulator using the CC compiler.
7) Execute the simulator.
8) Compile to Profiling.exe using CC with profile options.
9) Execute it to generate the profile report.
10) Use hwcgen to generate .vhdl.
11) Use Altera's tools to generate the circuit .srec.
12) Use sccgen to generate .c.
13) Use the target C compiler to generate executable code.

[Figure: flow diagram numbered with the steps above. Designer (1) MyDesign.sc (2) Design.sc (3) Design.sir (4) Design.sdm (5) Design.cc; (6) Simulator.exe, (7) simulation; (8) Profiling.exe, (9) profile report; (10) Design.vhdl, (11) HW.srec; (12) Design.c, (13) Design.exe.]
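The thirteen steps form a small dependency graph of tools and artifacts, which can be written down as data. The sketch below only records that graph and traces which steps lead to a given artifact; it does not invoke any tool, because the slides give no command-line options. The input artifacts of steps 8, 10 and 12 are not stated on the slide and are assumptions here, as is the helper name `steps_for`.

```python
# step number -> (tool, input artifact, output artifact)
FLOW = {
    1: ("designer", None, "MyDesign.sc"),
    2: ("designer (rewrite)", "MyDesign.sc", "Design.sc"),
    3: ("scc", "Design.sc", "Design.sir"),
    4: ("psm2fs", "Design.sir", "Design.sdm"),
    5: ("simgen", "Design.sdm", "Design.cc"),
    6: ("CC compiler", "Design.cc", "Simulator.exe"),
    7: ("execute", "Simulator.exe", "simulation run"),
    # Assumption: the profiling build compiles the same generated Design.cc.
    8: ("CC compiler with profile options", "Design.cc", "Profiling.exe"),
    9: ("execute", "Profiling.exe", "profile report"),
    # Assumption: both code generators read the SDM file.
    10: ("hwcgen", "Design.sdm", "Design.vhdl"),
    11: ("Altera tools", "Design.vhdl", "HW.srec"),
    12: ("sccgen", "Design.sdm", "Design.c"),
    13: ("target C compiler", "Design.c", "Design.exe"),
}


def steps_for(target):
    """Return the ordered list of step numbers needed to produce `target`."""
    producer = {out: (num, src) for num, (_tool, src, out) in FLOW.items()}
    chain = []
    while target is not None:
        num, src = producer[target]
        chain.append(num)
        target = src
    return chain[::-1]
```

For example, `steps_for("HW.srec")` walks back through the SDM file to the designer's original specification, showing that the hardware and software branches share steps 1 to 4.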

Intermediate Research Achievements

SDM Converter
SDM Simulator
SDM C Code Generation tool
SDM SW Profiling tool
SDM HW Code Generation tool (partial)

Jpeg Implementation on Excalibur™ Platform

[Figure: EP20K200E programmable logic device. The Nios CPU runs the Jpeg software; the DCT circuit is implemented in the programmable logic.]

Jpeg Compression Results

116x96x8 image in bmp format (12214 Bytes)
116x96x8 image in jpeg format (1704 Bytes)
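The two file sizes can be checked with a little arithmetic. The sketch below assumes the standard uncompressed 8-bit BMP layout (14-byte file header, 40-byte info header, 256-entry palette of 4 bytes each, rows padded to a multiple of 4 bytes) and that 116 is the width and 96 the height; under those assumptions the computed BMP size matches the 12214 bytes reported. The function name is hypothetical.

```python
def bmp_size_8bit(width, height):
    """Size in bytes of an uncompressed 8-bit BMP with a full 256-color palette."""
    row_bytes = (width + 3) // 4 * 4  # each row is padded to a multiple of 4 bytes
    headers = 14 + 40                 # file header + info header (assumed layout)
    palette = 256 * 4                 # 256 palette entries, 4 bytes each
    return headers + palette + row_bytes * height


bmp_bytes = 12214    # reported size of the bmp image
jpeg_bytes = 1704    # reported size of the jpeg image

compression_ratio = bmp_bytes / jpeg_bytes        # about 7.17 : 1
bits_per_pixel = jpeg_bytes * 8 / (116 * 96)      # about 1.22 bits per pixel
```

The compressed file is therefore roughly one seventh of the original, or a little over one bit per pixel for an 8-bit source image.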

JPEG Encoder – Profiling

[Figure: encoder pipeline annotated with the share of run time per stage. BMP Image File → Image Fragmentation (1.72%) → DCT (77.47%) → Quantization (4.84%) → Entropy Coding (15.97%) → JPG Image File.]

Run-time Profiling of Jpeg Program

Module Name    PC (PIII 650MHz)                   NIOS (SW)                          NIOS (SW+HW)
               time (µs)   rate     exec/s        time (µs)   rate     exec/s        time (µs)   rate     exec/s
HandleData     2.56        1.72%    391259.70*    50.31       1.22%    19878.67      50.31       4.45%    19878.67
DCT            115.48      77.47%   8659.61       3160.56     76.46%   316.4         158.03      13.97%   6328
Quantization   7.22        4.84%    138533.91     176.42      4.27%    5668.41       176.42      15.60%   5668.41
HuffmanEncode  23.8        15.97%   42010.25      746.29      18.05%   1339.96       746.29      65.98%   1339.96

*Units: exec/s is the number of executions per second; time is in microseconds (µs) for one execution; rate is each module's share of the time for one execution, which processes one 8x8 image block of 256 colors.

Run-time Results of Jpeg Example

Module Name    PC (PIII 650MHz)      NIOS (SW)             NIOS (SW+HW) 1        NIOS (SW+HW) 2
               time (µs)   rate      time (µs)   rate      time (µs)   rate      time (µs)   rate
HandleData     2.56        1.72%     50.31       1.22%     50.31       1.92%     50.31       4.45%
               (391259.7)            (19878.67)            (19878.67)            (19878.67)
DCT            115.48      77.47%    3160.56     76.46%    1641.04     62.78%    158.03      13.97%
               (8659.61)             (316.4)               (609.37)              (6328.00)
Quantization   7.22        4.84%     176.42      4.27%     176.42      6.75%     176.42      15.60%
               (138533.91)           (5668.41)             (5668.41)             (5668.41)
HuffmanEncode  23.80       15.97%    746.29      18.05%    746.29      28.55%    746.29      65.98%
               (42010.25)            (1339.96)             (1339.96)             (1339.96)
Total          149.06      100.00%   4133.57     100.00%   2614.05     100.00%   1131.04     100.00%

Notes: one execution processes one 8x8 image block of 256 colors. Values in parentheses are executions per second.
1: with half DCT implementation in order to fit in the area; Nios 1.1 works at 33 MHz
2: optimized full DCT implementation, simulation only (by moduleSim)
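The overall speedups implied by the run-time results can be derived directly from the per-module times. The sketch below uses only the Nios columns of the table and also checks them against Amdahl's law, treating the DCT as the only accelerated module; the dictionary layout and function names are choices made for this example.

```python
# Per-module time in microseconds for one 8x8 block, taken from the results table.
TIMES_US = {
    "NIOS (SW)": {
        "HandleData": 50.31, "DCT": 3160.56,
        "Quantization": 176.42, "HuffmanEncode": 746.29,
    },
    "NIOS (SW+HW) 1": {
        "HandleData": 50.31, "DCT": 1641.04,
        "Quantization": 176.42, "HuffmanEncode": 746.29,
    },
    "NIOS (SW+HW) 2": {
        "HandleData": 50.31, "DCT": 158.03,
        "Quantization": 176.42, "HuffmanEncode": 746.29,
    },
}


def total(config):
    """Total time per block for one configuration."""
    return sum(TIMES_US[config].values())


def speedup(base, new):
    """Overall speedup of configuration `new` relative to `base`."""
    return total(base) / total(new)


def amdahl(fraction, local_speedup):
    """Overall speedup when `fraction` of the time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


dct_fraction = TIMES_US["NIOS (SW)"]["DCT"] / total("NIOS (SW)")
dct_speedup = TIMES_US["NIOS (SW)"]["DCT"] / TIMES_US["NIOS (SW+HW) 2"]["DCT"]
```

With these numbers the half-DCT hardware gives an overall speedup of about 1.58, and the simulated full DCT about 3.65. The DCT itself runs about 20 times faster in the second case, but the unaccelerated 23.5% of the software time caps the overall gain, and HuffmanEncode becomes the dominant module.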

Research Topics

System-Level Synthesis Algorithm
    Partitioning and Scheduling
    Performance Estimation
    Architecture Exploration
Hardware Interface Synthesis
    Architecture Exploration
    Platform Resource Information
Software Synthesis
    Code Optimization with Resource Constraints
    Support Polymorphism Description of Channel and Interface(?)

HW implementations of DCT

Configuration               Performance  LEs   Rate    PINs  Rate  ESBs (bits)  Rate  Clock Frequency (MHz)
EP20K200EFC484-2X                        8320  100%    376   100%  106496       100%  Max. 33
(1) Nios+H_dct+recv+send    609.37/s     6797  81%     111   29%   26496        24%   33.33
Half_dct only (recv+send)                4004  48%     30    7%    0            0     34.37
(Dct only)                               1762  21.18%  30    7%    0            0     33.68
(2) NIOS+dct+Memory         569.26/s     4555  54%     111   29%   27840        26%   33.68

(1) Half-Dct implementation + interface with PIO of Nios (through send + recv)
(2) Full-Dct implementation + memory as interface
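The Rate columns of the resource table follow from dividing each usage figure by the device total. The sketch below recomputes them for the two Nios configurations; reading the ESB figures as bits is an assumption, and the names are hypothetical.

```python
# Device totals for the EP20K200EFC484-2X row of the table.
DEVICE = {"LEs": 8320, "PINs": 376, "ESB bits": 106496}

# Resource usage of the two complete configurations.
USAGE = {
    "(1) Nios+H_dct+recv+send": {"LEs": 6797, "PINs": 111, "ESB bits": 26496},
    "(2) NIOS+dct+Memory": {"LEs": 4555, "PINs": 111, "ESB bits": 27840},
}


def utilization(config):
    """Percentage of each device resource used by `config`."""
    return {name: 100.0 * used / DEVICE[name]
            for name, used in USAGE[config].items()}


for config in USAGE:
    rates = utilization(config)
    print(config, {name: f"{value:.2f}%" for name, value in rates.items()})
```

The recomputed values (for example 81.69% of the LEs for configuration (1)) agree with the table once truncated to whole percent, which suggests the table truncates rather than rounds.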

Open Discussion