Platform-based SW/HW Synthesis
Zhong Chen, Ph.D. (Visiting Prof. from Peking University) [email protected]
SOC Group, UCLA, led by Jason Cong
ICSOC Workshop, Taiwan, 03/28/2002
Contents
Overview
HW/SW Co-design Flow
System Data Model
  Capability
  SIR
  MOC
  SDM-API
Jpeg Example
Further Research Topics
Overview
Platform-based Synthesis
  Start from a system-level design description
  Target an FPSoC platform
  Automate the process as much as possible
System Data Model
  MOC – Model of Computation
System-Level Synthesis Algorithms
  Incorporate models such as the FunState model
Internal Representation
  Covers the whole life-cycle of the flow
  SDM-API supports interoperability of CAD tools
Proposed Platform-based HW/SW Synthesis System
[Figure: flow diagram with the blocks Design Specification, Spec & Implementation, Simulation, Profiling, Hardware Estimation, Software Estimation, Platform Information, System Data Model, System Synthesis (Partitioning, Scheduling, Interface Synthesis, System P.E.), SW Code Gen (C code, SW synthesis, Target SW) and HW Code Gen (VHDL, HW synthesis, Target PLD)]
System Level Description to FPSoC Platform
SLD: Support of concepts needed in system design
  Structural and behavioral hierarchy
  Concurrency
  State transitions
  Communication
  Exception handling
  Timing
Select SpecC Language as an Input
  Superset of ANSI-C
    ANSI-C plus extensions for HW design
  Leverage of a large set of existing programs
  Software requirements are fully covered
SpecC model – PSM MOC
  Separation of communication and computation
  Hierarchical network of behaviors and channels
  Plug-and-play
Source: System Design: A Practical Guide with SpecC, Andreas Gerstlauer et al., Kluwer Academic Publishers
Our Sample Target FPSoC Platform – Excalibur(TM) Platform
[Figure: EP20K200E Programmable Logic Device with a Nios CPU (High-Performance Embedded Processor); up to 150K gates available for customization]
Other optional versions of Excalibur
[Figure: optional configurations: a Nios CPU with 75K gates available for customization; a Multi-Processor Micro-Coded System of four Nios CPUs, each with an ESB, with 500K gates available for customization; labels: Embedded System Blocks (ESBs), EP20K100E]
Components in our FPSoC Platform
PLD: APEX(TM) 20K200E (8320 LEs)
Processor: Nios, 16-bit or 32-bit configurable
  5-stage pipeline architecture
  One instruction per cycle
  Optimized for APEX PLD efficiency
  20% of APEX EP20K200E device in 32-bit configuration
  Up to 50 MIPS and 50 MHz
Memory: on-chip 256K
  Supports on-chip and off-chip memories
I/O: customizable, on-chip peripherals
  JTAG, PCI, user-definable
JPEG Encoder – An example
[Figure: encoder pipeline: BMP Image File -> Image Fragmentation -> DCT -> Quantization -> Entropy Coding -> JPG Image File]
JPEG: a standard for image compression
DCT: Discrete Cosine Transform (ChenDCT)
Four modes of operation in the JPEG standard
  Sequential DCT-based mode
  Progressive DCT-based mode
  Lossless mode
  Hierarchical mode
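To make the DCT and quantization stages of the pipeline concrete, here is a minimal sketch in Python. It uses the direct textbook definition of the 8x8 DCT-II rather than the fast ChenDCT named on the slide, and a single uniform quantization step instead of a JPEG quantization table; both simplifications, and all function names, are assumptions made for illustration.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def dct_2d(block):
    """Direct (unoptimized) 2-D DCT-II of an 8x8 block with orthonormal scaling."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Divide every coefficient by one step size and round (assumed uniform step)."""
    return [[round(value / step) for value in row] for row in coeffs]


flat = [[100] * N for _ in range(N)]   # a constant block has only a DC component
coeffs = dct_2d(flat)
quantized = quantize(coeffs, 16)
```

For a constant block all the energy ends up in the DC coefficient, which is why entropy coding after quantization compresses smooth image regions so well.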
Jpeg in SpecC – Source Code Files
[Figure: source file diagram with the files Global, Chann, Dct++#, Default+, Design+-, Encode+-, Huff+, Header, Handle+, Io, Jpeg+-, Quant+, Adapter+, Tb (the +, - and # marks are as shown on the slide)]
Source: SpecC: Specification Language and Methodology, Daniel D. Gajski et al., CECS, UC Irvine
Jpeg in SpecC – Program Structure
SDM MoC: FunState-based MoC
[Figure: FunState example with functions F_SW, F_HW, FS and FR, channel c1, and states M1 and M2 with transitions labelled CE(M1)/FS, CE(M2)/F_HW and /FR]
Source: FunState – An Internal Design Representation for Codesign, Karsten Strehl et al., IEEE Transactions on VLSI Systems, Vol. 9, No. 4, August 2001
Jpeg: From SpecC to SDM representation
[Figure: Tb.sc with fixed control flow: Input, Jpeg and Output components, exchanging Header, Pixel and Data]
Jpeg: Its MoC Representation in SDM (Jpeg.sc)
[Figure: JpegInit, JpegEncode and JpegEnd with the data items ImageWidth, ImageHeight, DCEhuff, ACEhuff and Data]
JpegInit and JpegEnd are functions; JpegEncode is an InnerComponent.
Jpeg: Its MoC Representation in SDM (JpegEncode.sc)
[Figure: components ReceiveData and JpegEncodeStripe with the data items stripe, MDUWide, MDUHigh, mduHigh, DCEhuff, ACEhuff, ImageWidth and ImageHeight; transitions labelled Cond/ReceiveData, JpegEncodeStripe and ~Cond]
Initialization: mduHigh = 0, MDUHigh = (ImageHeight+7)>>3
Update: mduHigh++
Cond is (mduHigh < MDUHigh)
Jpeg: Its MoC Representation in SDM (JpegEncodeStripe.sc)
[Figure: components HandleData, dct, Quantization and HuffmanEncode (points A, B, C) with the data items mduHigh, stripe, MDUWide, mduWide, DCEhuff and ACEhuff; transitions labelled Cond/HandleData, dct, quantization, huffmancode and ~Cond]
Initialization: mduWide = 0
Update: mduWide++
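The two loop-shaped state machines in JpegEncode.sc and JpegEncodeStripe.sc can be read as a nested iteration over 8x8 blocks. The sketch below is a plain Python rendering of that reading, not the SDM representation itself. The slides define only Cond for the outer loop; the inner condition (mduWide < MDUWide) and the formula for MDUWide are assumed here by analogy, and the stage functions are placeholders.

```python
def mdu_count(pixels):
    """Number of 8-pixel units needed to cover `pixels`, computed as (n + 7) >> 3."""
    return (pixels + 7) >> 3


def encode(image_width, image_height, stages):
    """Call every stage once per 8x8 block, stripe by stripe; return the block count."""
    mdu_high_total = mdu_count(image_height)   # MDUHigh on the slide
    mdu_wide_total = mdu_count(image_width)    # MDUWide; formula assumed by analogy
    blocks = 0
    mdu_high = 0
    while mdu_high < mdu_high_total:           # Cond in JpegEncode.sc
        # ReceiveData would fetch one stripe here, then JpegEncodeStripe runs.
        mdu_wide = 0
        while mdu_wide < mdu_wide_total:       # assumed Cond in JpegEncodeStripe.sc
            for stage in stages:               # HandleData, dct, quantization, huffman
                stage(mdu_high, mdu_wide)
            blocks += 1
            mdu_wide += 1
        mdu_high += 1
    return blocks


# Placeholder stages that only record the order in which they are called.
calls = []
stage_names = ["HandleData", "dct", "quantization", "huffmanencode"]
stages = [lambda r, c, name=name: calls.append((name, r, c)) for name in stage_names]
total_blocks = encode(116, 96, stages)
```

Under these assumptions the 116x96 test image used later in the talk is covered by 15 x 12 blocks.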
Partitioning and Scheduling (now manually)
[Figure: Input, JPEG and Output components with Data; the design is split into a SW part (ReceiveData, JpegEncodeStripe) and a HW part (DCT), connected through Send/Recv pairs]
Current flow – where are we today
[Figure: flow chart linking Designer, MyDesign.sc, Design.sc, Design.sir, Design.sdm, Design.cc, Design.c, Design.vhdl, Simulator.exe, Profiling.exe, Design.exe, HW.srec, Profile rpt and Simulate act, with edges numbered (1) to (13) as in the steps below]
1) A system designer writes a system-level design application in SpecC;
2) Rewrite it in order to go through our flow, using a subset format of SpecC and modified semantics;
3) Use scc to create .sir;
4) Use psm2fs to convert .sir to .sdm;
5) Use simgen to generate .cc for the simulator;
6) Compile the simulator using the CC compiler;
7) Execute the simulator;
8) Compile to Profiling.exe using CC with profile options;
9) Execute it to generate the profile report;
10) Use hwcgen to generate .vhdl;
11) Use Altera's tools to generate the circuit .srec;
12) Use sccgen to generate .c;
13) Use the target C compiler to generate executable code.
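The tool chain above can be viewed as a small dependency graph from artifacts to the steps that produce them. The sketch below encodes that view in Python. The tool and file names come from the slide, but the input of each tool is inferred from the step descriptions (in particular, that hwcgen and sccgen read the .sdm file), and the table and helper function are hypothetical.

```python
# (step number, tool, input artifact, output artifact); inputs are inferred.
FLOW = [
    (3, "scc", "Design.sc", "Design.sir"),
    (4, "psm2fs", "Design.sir", "Design.sdm"),
    (5, "simgen", "Design.sdm", "Design.cc"),
    (6, "CC", "Design.cc", "Simulator.exe"),
    (8, "CC with profile options", "Design.cc", "Profiling.exe"),
    (10, "hwcgen", "Design.sdm", "Design.vhdl"),
    (11, "Altera tools", "Design.vhdl", "HW.srec"),
    (12, "sccgen", "Design.sdm", "Design.c"),
    (13, "target C compiler", "Design.c", "Design.exe"),
]


def steps_to_build(target):
    """Return the step numbers needed to produce `target`, in execution order."""
    producer = {out: (step, src) for step, _tool, src, out in FLOW}
    steps = []
    artifact = target
    while artifact in producer:        # walk back until a hand-written source file
        step, src = producer[artifact]
        steps.append(step)
        artifact = src
    return list(reversed(steps))


hw_steps = steps_to_build("HW.srec")
sw_steps = steps_to_build("Design.exe")
```

Both the hardware and the software targets share steps 3 and 4, which reflects the role of the .sdm file as the common internal representation.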
Intermediate Research Achievements
SDM - Converter
SDM - Simulator
SDM - C Code Generation tool
SDM - SW Profiling tool
SDM - HW Code Generation tool (partial)
Jpeg Implementation on Excalibur(TM) Platform
[Figure: EP20K200E Programmable Logic Device containing the Nios CPU, which runs the Jpeg software, and the DCT circuit]
Jpeg Compression Results
116x96x8 image in bmp format (12214 bytes)
116x96x8 image in jpeg format (1704 bytes)
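The two file sizes can be checked with a little arithmetic. The sketch below assumes the input is an uncompressed 8-bit BMP with a 14-byte file header, a 40-byte info header and a full 256-entry palette, which reproduces the 12214 bytes quoted on the slide; the function name is hypothetical.

```python
def bmp_size_8bit(width, height):
    """Size in bytes of an uncompressed 8-bit BMP (assumed header layout)."""
    file_header = 14                     # BITMAPFILEHEADER
    info_header = 40                     # BITMAPINFOHEADER
    palette = 256 * 4                    # 256 colours, 4 bytes each
    row_bytes = (width + 3) // 4 * 4     # rows are padded to a multiple of 4 bytes
    return file_header + info_header + palette + row_bytes * height


bmp_bytes = bmp_size_8bit(116, 96)
jpeg_bytes = 1704                        # from the slide
ratio = bmp_bytes / jpeg_bytes           # compression ratio
```

With these numbers the compression ratio is a little over 7 to 1.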
JPEG Encoder – Profiling
[Figure: encoder pipeline (BMP Image File, Image Fragmentation, DCT, Quantization, Entropy Coding, JPG Image File) annotated with run-time shares: 1.72%, 77.47% (DCT), 4.84% and 15.97%; these match the PC column of the run-time profiling table for HandleData, DCT, Quantization and HuffmanEncode]
Run-time Profiling of Jpeg Program
Module Name     PC (PIII 650MHz)                  NIOS (SW)                         NIOS (SW+HW)
HandleData      391259.70/s*  2.56 µs    1.72%    19878.67/s  50.31 µs    1.22%     19878.67/s  50.31 µs    4.45%
DCT             8659.61/s     115.48 µs  77.47%   316.4/s     3160.56 µs  76.46%    6328/s      158.03 µs   13.97%
Quantization    138533.91/s   7.22 µs    4.84%    5668.41/s   176.42 µs   4.27%     5668.41/s   176.42 µs   15.60%
HuffmanEncode   42010.25/s    23.8 µs    15.97%   1339.96/s   746.29 µs   18.05%    1339.96/s   746.29 µs   65.98%
*Unit: executions per second; time in microseconds (µs) of one execution; share of the total time of one execution. One execution processes one 8x8 image block of 256 colors.
Run-time Results of Jpeg Example
Module Name     PC (PIII 650MHz)       NIOS (SW)              NIOS (SW+HW) 1         NIOS (SW+HW) 2
                time (µs)   rate(%)    time (µs)   rate(%)    time (µs)   rate(%)    time (µs)   rate(%)
HandleData      2.56        1.72%      50.31       1.22%      50.31       1.92%      50.31       4.45%
                (391259.7)             (19878.67)             (19878.67)             (19878.67)
DCT             115.48      77.47%     3160.56     76.46%     1641.04     62.78%     158.03      13.97%
                (8659.61)              (316.4)                (609.37)               (6328.00)
Quantization    7.22        4.84%      176.42      4.27%      176.42      6.75%      176.42      15.60%
                (138533.91)            (5668.41)              (5668.41)              (5668.41)
HuffmanEncode   23.80       15.97%     746.29      18.05%     746.29      28.55%     746.29      65.98%
                (42010.25)             (1339.96)              (1339.96)              (1339.96)
Total           149.06      100.00%    4133.57     100.00%    2614.05     100.00%    1131.04     100.00%
Values in parentheses are executions per second.
*Notes: one execution processes one 8x8 image block of 256 colors.
1: with half DCT implementation in order to fit in the area; Nios 1.1 works at 33 MHz
2: optimized full DCT implementation, with simulation only (by moduleSim)
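The effect of moving the DCT into hardware can be quantified from the per-block times in the table. The sketch below computes the overall speedup of each SW+HW configuration over the pure software Nios run and compares it with Amdahl's law; the dictionary layout and function names are assumptions, while the numbers are copied from the table.

```python
# Per-block times in microseconds, copied from the run-time results table.
TIMES_US = {
    "NIOS(SW)": {"HandleData": 50.31, "DCT": 3160.56,
                 "Quantization": 176.42, "HuffmanEncode": 746.29},
    "NIOS(SW+HW)1": {"HandleData": 50.31, "DCT": 1641.04,
                     "Quantization": 176.42, "HuffmanEncode": 746.29},
    "NIOS(SW+HW)2": {"HandleData": 50.31, "DCT": 158.03,
                     "Quantization": 176.42, "HuffmanEncode": 746.29},
}


def total(config):
    """Total time per 8x8 block for one configuration."""
    return sum(TIMES_US[config].values())


def speedup(baseline, improved):
    """Overall speedup of `improved` relative to `baseline`."""
    return total(baseline) / total(improved)


def amdahl(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of the time is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


dct_fraction = TIMES_US["NIOS(SW)"]["DCT"] / total("NIOS(SW)")
dct_speedup = TIMES_US["NIOS(SW)"]["DCT"] / TIMES_US["NIOS(SW+HW)2"]["DCT"]
overall = speedup("NIOS(SW)", "NIOS(SW+HW)2")
predicted = amdahl(dct_fraction, dct_speedup)
```

With the table's numbers, the roughly 20x faster DCT of configuration 2 yields an overall speedup of about 3.65x, because HuffmanEncode then dominates the run time.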
Research Topics
System-Level Synthesis Algorithm
  Partitioning and Scheduling
  Performance Estimation
  Architecture Exploration
Hardware Interface Synthesis
  Architecture Exploration
  Platform Resource Information
Software Synthesis
  Code Optimization with Resource Constraints
  Support Polymorphism Description of Channel and Interface (?)
HW implementations of DCT
Design                       Performance   LEs    Rate     PINs   Rate   ESBs     Rate   Clock Frequency
EP20K200EFC484-2X                          8320   100%     376    100%   106496   100%   Max. 33M
(1) Nios+H_dct+recv+send                   6797   81%      111    29%    26496    24%    33.33
Half_dct only (recv+send)                  4004   48%      30     7%     0        0      34.37
(Dct only)                                 1762   21.18%   30     7%     0        0      33.68
(2) NIOS+dct+Memory          569.26/s      4555   54%      111    29%    27840    26%    33.68
A second Performance value, 609.37/s, is listed between the (Dct only) and Half_dct only rows.
(1) Half-Dct implementation + interface with PIO of Nios (through send + recv)
(2) Full-Dct implementation + memory as interface
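The Rate columns are resource usage relative to the EP20K200E totals. The sketch below recomputes them for the two complete designs; truncating to whole percent (an assumption about how the slide rounded) reproduces the listed values. The dictionary layout and function name are hypothetical.

```python
# Device capacity and per-design usage, copied from the table.
DEVICE = {"LEs": 8320, "PINs": 376, "ESBs": 106496}
DESIGNS = {
    "(1) Nios+H_dct+recv+send": {"LEs": 6797, "PINs": 111, "ESBs": 26496},
    "(2) NIOS+dct+Memory": {"LEs": 4555, "PINs": 111, "ESBs": 27840},
}


def utilization(design):
    """Percentage of each device resource used, truncated to whole percent."""
    usage = DESIGNS[design]
    return {name: 100 * usage[name] // DEVICE[name] for name in DEVICE}


rates_1 = utilization("(1) Nios+H_dct+recv+send")
rates_2 = utilization("(2) NIOS+dct+Memory")
```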
Open Discussion