Video on DSP and FPGA. John Johansson. April 12, 2004.

Page 1: Video on DSP and FPGA

Video on DSP and FPGA

John Johansson

April 12, 2004

Page 2: Video on DSP and FPGA

Agenda

► Overview of video processing
► A typical video encoder and the DCT
► Requirements of DCT
► Comparison of DSP and FPGA chips
► Analysis and conclusions
► Questions

Page 3: Video on DSP and FPGA

Overview of Video Processing

Video processing generally involves

► Compression / Decompression
► Special Effects
► TV Broadcasting

► Focus on Compression

Page 4: Video on DSP and FPGA

Video Encoding

Typical Video Encoder

► Focus on DCT algorithm

Page 5: Video on DSP and FPGA

The Discrete Cosine Transformation

► DCT is a transform from the spatial domain to the frequency domain, like the FFT
► Rearranges data into a more compressible format
► Typically done on 64 (8x8) pixels at a time
► Big nasty equation …
► … But no sharp teeth (optimizes extremely well)
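The equation shown on the original slide did not survive extraction. For reference, the 8x8 forward DCT (type II) in the form commonly used by JPEG- and MPEG-style coders is usually written as below. The symbol names are a choice made here, not taken from the slide: f(x, y) is the input pixel block and F(u, v) is the coefficient block.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7}
         f(x,y)\, \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```

The double sum is separable: it can be evaluated as eight 1-D DCTs over the rows followed by eight 1-D DCTs over the columns, which is one reason the transform optimizes well.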

Page 6: Video on DSP and FPGA

Requirements for DCT

Basic Idea

► Read in data (64 values, 8-24 bits signed / unsigned)
► Do transformation
► Write out data
► Profit !!!

► Easy, right ??
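The read / transform / write steps above can be sketched in a few lines of Python. This is a minimal, unoptimized sketch and not the implementation used on either chip. It assumes the orthonormal DCT-II as the transform, and the function names and the flat test block are made up for illustration.

```python
import math

N = 8  # block edge length; the slides use 8x8 = 64 values per block


def _scale(k):
    """Normalization factor C(k) of the orthonormal DCT-II."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def dct_8x8(block):
    """Direct 2-D DCT-II of an N x N block given as a list of N rows."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            # 2 / N equals 1/4 for the 8x8 case
            out[u][v] = (2.0 / N) * _scale(u) * _scale(v) * acc
    return out


# "Read in data": 64 identical 8-bit samples (illustrative input only)
flat_block = [[128] * N for _ in range(N)]
# "Do transformation"
coeffs = dct_8x8(flat_block)
# "Write out data": for a flat block only the DC term coeffs[0][0] is non-zero
dc = coeffs[0][0]
```

The four nested loops cost 64 x 64 = 4096 multiply-accumulates per block, so real implementations use a factored form rather than this one.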

Page 7: Video on DSP and FPGA

Requirements for DCT

Memory Limitations

► Load an entire frame?
► One frame can vary from 50 KB to 50 MB in size when uncompressed
► External memory is much slower, but more plentiful
► Do the DCT in chunks (8x8 block)
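Processing in chunks means only one 8x8 block has to sit in fast internal memory at a time, while the full frame stays in external memory. A rough sketch of that tiling follows. It assumes a single-plane, row-major frame buffer whose dimensions are multiples of 8, and `iter_blocks` is a hypothetical helper name.

```python
def iter_blocks(frame, width, height, n=8):
    """Yield (bx, by, block) for each n x n tile of a row-major frame buffer.

    bx, by are the pixel coordinates of the tile's top-left corner and
    block is a list of n rows, each a list of n samples.
    """
    if width % n or height % n:
        raise ValueError("frame dimensions must be multiples of the block size")
    for by in range(0, height, n):
        for bx in range(0, width, n):
            block = [
                frame[(by + r) * width + bx:(by + r) * width + bx + n]
                for r in range(n)
            ]
            yield bx, by, block


# Tiny illustrative frame: 16 x 8 samples, i.e. two 8x8 blocks side by side
width, height = 16, 8
frame = list(range(width * height))
blocks = list(iter_blocks(frame, width, height))
```

Each yielded block is a small copy, which mirrors moving 64 values from the large, slow memory into a working buffer before transforming them.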

Page 8: Video on DSP and FPGA

Requirements for DCT

Degree of Parallelism

► DCT can be done serially, or broken up and done in parallel
► Parallelism depends largely on available memory
► Price / Performance tradeoffs
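One common way to break the DCT up is the row-column decomposition: eight 1-D transforms over the rows, then eight over the columns. Within each pass the eight transforms are independent, so they can run one after another on a single unit or side by side on several. The sketch below shows only that structure. It is an assumption about how the work could be split, not a description of either chip's implementation, and the function names are illustrative.

```python
import math

N = 8  # block edge length


def dct_1d(vec):
    """Orthonormal 1-D DCT-II of an N-sample vector."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)))
    return out


def dct_2d_row_column(block):
    """2-D DCT of an N x N block built from 2 * N independent 1-D DCTs."""
    # Pass 1: one 1-D transform per row; no row depends on another.
    rows = [dct_1d(row) for row in block]
    # Pass 2: one 1-D transform per column of the pass-1 result.
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][k] is vertical frequency k of column c; transpose to [u][v].
    return [[cols[c][r] for c in range(N)] for r in range(N)]


flat_block = [[128] * N for _ in range(N)]
coeffs = dct_2d_row_column(flat_block)
```

Written this way the block costs 16 x 64 = 1024 multiply-accumulates instead of 4096 for the direct double sum, and the intermediate `rows` buffer is the extra memory that the parallel form needs.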

Page 9: Video on DSP and FPGA

The Challengers

Xilinx Spartan-3 FPGA
► 50K – 5M gates
► 326 MHz
► 100 KB – 2.3 MB internal memory
► 4 - 104 dedicated multipliers
► Oodles of I/O pins (up to 784)

Look at XC3S1000
► 1M gates, 560 KB memory, 24 multipliers, 376 I/O pins

Page 10: Video on DSP and FPGA

The Challengers

ADSP-BF5xx Blackfin Processor
► 200 – 750 MHz
► Single or dual core
► DMA memory controller
► 52 KB – 326 KB internal memory
► Other processor goodies

Look at ADSP-BF533
► 500 MHz, single core, 148 KB memory

Page 11: Video on DSP and FPGA

Performance

How do we correctly benchmark an algorithm between two completely different processors?

► I don’t really know
► Look at some rough performance indicators and try to draw a conclusion

Page 12: Video on DSP and FPGA

Performance

FPGA
► Varies from 1-25 cycle(s) / pixel for DCT
► Reading and writing of data takes additional time
► Clock speed limited by degree of parallelism

DSP
► Roughly 5 cycles / pixel for DCT
► DMA controller allows parallel reading and writing with some setup overhead

Page 13: Video on DSP and FPGA

(Ideal) Performance

Spartan-3
► 64 read + 64 compute + 64 write = 192 cycles / block
► 326 MHz = 1.70 Mblocks / second

Blackfin
► 319 compute + 10 DMA transfer = 329 cycles / block
► 500 MHz = 1.52 Mblocks / second
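The throughput figures follow from dividing the clock rate by the cycles spent per block. The short sketch below redoes that arithmetic from the per-phase cycle counts on the slide. The cycle totals are computed rather than hard-coded, and the three 64-cycle Spartan-3 phases sum to 192. The function name is illustrative.

```python
def blocks_per_second(clock_hz, cycles_per_block):
    """Ideal throughput: every cycle is useful and there are no stalls."""
    return clock_hz / cycles_per_block


# Spartan-3: read + compute + write, each 64 cycles per 8x8 block
spartan3_cycles = 64 + 64 + 64
# Blackfin: compute cycles plus DMA transfer setup overhead
blackfin_cycles = 319 + 10

spartan3_rate = blocks_per_second(326e6, spartan3_cycles)  # 326 MHz clock
blackfin_rate = blocks_per_second(500e6, blackfin_cycles)  # 500 MHz clock

print(f"Spartan-3: {spartan3_cycles} cycles/block, {spartan3_rate / 1e6:.2f} Mblocks/s")
print(f"Blackfin:  {blackfin_cycles} cycles/block, {blackfin_rate / 1e6:.2f} Mblocks/s")
```

These are best-case numbers. They assume the FPGA design actually closes timing at 326 MHz and that the DSP keeps its pipeline full, neither of which is guaranteed in practice.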

Page 14: Video on DSP and FPGA

Advantages

FPGA
► Potential for very high parallelism
► Existing video designs available for purchase
► Good middleman functionality

DSP
► Higher potential clock speed
► Much more flexible design
► DMA memory controller

Page 15: Video on DSP and FPGA

Disadvantages

FPGA
► Low flexibility
► Hard to optimize
► Limited logic blocks

DSP
► Difficult to achieve full utilization
► Higher power consumption

Page 16: Video on DSP and FPGA

Conclusions

FPGA
► Best for well-defined roles, like DCT
► Faster in situations where throughput matters
► Can be very expensive

DSP
► Better suited to more flexible roles, like a full encoder
► Situations where large amounts of (additional) memory are needed

Page 17: Video on DSP and FPGA

Questions?
