Optimizing FPGA Accelerator Design for Deep Convolutional Neural Networks
By: Mohamad Kanafanai


Outline
Introduction
Background
Methodology
Results
Evaluation of the system
Criticism
Q&A

Introduction
CNNs are an extension of artificial neural networks
Applications include image processing
They require high-performance computation hardware
Design exploration is a must!

What is a Deep Convolutional Neural Network?
A type of machine learning
8 steps
Limitations
Feed-forward computation
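To make the feed-forward computation concrete, here is a minimal sketch of a single convolutional layer written as a plain loop nest. The function name `conv_layer`, the nested-list data layout, and the `stride` parameter are assumptions made for illustration; the slides do not give the exact loop structure.

```python
def conv_layer(inp, weights, stride=1):
    """Feed-forward pass of one convolutional layer (illustrative only).

    inp[n][h][w]        : n input feature maps
    weights[m][n][k][k] : m output maps, each with one k x k kernel per input map
    returns out[m][r][c]
    """
    n_in = len(inp)
    k = len(weights[0][0])
    rows = (len(inp[0]) - k) // stride + 1
    cols = (len(inp[0][0]) - k) // stride + 1
    out = [[[0.0] * cols for _ in range(rows)] for _ in weights]
    for m, w_m in enumerate(weights):          # output feature maps
        for n in range(n_in):                  # input feature maps
            for r in range(rows):              # output rows
                for c in range(cols):          # output columns
                    for i in range(k):         # kernel rows
                        for j in range(k):     # kernel columns
                            out[m][r][c] += (
                                w_m[n][i][j] * inp[n][stride * r + i][stride * c + j]
                            )
    return out


# One 3x3 input map and one 2x2 kernel, chosen arbitrarily.
example_in = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
example_w = [[[[1, 0], [0, 1]]]]
print(conv_layer(example_in, example_w))
```

The six nested loops are the kind of structure that scheduling, tiling, unrolling and pipelining decisions act on.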

Roofline Model
Provides a graphical representation of performance and productivity
  ◦ Rates and efficiencies (GFLOPS, % of peak)
  ◦ Limitations
  ◦ Benefits
Focus
  ◦ Computation
  ◦ Communication
  ◦ Locality
Not for fine tuning
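The roofline bound can be written in one line: attainable performance is the smaller of the computational roof and the product of the computation-to-communication ratio and the memory bandwidth. The sketch below assumes hypothetical numbers; none of them come from the slides.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, ctc_ratio):
    """Roofline bound.

    peak_gflops    : computational roof of the platform (GFLOP/s)
    bandwidth_gb_s : off-chip memory bandwidth (GB/s)
    ctc_ratio      : operations per byte moved (FLOP/byte)
    """
    return min(peak_gflops, ctc_ratio * bandwidth_gb_s)


# Hypothetical platform: 100 GFLOP/s peak, 4 GB/s bandwidth.
for ctc in (5.0, 25.0, 50.0):
    bound = attainable_gflops(100.0, 4.0, ctc)
    kind = "memory-bound" if bound < 100.0 else "compute-bound"
    print(ctc, bound, kind)
```

A design whose ratio places it left of the ridge point is limited by communication, so raising data reuse helps more than adding compute units.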

Types of data
Irrelevant
Independent
Dependent

Double buffering
Allows two-way communication
Increases throughput
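A minimal software sketch of double buffering in its common ping-pong form: while one buffer is being computed on, the next tile is loaded into the other. The function `process_tiles` and its arguments are hypothetical names. In Python the load and the compute run one after the other, whereas in hardware they would overlap.

```python
def process_tiles(tiles, compute):
    """Ping-pong over two buffers: tile i+1 is loaded while tile i is computed."""
    results = []
    if not tiles:
        return results
    buffers = [None, None]
    buffers[0] = list(tiles[0])                  # prologue: fill the first buffer
    for i in range(len(tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(tiles):
            buffers[nxt] = list(tiles[i + 1])    # load step (overlapped in hardware)
        results.append(compute(buffers[cur]))    # compute step on the current buffer
    return results


print(process_tiles([[1, 2], [3, 4], [5]], sum))
```

Because the transfer for the next tile hides behind the current computation, the compute units do not sit idle waiting for data, which is where the throughput gain comes from.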

Main concerns
Communication overhead
Buffer management
Bandwidth optimization
Better utilization of the FPGA

Design Exploration
Computation
  ◦ Loop scheduling
  ◦ Loop tile sizes
Communication ratio
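Tile-size exploration can be sketched as a brute-force search. The example below enumerates tile sizes `(tm, tn)` over the output and input feature maps and keeps the pair with the fewest estimated cycles. The cycle model, the `pe_budget` limit on parallel multipliers, and all names are assumptions for illustration; a full exploration would also weigh the communication ratio of each candidate.

```python
from math import ceil


def explore_tiles(M, N, R, C, K, pe_budget):
    """Return (cycles, tm, tn) with the lowest estimated cycle count.

    Assumed model: a tm x tn tile is computed fully in parallel, so one layer
    takes ceil(M/tm) * ceil(N/tn) * R * C * K * K cycles.
    """
    best = None
    for tm in range(1, M + 1):
        for tn in range(1, N + 1):
            if tm * tn > pe_budget:              # design does not fit the resources
                continue
            cycles = ceil(M / tm) * ceil(N / tn) * R * C * K * K
            if best is None or cycles < best[0]:
                best = (cycles, tm, tn)
    return best


# Hypothetical layer: 4 output maps, 3 input maps, 1x1 output, 1x1 kernel.
print(explore_tiles(M=4, N=3, R=1, C=1, K=1, pe_budget=6))
```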

Directives: Loop Pipelining
Software pipelining
Increases throughput

Directives: Loop Unrolling
Maximizes computation
Data flow design

Directives: Loop Tiling
Divides loops into smaller loops
  ◦ Ensures data stays in cache
  ◦ Great for data reuse
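As a small illustration of tiling, the sketch below splits each loop of a matrix multiplication into an outer loop over tiles and an inner loop within a tile. Matrix multiplication is used only because it is compact; the function name and the `tile` parameter are assumptions, not taken from the slides.

```python
def matmul_tiled(a, b, tile=2):
    """Tiled matrix multiply: each loop is split into tile and intra-tile loops."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):                 # loops over tiles
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # loops inside one tile: the working set stays small
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 0], [0, 1], [1, 1]]
print(matmul_tiled(a, b, tile=2))
```

The result is the same for any tile size. Only the order of memory accesses changes, so that each block of data is reused while it is still held in fast local memory.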

Memory Optimization
Polyhedral-based optimization
Local memory promotion for irrelevant-type communication
Data reuse

Designed Model

Detail of the final design

Results
Virtex 7 at 100 MHz, implemented as an IP using Vivado HLS (VHLS)
Intel Xeon E5 at 2.2 GHz with 15 MB cache
Pre-synthesis report used for performance and exploration

Evaluation of the system
17.42x speedup over the 1-thread GP implementation
4.8x speedup over the 16-thread GP implementation
18.6 watts vs. 95 watts for the GP
3.62x speedup over the ICCD2013 design
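The speedup and power figures can be combined into a rough energy comparison, since energy is power multiplied by run time. The sketch assumes that the 95 W and 18.6 W figures apply to the same workload as the speedup being used, which the slides do not state.

```python
def energy_ratio(speedup, cpu_watts=95.0, fpga_watts=18.6):
    """CPU energy divided by FPGA energy for the same job.

    E = P * T, so E_cpu / E_fpga = (P_cpu / P_fpga) * (T_cpu / T_fpga),
    and T_cpu / T_fpga is the reported speedup.
    """
    return speedup * cpu_watts / fpga_watts


print(round(energy_ratio(4.8), 1))     # against the 16-thread GP implementation
print(round(energy_ratio(17.42), 1))   # against the 1-thread GP implementation
```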

My opinion
The techniques used to optimize loops are well thought out
It's a unique way of looking at an accelerator
The memory enhancements offer great insight

Pitfalls of the claim
Pre-cached data tests
Evaluation metrics when comparing with other designs
Only tested using one image
Technology difference
Claiming the design has the best utilization

Q&A

References
http://crd.lbl.gov/assets/pubs_presos/parlab08-roofline-talk.pdf
https://www.youtube.com/watch?v=n6hpQwq7Inw
http://en.wikipedia.org/wiki/Loop_tiling
http://en.wikipedia.org/wiki/Polytope_model
Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, Jason Cong. Center for Energy-Efficient Computing and Applications, Peking University, China; Computer Science Department, University of California, Los Angeles, USA
