ECE 545 Lecture 14 ATHENa - Automated Tool for Hardware...


George Mason University

ATHENa - Automated Tool for Hardware EvaluatioN

ECE 545 Lecture 14

Resources

• ATHENa website: http://cryptography.gmu.edu/athena

ATHENa – Automated Tool for Hardware EvaluatioN

Supported in part by the National Institute of Standards & Technology (NIST)

ATHENa Team

• Venkata "Vinny", MS CpE student

• Ekawat "Ice", PhD CpE student

• Marcin, PhD ECE student

• Rajesh, PhD ECE student

• Michal, PhD exchange student from Slovakia

• John, MS CpE student

ATHENa – Automated Tool for Hardware EvaluatioN


Open-source benchmarking tool, written in Perl, aimed at the AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms.

Currently under development at George Mason University.

http://cryptography.gmu.edu/athena

Why Athena?

"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest."

from "Athena, Greek Goddess of Wisdom and Craftsmanship"


Generation of Results Facilitated by ATHENa

[Figure: generating results in the "old days" vs. "working" with ATHENa]

[Figure: ATHENa Server workflow connecting the Designer, the ATHENa Server, and the User (steps numbered 0 to 6 in the original figure). Labels: Interfaces + Testbenches; Download scripts and configuration files; HDL + FPGA Tools; HDL + scripts + configuration files; FPGA Synthesis and Implementation; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs]

Basic Dataflow of ATHENa

Inputs:

• synthesizable source files

• configuration files

• testbench

• constraint files

Outputs:

• result summary (user-friendly)

• database entries (machine-friendly)

ATHENa Major Features (1)

• synthesis, implementation, and timing analysis in batch mode

• support for devices and tools of multiple FPGA vendors

• generation of results for multiple families of FPGAs of a given vendor

• automated choice of a best-matching device within a given family

ATHENa Major Features (2)

• automated verification of designs through simulation in batch mode

• support for multi-core processing

• automated extraction and tabulation of results

• several optimization strategies aimed at finding
  – optimum options of tools
  – best target clock frequency
  – best starting point of placement

Generation of Results Facilitated by ATHENa

• batch mode of FPGA tools

• ease of extraction and tabulation of results
  – Text Reports, Excel, CSV (Comma-Separated Values)

• optimized choice of tool options
  – GMU_optimization_1 strategy

Relative Improvement of Results from Using ATHENa: Virtex 5, 256-bit Variants of Hash Functions

[Figure: chart of Area, Thr, and Thr/Area ratios; vertical axis from 0 to 2.5]

Ratios of results obtained using ATHENa suggested options vs. default options of FPGA tools

Other (Somewhat) Similar Tools

ExploreAhead (part of PlanAhead)

Design Space Explorer (DSE)

Boldport Flow

EDAx10 Cloud Platform


Distinguishing Features of ATHENa

• Support for multiple tools from multiple vendors

• Optimization strategies aimed at the best possible performance rather than design closure

• Extraction and presentation of results

• Seamless integration with the ATHENa database of results

Traditional Development and Benchmarking Flow

[Figure: flow diagram. Blocks: Informal Specification; Test Vectors; Manual Design; HDL Code; Functional Verification; Manual Optimization; FPGA Tools; Netlist; Timing Verification; Post Place & Route Results]

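Extended Traditional Development and Benchmarking Flow

[Figure: the same flow diagram with Manual Optimization replaced by Option Optimization, performed by GMU ATHENa. Blocks: Informal Specification; Test Vectors; Manual Design; HDL Code; Functional Verification; Option Optimization; FPGA Tools; Netlist; Timing Verification; Post Place & Route Results]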

How To Start Working With ATHENa? One-Time Tasks

• Download and unzip ATHENa: http://cryptography.gmu.edu/athena/

• Read the Tutorial!

• Install the Required Tools (see Tutorial - Part 1 - Tools Installation)

• Run ATHENa_setup

How To Start Working With ATHENa? Repetitive Tasks

• Prepare or modify your source files & source_list.txt

• Modify design.config.txt + possibly other configuration files

• Run ATHENa

design.config.txt – Your Design

# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>

# a file containing a list of files in the order suitable for synthesis and implementation
# low level modules first, top level entity last
SOURCE_LIST_FILE = source_list.txt

# project name
# it will be used in the names of result directories
PROJECT_NAME = SHA256

# name of top level entity
TOP_LEVEL_ENTITY = sha256

# name of top level architecture
TOP_LEVEL_ARCH = rs_arch

# name of clock net
CLOCK_NET = clk
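The KEY = VALUE layout of design.config.txt can be read with a few lines of code. The following is a minimal Python sketch, not ATHENa's own parser (ATHENa itself is written in Perl). The function name parse_config is hypothetical, and treating the angle brackets around some values as optional wrappers is an assumption.

```python
def parse_config(text):
    """Collect KEY = VALUE pairs, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Assumption: angle brackets only wrap the value and can be dropped.
        if value.startswith("<") and value.endswith(">"):
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


SAMPLE = """\
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>
SOURCE_LIST_FILE = source_list.txt
PROJECT_NAME = SHA256
TOP_LEVEL_ENTITY = sha256
TOP_LEVEL_ARCH = rs_arch
CLOCK_NET = clk
"""

config = parse_config(SAMPLE)
print(config["SOURCE_DIR"], config["TOP_LEVEL_ENTITY"])
```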

design.config.txt – Timing Formulas

# formula for latency
LATENCY = TCLK*65

# formula for throughput
THROUGHPUT = 512/(TCLK*65)
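As a worked illustration of the two formulas above, the Python sketch below evaluates them for a given clock period. The function name timing_metrics is hypothetical. The unit handling is an assumption: TCLK is taken in ns, so 512/(TCLK*65) is in bits/ns (Gbit/s) and is multiplied by 1000 to obtain Mbit/s; how ATHENa scales units internally is not stated in the slides.

```python
def timing_metrics(tclk_ns, cycles=65, block_bits=512):
    """Evaluate LATENCY = TCLK*65 and THROUGHPUT = 512/(TCLK*65)."""
    latency_ns = tclk_ns * cycles
    # bits per ns equals Gbit/s; scale by 1000 to report Mbit/s (assumed unit).
    throughput_mbps = block_bits / latency_ns * 1000.0
    return latency_ns, throughput_mbps


# Example: a 10 ns clock period (100 MHz), chosen only for illustration.
latency, throughput = timing_metrics(10.0)
print(f"latency = {latency:.1f} ns, throughput = {throughput:.2f} Mbit/s")
```

The constants 65 and 512 come from the SHA256 example in the configuration file; a different design would supply its own cycle count and block size.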

design.config.txt – Application & Optimization Target

# OPTIMIZATION_TARGET = speed | area | balanced
OPTIMIZATION_TARGET = speed

# OPTIONS = default | user
OPTIONS = default

# APPLICATION = single_run | exhaustive_search | placement_search | frequency_search |
#               GMU_Optimization_1 | GMU_Xilinx_optimization_1
APPLICATION = single_run

# TRIM_MODE = off | zip | delete
TRIM_MODE = zip

design.config.txt – FPGA Families

# commenting the next line removes all families of Xilinx
FPGA_VENDOR = xilinx

# commenting the next line removes a given family
FPGA_FAMILY = spartan3

# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_SYN_FREQ = 120
REQ_IMP_FREQ = 100
MAX_SLICE_UTILIZATION = 0.8
MAX_BRAM_UTILIZATION = 0.8
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.9

ENDFAMILY

ENDVENDOR

design.config.txt – FPGA Families

# commenting the next line removes all families of Altera
FPGA_VENDOR = altera

# commenting the next line removes a given family
FPGA_FAMILY = StratixIII

# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 0
MAX_MUL_UTILIZATION = 0
MAX_PIN_UTILIZATION = 0.8

ENDFAMILY

ENDVENDOR

Library Files

device_lib/xilinx_device_lib.txt
device_lib/altera_device_lib.txt

• Files created during ATHENa setup

• Characterize FPGA families and devices available in the version of Xilinx and Altera tools installed on your computer

• Currently supported tool versions:
  – Xilinx WebPACK from 9.1 to 14.7
  – Xilinx Design Suite from 11.1 to 14.7
  – Altera Quartus II Web Edition from 8.1 to 14.0
  – Altera Quartus II Subscription Edition from 9.1 to 14.0

• In case a library for a given version is not available yet, use a library from the closest available version

Library Files – device_lib/xilinx_device_lib.txt

VENDOR = Xilinx
# Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins
ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO

FAMILY = spartan3
xc3s50pq208-5, 768, 4, 0, 4, 124
xc3s200ft256-5, 1920, 12, 0, 12, 173
xc3s400fg456-5, 3584, 16, 0, 16, 264
xc3s1000fg676-5, 7680, 24, 0, 24, 391
xc3s1500fg676-5, 13312, 32, 0, 32, 487
END_FAMILY

FAMILY = virtex5
xc5vlx30ff676-3, 4800, 32, 32, 0, 400
xc5vfx30tff665-3, 5120, 68, 64, 0, 360
xc5vlx30tff665-3, 4800, 36, 32, 0, 360
xc5vlx50ff1153-3, 7200, 48, 48, 0, 560
xc5vlx50tff1136-3, 7200, 60, 48, 0, 480
END_FAMILY
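The device library and the MAX_*_UTILIZATION settings suggest how a best_match choice could work. The Python sketch below is one possible interpretation, not ATHENa's documented algorithm: it assumes best_match means the smallest device (by slice count) whose resources cover the design's needs within the utilization limits. The function name best_match, the resource tuple layout, and the DSP limit of 1.0 are hypothetical; the device rows are copied from the spartan3 excerpt above.

```python
# (device, slices, BRAMs, DSPs, multipliers, I/O pins), in ITEM_ORDER.
SPARTAN3 = [
    ("xc3s50pq208-5", 768, 4, 0, 4, 124),
    ("xc3s200ft256-5", 1920, 12, 0, 12, 173),
    ("xc3s400fg456-5", 3584, 16, 0, 16, 264),
    ("xc3s1000fg676-5", 7680, 24, 0, 24, 391),
    ("xc3s1500fg676-5", 13312, 32, 0, 32, 487),
]


def best_match(devices, needed, max_util):
    """Return the smallest device that fits, or None if no device fits.

    Assumed rule: for every resource, needed <= max_util * available.
    """
    for name, *available in sorted(devices, key=lambda d: d[1]):
        if all(n <= u * a for n, u, a in zip(needed, max_util, available)):
            return name
    return None


# Hypothetical design needs: 74 slices, 4 BRAMs, 0 DSPs, 7 multipliers, 20 pins.
needed = (74, 4, 0, 7, 20)
# Limits in ITEM_ORDER; slice, BRAM, multiplier and pin limits follow the
# spartan3 section of design.config.txt, the DSP limit is an assumption.
max_util = (0.8, 0.8, 1.0, 1.0, 0.9)
print(best_match(SPARTAN3, needed, max_util))
```

Under these assumptions the smallest device is rejected because 4 block RAMs exceed 80% of its 4 available, so the next larger device is selected.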

Result Files – report_resource_utilization.txt

xilinx : spartan3
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| GENERIC | DEVICE          | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | %  | DSPs | % | IO | %  |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| default | xc3s200ft256-5* | 1   | 142  | 3 | 74     | 3 | 4     | 33 | 7     | 58 | 0    | 0 | 20 | 11 |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+

xilinx : spartan6
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| GENERIC | DEVICE           | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | %  | IO | %  |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| default | xc6slx9csg324-3* | 1   | 41   | 1 | 22     | 1 | 4     | 6 | 0     | 0 | 9    | 56 | 20 | 10 |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+

xilinx : virtex5
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | % | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| default | xc5vlx20tff323-2* | 1   | 101  | 1 | 56     | 1 | 4     | 15 | 0     | 0 | 9    | 37 | 20 | 11 |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+

xilinx : virtex6
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| default | xc6vlx75tff784-3* | 1   | 44   | 1 | 21     | 1 | 4     | 1 | 0     | 0 | 9    | 3 | 20 | 5 |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+

Result Files – report_timing.txt

REQ SYN FREQ - Requested synthesis clk. freq.
SYN FREQ - Achieved synthesis clk. freq.
REQ SYN TCLK - Requested synthesis clk. period
SYN TCLK - Achieved synthesis clk. period
REQ IMP FREQ - Requested implement. clk. freq.
IMP FREQ - Achieved implement. clk. freq.
REQ IMP TCLK - Requested implement. clk. period
IMP TCLK - Achieved implement. clk. period
LATENCY - Latency [ns]
THROUGHPUT - Throughput [Mbits/s]
TP/Area - Throughput/Area [(Mbits/s)/CLB slices]
Latency*Area - Latency*Area [ns*CLB slices]

xilinx : spartan3
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE          | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc3s200ft256-5* | 1   | default      | 207.370  | default      | 4.822    | default      | 112.448  | default      | 8.893    | 17.786  | 449.792    | 6.078      | 1316.164     |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

xilinx : spartan6
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE           | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6slx9csg324-3* | 1   | default      | 75.751   | default      | 13.201   | default      | 78.119   | default      | 12.801   | 25.602  | 312.476    | 14.203     | 563.244      |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc5vlx20tff323-2* | 1   | default      | 156.347  | default      | 6.396    | default      | 126.952  | default      | 7.877    | 15.754  | 507.808    | 9.068      | 882.224      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6vlx75tff784-3* | 1   | default      | 158.053  | default      | 6.327    | default      | 135.410  | default      | 7.385    | 14.770  | 541.638    | 25.792     | 310.170      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
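The derived columns of the timing report can be cross-checked against the slice counts in report_resource_utilization.txt. The Python sketch below recomputes them under the relations implied by the column legends: frequency in MHz as 1000 divided by the period in ns, TP/Area as throughput divided by slices, and Latency*Area as latency times slices. The function name derived_metrics and the dictionary layout are hypothetical; the numbers are copied from the two reports.

```python
# family: (slices, IMP TCLK [ns], LATENCY [ns], THROUGHPUT [Mbits/s])
RESULTS = {
    "spartan3": (74, 8.893, 17.786, 449.792),
    "spartan6": (22, 12.801, 25.602, 312.476),
    "virtex5": (56, 7.877, 15.754, 507.808),
    "virtex6": (21, 7.385, 14.770, 541.638),
}


def derived_metrics(slices, tclk_ns, latency_ns, throughput_mbps):
    """Recompute IMP FREQ, TP/Area and Latency*Area from the raw columns."""
    freq_mhz = 1000.0 / tclk_ns          # a period in ns gives MHz
    tp_per_area = throughput_mbps / slices
    latency_area = latency_ns * slices
    return freq_mhz, tp_per_area, latency_area


for family, row in RESULTS.items():
    freq, tp_area, lat_area = derived_metrics(*row)
    print(f"{family:9s} {freq:8.3f} MHz  {tp_area:7.3f}  {lat_area:9.3f}")
```

Comparing the printed values with the IMP FREQ, TP/Area, and Latency*Area columns is a quick sanity check when results are copied by hand into another table.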

Result Files – report_options.txt

xilinx : spartan3
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE          | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc3s200ft256-5* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+

xilinx : spartan6
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE           | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6slx9csg324-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+

COST TABLE - parameter determining the starting point of placement
Synthesis Options - options of the synthesis tool
Map Options - options of the mapping tool
PAR Options - options of the place & route tool

Result Files – report_execution_time.txt

xilinx : spartan3
+---------+-----------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE          | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-----------------+-----+----------------+---------------------+--------------+
| default | xc3s200ft256-5* | 1   | 0d 0h:0m:12s   | 0d 0h:0m:36s        | 0d 0h:0m:48s |
+---------+-----------------+-----+----------------+---------------------+--------------+

xilinx : spartan6
+---------+------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE           | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+------------------+-----+----------------+---------------------+--------------+
| default | xc6slx9csg324-3* | 1   | 0d 0h:0m:21s   | 0d 0h:1m:13s        | 0d 0h:1m:34s |
+---------+------------------+-----+----------------+---------------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 0d 0h:0m:39s   | 0d 0h:1m:50s        | 0d 0h:2m:29s |
+---------+-------------------+-----+----------------+---------------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 0d 0h:0m:22s   | 0d 0h:3m:22s        | 0d 0h:3m:44s |
+---------+-------------------+-----+----------------+---------------------+--------------+

Synthesis Time - time of synthesis
Implementation Time - time of implementation
Elapsed Time - total time

design.config.txt – Functional Simulation (1)

# FUNCTIONAL_VERIFICATION_MODE = <on | off>
FUNCTIONAL_VERIFICATION_MODE = <off>

# directory containing source files of the testbench
VERIFICATION_DIR = <examples/sha256_rs/tb>

# A file containing a list of testbench files in the order suitable for compilation;
# low level modules first, top level entity last.
# Test vector files should be located in the same directory and listed
# in the same file, unless a fixed path is used. Please refer to the tutorial for more detail.
VERIFICATION_LIST_FILE = <tb_srcs.txt>

# name of testbench's top level entity
TB_TOP_LEVEL_ENTITY = <sha_tb>

# name of testbench's top level architecture
TB_TOP_LEVEL_ARCH = <behavior>

design.config.txt – Functional Simulation (2)

# MAX_TIME_FUNCTIONAL_VERIFICATION = <$time $unit>
# supported units are: ps, ns, us, and ms
# if blank, the simulation will run until it finishes, i.e., until there are
# no changes in signals (the clock is stopped and no more inputs are coming in)
MAX_TIME_FUNCTIONAL_VERIFICATION = <>

# Perform only verification (synthesis and implementation parameters are ignored)
# VERIFICATION_ONLY = <ON | OFF>
VERIFICATION_ONLY = <off>

ATHENa – Database of Results

ATHENa Database: http://cryptography.gmu.edu/athenadb

ATHENa Database – Result View

• Algorithm parameters

• Design parameters
  – Optimization target
  – Architecture type
  – Datapath width
  – I/O bus widths
  – Availability of source code

• Platform
  – Vendor, Family, Device

• Timing
  – Maximum clock frequency
  – Maximum throughput

• Resource utilization
  – Logic blocks (Slices/LEs/ALUTs)
  – Multipliers/DSP units

• Tools
  – Names & versions
  – Detailed options

• Credits
  – Designers & contact information

ATHENa Database – Compare Feature

• Matching fields in grey

• Non-matching fields in red and blue

ATHENa Database of Results for Authenticated Ciphers

• Already available at http://cryptography.gmu.edu/athena

• Similar to the database of results for hash functions

• Results can be entered by designers themselves.

• The ATHENa Option Optimization Tool supports automatic generation of results suitable for uploading to the database

Ordered Listing with a Single-Best (Unique) Result per Each Algorithm


Possible Future Customizations

The same basic database can be customized and adapted for other domains, such as

• Digital Signal Processing

• Bioinformatics

• Communications

• Scientific Computing, etc.

Source Codes

GMU Source Codes and Block Diagrams

• GMU Source Codes for all Round 3 SHA-3 Candidates & SHA-2 made available at the ATHENa website at: http://cryptography.gmu.edu/athena

• Included in this release:
  – Basic architectures
  – Folded architectures
  – Unrolled architectures

• Each code supports two variants: with 256-bit and 512-bit output.

• Each source code accompanied by comprehensive hierarchical block diagrams

ATHENa Result Replication Files

• Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)

• Automatically created by ATHENa for all results generated using ATHENa

• Stored in the ATHENa Database

In the same spirit of Reproducible Research as:

• Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL), "Reproducible research in signal processing - what, why, and how," IEEE Signal Processing Magazine, May 2009. http://rr.epfl.ch/17/

• J. Claerbout (Stanford University), "Electronic documents give reproducible research a new meaning," in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992. http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92

.....


Benchmarking Goals Facilitated by ATHENa

Comparing multiple:

1. cryptographic algorithms

2. hardware architectures or implementations of the same cryptographic algorithm

3. hardware platforms from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)

4. tools and languages in terms of the quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 14.7)
