George Mason University
ATHENa - Automated Tool for Hardware EvaluatioN
ECE 545 Lecture 14
Resources
• ATHENa website: http://cryptography.gmu.edu/athena
ATHENa – Automated Tool for Hardware EvaluatioN
Supported in part by the National Institute of Standards & Technology (NIST)
ATHENa Team
• Venkata “Vinny”, MS CpE student
• Ekawat “Ice”, PhD CpE student
• Marcin, PhD ECE student
• Rajesh, PhD ECE student
• Michal, PhD exchange student from Slovakia
• John, MS CpE student
ATHENa – Automated Tool for Hardware EvaluatioN
An open-source benchmarking tool, written in Perl, aimed at the
AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms.
Currently under development at George Mason University.
http://cryptography.gmu.edu/athena
Why Athena?
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest.”
from "Athena, Greek Goddess of Wisdom and Craftsmanship"
Generation of Results Facilitated by ATHENa
[Figure: the “old days” vs. “working” with ATHENa]
Basic Dataflow of ATHENa
[Figure: dataflow among the Designer, the ATHENa Server, and the User, with steps numbered 0 to 6. Labels in the figure: Interfaces + Testbenches (step 0); Download scripts and configuration files; HDL + scripts + configuration files; HDL + FPGA Tools; FPGA Synthesis and Implementation; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs.]
[Figure: files handled by ATHENa]
• Inputs: synthesizable source files, configuration files, testbench, constraint files
• Outputs: result summary (user-friendly), database entries (machine-friendly)
ATHENa Major Features (1)
• synthesis, implementation, and timing analysis in batch mode
• support for devices and tools of multiple FPGA vendors
• generation of results for multiple families of FPGAs of a given vendor
• automated choice of a best-matching device within a given family
ATHENa Major Features (2)
• automated verification of designs through simulation in batch mode
• support for multi-core processing
• automated extraction and tabulation of results
• several optimization strategies aimed at finding
– optimum options of tools
– best target clock frequency
– best starting point of placement
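The search for the best starting point of placement can be pictured as a simple sweep. The following is a minimal sketch in Python, not ATHENa's actual Perl implementation: `run_implementation` is a hypothetical callback that would run place and route for one cost table value and return the achieved clock frequency, the range 1 to 100 is an assumption about the allowed cost table values, and `fake_run` is a stand-in that produces made-up numbers so the sketch can be run.

```python
def placement_search(run_implementation, cost_tables=range(1, 101)):
    """Try every cost table value and keep the one with the highest frequency.

    run_implementation is a hypothetical callback: cost table -> frequency in MHz.
    """
    best_table, best_freq = None, float("-inf")
    for table in cost_tables:
        freq = run_implementation(table)
        if freq > best_freq:  # strict comparison keeps the first best value
            best_table, best_freq = table, freq
    return best_table, best_freq


def fake_run(cost_table):
    """Stand-in for a real tool run; the numbers are invented, not measured."""
    return 100.0 + (cost_table * 37) % 11


best_table, best_freq = placement_search(fake_run)
print(best_table, best_freq)
```

In a real flow each call would take minutes, which is why batch mode and multi-core processing matter for this kind of search.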
Generation of Results Facilitated by ATHENa
• batch mode of FPGA tools
• ease of extraction and tabulation of results
– Text Reports, Excel, CSV (Comma-Separated Values)
• optimized choice of tool options
– GMU_optimization_1 strategy
Relative Improvement of Results from Using ATHENa
Virtex 5, 256-bit Variants of Hash Functions
[Figure: bar chart of the Area, Throughput (Thr), and Thr/Area ratios; vertical axis from 0 to 2.5]
Ratios of results obtained using ATHENa-suggested options vs. default options of FPGA tools
Other (Somewhat) Similar Tools
ExploreAhead (part of PlanAhead)
Design Space Explorer (DSE)
Boldport Flow
EDAx10 Cloud Platform
Distinguishing Features of ATHENa
• Support for multiple tools from multiple vendors
• Optimization strategies aimed at the best possible
performance rather than design closure
• Extraction and presentation of results
• Seamless integration with the ATHENa database of results
Traditional Development and Benchmarking Flow
[Figure: flow diagram with the blocks Informal Specification, Test Vectors, Manual Design, HDL Code, Functional Verification, FPGA Tools, Manual Optimization, Netlist, Timing Verification, and Post Place & Route Results]
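Extended Traditional Development and Benchmarking Flow
[Figure: the same flow diagram with Manual Optimization replaced by Option Optimization, labeled GMU ATHENa. Blocks: Informal Specification, Test Vectors, Manual Design, HDL Code, Functional Verification, FPGA Tools, Option Optimization, Netlist, Timing Verification, and Post Place & Route Results]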
How To Start Working With ATHENa? One-Time Tasks
• Download and unzip ATHENa: http://cryptography.gmu.edu/athena/
• Read the Tutorial!
• Install the Required Tools (see Tutorial - Part 1 – Tools Installation)
• Run ATHENa_setup
How To Start Working With ATHENa? Repetitive Tasks
• Prepare or modify your source files & source_list.txt
• Modify design.config.txt + possibly other configuration files
• Run ATHENa
design.config.txt – Your Design
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>
# A file list containing list of files in the order suitable for synthesis and implementation
# low level modules first, top level entity last
SOURCE_LIST_FILE = source_list.txt
# project name
# it will be used in the names of result directories
PROJECT_NAME = SHA256
# name of top level entity
TOP_LEVEL_ENTITY = sha256
# name of top level architecture
TOP_LEVEL_ARCH = rs_arch
# name of clock net
CLOCK_NET = clk
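To make the layout of these settings concrete, here is a minimal Python sketch that reads flat `KEY = value` lines and skips `#` comments. It is an illustration only, not ATHENa's own parser (ATHENa is written in Perl). The function name `parse_config` is hypothetical, treating angle brackets as delimiters is an assumption, and the sketch does not handle the nested vendor and family blocks that appear in other parts of design.config.txt.

```python
def parse_config(text):
    """Collect 'KEY = value' pairs, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: angle brackets around a value are delimiters, not content.
        settings[key.strip()] = value.strip().strip("<>").strip()
    return settings


SAMPLE = """\
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>
# name of top level entity
TOP_LEVEL_ENTITY = sha256
# name of clock net
CLOCK_NET = clk
"""

config = parse_config(SAMPLE)
print(config["SOURCE_DIR"], config["TOP_LEVEL_ENTITY"], config["CLOCK_NET"])
```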
design.config.txt – Timing Formulas
# formula for latency
LATENCY = TCLK*65
# formula for throughput
THROUGHPUT = 512/(TCLK*65)
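The two formulas can be evaluated by hand for any clock period. The sketch below does this in Python under stated assumptions: TCLK is taken to be in nanoseconds and throughput is converted to Mbits/s, which matches the units listed in the report_timing.txt legend later in these slides. The constants 65 (clock cycles per block) and 512 (bits per block) come from the formulas above; the function names are hypothetical.

```python
def latency_ns(tclk_ns, cycles=65):
    """LATENCY = TCLK * 65, with TCLK assumed to be in ns."""
    return tclk_ns * cycles


def throughput_mbps(tclk_ns, block_bits=512, cycles=65):
    """THROUGHPUT = 512 / (TCLK * 65).

    Bits per nanosecond equal Gbits/s, so multiply by 1000 to get Mbits/s.
    """
    return block_bits / (tclk_ns * cycles) * 1000.0


# Example: a 10 ns clock period (100 MHz)
print(latency_ns(10.0))
print(round(throughput_mbps(10.0), 3))
```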
design.config.txt – Application & Optimization Target
# OPTIMIZATION_TARGET = speed | area | balanced
OPTIMIZATION_TARGET = speed
# OPTIONS = default | user
OPTIONS = default
# APPLICATION = single_run | exhaustive_search | placement_search | frequency_search |
# GMU_Optimization_1 | GMU_Xilinx_optimization_1
APPLICATION = single_run
# TRIM_MODE = off | zip | delete
TRIM_MODE = zip
design.config.txt – FPGA Families
# commenting the next line removes all families of Xilinx
FPGA_VENDOR = xilinx
# commenting the next line removes a given family
FPGA_FAMILY = spartan3
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_SYN_FREQ = 120
REQ_IMP_FREQ = 100
MAX_SLICE_UTILIZATION = 0.8
MAX_BRAM_UTILIZATION = 0.8
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.9
ENDFAMILY
ENDVENDOR
design.config.txt – FPGA Families
# commenting the next line removes all families of Altera
FPGA_VENDOR = altera
# commenting the next line removes a given family
FPGA_FAMILY = StratixIII
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 0
MAX_MUL_UTILIZATION = 0
MAX_PIN_UTILIZATION = 0.8
ENDFAMILY
ENDVENDOR
Library Files
device_lib/xilinx_device_lib.txt
device_lib/altera_device_lib.txt
• Files created during ATHENa setup
• Characterize FPGA families and devices available in the version of Xilinx and Altera tools installed on your computer
• Currently supported tool versions:
– Xilinx WebPACK from 9.1 to 14.7
– Xilinx Design Suite from 11.1 to 14.7
– Altera Quartus II Web Edition from 8.1 to 14.0
– Altera Quartus II Subscription Edition from 9.1 to 14.0
• In case a library for a given version is not available yet, use a library from the closest available version
Library Files – device_lib/xilinx_device_lib.txt
VENDOR = Xilinx
# Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins
ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO
FAMILY = spartan3
xc3s50pq208-5, 768, 4, 0, 4, 124
xc3s200ft256-5, 1920, 12, 0, 12, 173
xc3s400fg456-5, 3584, 16, 0, 16, 264
xc3s1000fg676-5, 7680, 24, 0, 24, 391
xc3s1500fg676-5, 13312, 32, 0, 32, 487
END_FAMILY
FAMILY = virtex5
xc5vlx30ff676-3, 4800, 32, 32, 0, 400
xc5vfx30tff665-3, 5120, 68, 64, 0, 360
xc5vlx30tff665-3, 4800, 36, 32, 0, 360
xc5vlx50ff1153-3, 7200, 48, 48, 0, 560
xc5vlx50tff1136-3, 7200, 60, 48, 0, 480
END_FAMILY
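One plausible reading of FPGA_DEVICES = best_match is "the smallest device whose resources, scaled by the MAX_*_UTILIZATION limits, still cover the design". The Python sketch below illustrates that reading; it is an assumption about the selection rule, not a description of ATHENa's actual algorithm. The device rows are the spartan3 entries from the library file, the limits are the Xilinx values from design.config.txt (the DSP limit of 1.0 is assumed, since no such setting is shown), and the resource demand in `NEEDED` is taken from the spartan3 row of report_resource_utilization.txt in these slides.

```python
ORDER = ("SLICE", "BRAM", "DSP", "MULT", "IO")

# Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Max User I/O Pins
SPARTAN3 = [
    ("xc3s50pq208-5", 768, 4, 0, 4, 124),
    ("xc3s200ft256-5", 1920, 12, 0, 12, 173),
    ("xc3s400fg456-5", 3584, 16, 0, 16, 264),
    ("xc3s1000fg676-5", 7680, 24, 0, 24, 391),
    ("xc3s1500fg676-5", 13312, 32, 0, 32, 487),
]

# MAX_SLICE, MAX_BRAM, MAX_MUL, MAX_PIN utilization; the DSP limit is assumed.
LIMITS = {"SLICE": 0.8, "BRAM": 0.8, "DSP": 1.0, "MULT": 1.0, "IO": 0.9}

# Resource demand of the example design (spartan3 row of the utilization report).
NEEDED = {"SLICE": 74, "BRAM": 4, "DSP": 0, "MULT": 7, "IO": 20}


def best_match(devices, needed, limits):
    """Return the first (smallest) device that fits within all limits, else None."""
    for name, *capacity in devices:  # the list is ordered from smallest to largest
        if all(needed[item] <= cap * limits[item]
               for item, cap in zip(ORDER, capacity)):
            return name
    return None


print(best_match(SPARTAN3, NEEDED, LIMITS))
```

Under these assumptions the smallest device is rejected because 4 block RAMs exceed 80% of its 4 available, and the next device is selected, which agrees with the device marked in the sample reports.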
Result Files – report_resource_utilization.txt
xilinx : spartan3
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| GENERIC | DEVICE          | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | %  | DSPs | % | IO | %  |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| default | xc3s200ft256-5* | 1   | 142  | 3 | 74     | 3 | 4     | 33 | 7     | 58 | 0    | 0 | 20 | 11 |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
xilinx : spartan6
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| GENERIC | DEVICE           | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | %  | IO | %  |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| default | xc6slx9csg324-3* | 1   | 41   | 1 | 22     | 1 | 4     | 6 | 0     | 0 | 9    | 56 | 20 | 10 |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
xilinx : virtex5
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | %  | MULTs | % | DSPs | %  | IO | %  |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| default | xc5vlx20tff323-2* | 1   | 101  | 1 | 56     | 1 | 4     | 15 | 0     | 0 | 9    | 37 | 20 | 11 |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
xilinx : virtex6
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| GENERIC | DEVICE            | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| default | xc6vlx75tff784-3* | 1   | 44   | 1 | 21     | 1 | 4     | 1 | 0     | 0 | 9    | 3 | 20 | 5 |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
Result Files – report_timing.txt
REQ SYN FREQ - Requested synthesis clk. freq.
SYN FREQ - Achieved synthesis clk. freq.
REQ SYN TCLK - Requested synthesis clk. period
SYN TCLK - Achieved synthesis clk. period
REQ IMP FREQ - Requested implement. clk. freq.
IMP FREQ - Achieved implement. clk. freq.
REQ IMP TCLK - Requested implement. clk. period
IMP TCLK - Achieved implement. clk. period
LATENCY - Latency [ns]
THROUGHPUT - Throughput [Mbits/s]
TP/Area - Throughput/Area [(Mbits/s)/CLB slices]
Latency*Area - Latency*Area [ns*CLB slices]
xilinx : spartan3
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE          | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc3s200ft256-5* | 1   | default      | 207.370  | default      | 4.822    | default      | 112.448  | default      | 8.893    | 17.786  | 449.792    | 6.078      | 1316.164     |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
xilinx : spartan6
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE           | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6slx9csg324-3* | 1   | default      | 75.751   | default      | 13.201   | default      | 78.119   | default      | 12.801   | 25.602  | 312.476    | 14.203     | 563.244      |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
xilinx : virtex5
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc5vlx20tff323-2* | 1   | default      | 156.347  | default      | 6.396    | default      | 126.952  | default      | 7.877    | 15.754  | 507.808    | 9.068      | 882.224      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
xilinx : virtex6
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE            | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area    | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6vlx75tff784-3* | 1   | default      | 158.053  | default      | 6.327    | default      | 135.410  | default      | 7.385    | 14.770  | 541.638    | 25.792     | 310.170      |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
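The two derived columns can be cross-checked against the slice counts in report_resource_utilization.txt. The Python sketch below assumes that "Area" means the number of CLB slices, as the legend suggests; the function names are hypothetical and the input values are copied from the spartan3 rows of the two reports.

```python
def tp_per_area(throughput_mbps, slices):
    """TP/Area in (Mbits/s) per CLB slice."""
    return throughput_mbps / slices


def latency_area(latency_ns, slices):
    """Latency*Area in ns * CLB slices."""
    return latency_ns * slices


# spartan3 row: THROUGHPUT = 449.792, LATENCY = 17.786, SLICES = 74
print(round(tp_per_area(449.792, 74), 3))   # 6.078, as in the TP/Area column
print(round(latency_area(17.786, 74), 3))   # 1316.164, as in the Latency*Area column
```

The same check works for the other three families, for example 312.476 / 22 for spartan6.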
Result Files – report_options.txt
xilinx : spartan3
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE          | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc3s200ft256-5* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
xilinx : spartan6
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE           | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6slx9csg324-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
xilinx : virtex5
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options             | PAR Options  |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed  | -w -ol std   |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
xilinx : virtex6
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE            | RUN | COST TABLE | Synthesis Options            | Map Options   | PAR Options  |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 1          | -opt_level 1 -opt_mode speed | -c 100 -pr b  | -w -ol std   |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
COST TABLE - parameter determining the starting point of placement
Synthesis Options - options of the synthesis tool
Map Options - options of the mapping tool
PAR Options - options of the place & route tool
Result Files – report_execution_time.txt
xilinx : spartan3
+---------+-----------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE          | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-----------------+-----+----------------+---------------------+--------------+
| default | xc3s200ft256-5* | 1   | 0d 0h:0m:12s   | 0d 0h:0m:36s        | 0d 0h:0m:48s |
+---------+-----------------+-----+----------------+---------------------+--------------+
xilinx : spartan6
+---------+------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE           | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+------------------+-----+----------------+---------------------+--------------+
| default | xc6slx9csg324-3* | 1   | 0d 0h:0m:21s   | 0d 0h:1m:13s        | 0d 0h:1m:34s |
+---------+------------------+-----+----------------+---------------------+--------------+
xilinx : virtex5
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc5vlx20tff323-2* | 1   | 0d 0h:0m:39s   | 0d 0h:1m:50s        | 0d 0h:2m:29s |
+---------+-------------------+-----+----------------+---------------------+--------------+
xilinx : virtex6
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE            | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc6vlx75tff784-3* | 1   | 0d 0h:0m:22s   | 0d 0h:3m:22s        | 0d 0h:3m:44s |
+---------+-------------------+-----+----------------+---------------------+--------------+
Synthesis Time - Time of Synthesis
Implementation Time - Time of Implementation
Elapsed Time - Total Time
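When results from many runs are collected, it helps to turn the `0d 0h:0m:12s` strings into seconds. The Python sketch below does that with a regular expression inferred from the sample rows only; the helper name `to_seconds` is hypothetical and the pattern is an assumption about the format, not a documented specification.

```python
import re

# Pattern inferred from the sample rows, e.g. "0d 0h:1m:13s"
_DURATION = re.compile(r"(\d+)d\s+(\d+)h:(\d+)m:(\d+)s")


def to_seconds(text):
    """Convert a 'Dd Hh:Mm:Ss' duration string into a number of seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# spartan3 row: synthesis time plus implementation time equals elapsed time
synthesis = to_seconds("0d 0h:0m:12s")
implementation = to_seconds("0d 0h:0m:36s")
print(synthesis + implementation == to_seconds("0d 0h:0m:48s"))
```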
design.config.txt – Functional Simulation (1)
# FUNCTIONAL_VERIFICATION_MODE = <on|off>
FUNCTIONAL_VERIFICATION_MODE = <off>
# directory containing source files of the testbench
VERIFICATION_DIR = <examples/sha256_rs/tb>
# A file containing a list of testbench files in the order suitable for compilation;
# low level modules first, top level entity last.
# Test vector files should be located in the same directory and listed
# in the same file, unless fixed path is used. Please refer to tutorial for more detail.
VERIFICATION_LIST_FILE = <tb_srcs.txt>
# name of testbench's top level entity
TB_TOP_LEVEL_ENTITY = <sha_tb>
# name of testbench's top level architecture
TB_TOP_LEVEL_ARCH = <behavior>
design.config.txt – Functional Simulation (2)
# MAX_TIME_FUNCTIONAL_VERIFICATION = <$time$unit>
# supported units are: ps, ns, us, and ms
# if blank, simulation will run until it finishes
# = no changes in signals, i.e., clock is stopped and no more inputs coming in.
MAX_TIME_FUNCTIONAL_VERIFICATION = <>
# Perform only verification (synthesis and implementation parameters are ignored)
# VERIFICATION_ONLY = <ON|OFF>
VERIFICATION_ONLY = <off>
ATHENa – Database of Results
ATHENa Database: http://cryptography.gmu.edu/athenadb
ATHENa Database – Result View
• Algorithm parameters
• Design parameters
– Optimization target
– Architecture type
– Datapath width
– I/O bus widths
– Availability of source code
• Platform
– Vendor, Family, Device
• Timing
– Maximum clock frequency
– Maximum throughput
• Resource utilization
– Logic blocks (Slices/LEs/ALUTs)
– Multipliers/DSP units
• Tools
– Names & versions
– Detailed options
• Credits
– Designers & contact information
ATHENa Database – Compare Feature
Matching fields in grey
Non-matching fields in red and blue
ATHENa Database of Results for Authenticated Ciphers
• Already available at http://cryptography.gmu.edu/athena
• Similar to the database of results for hash functions
• Results can be entered by designers themselves.
• The ATHENa Option Optimization Tool supports automatic generation of results suitable for uploading to the database
Ordered Listing with a Single-Best (Unique) Result per Each Algorithm
Possible Future Customizations
The same basic database can be customized
and adapted for other domains, such as
• Digital Signal Processing
• Bioinformatics
• Communications
• Scientific Computing, etc.
Source Codes
GMU Source Codes and Block Diagrams
• GMU Source Codes for all Round 3 SHA-3 Candidates & SHA-2 made available at the ATHENa website at: http://cryptography.gmu.edu/athena
• Included in this release:
– Basic architectures
– Folded architectures
– Unrolled architectures
• Each code supports two variants: with 256-bit and 512-bit output.
• Each source code is accompanied by comprehensive hierarchical block diagrams
ATHENa Result Replication Files
• Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)
• Automatically created by ATHENa for all results generated using ATHENa
• Stored in the ATHENa Database
In the same spirit of Reproducible Research as:
• Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL), “Reproducible research in signal processing - what, why, and how,” IEEE Signal Processing Magazine, May 2009. http://rr.epfl.ch/17/
• J. Claerbout (Stanford University), “Electronic documents give reproducible research a new meaning,” in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992. http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92
.....
Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations of the same cryptographic algorithm
3. hardware platforms from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)
4. tools and languages in terms of quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 14.7)