ECE545 Lecture2 Project 6

8/3/2019 ECE545 Lecture2 Project 6


Course web page:

ECE 545 Digital System Design with VHDL

ECE web page > Courses > Course web pages > ECE 545

http://ece.gmu.edu/coursewebpages/ECE/ECE545/F10/

Kris Gaj

Office hours: Monday, 7:30-8:30 PM; Wednesday, 6:00-7:00 PM; and by appointment

Research and teaching interests: reconfigurable computing, computer arithmetic, cryptography, network security

Contact: The Engineering Building, room 3225

[email protected]

ECE 545 is part of the MS in Electrical Engineering and the MS in Computer Engineering:
- one of five core courses (must be passed with B or better)
- strongly suggested for two concentration areas: Digital Systems Design; Microprocessor and Embedded Systems
- elective course in the remaining concentration areas

[Figure: courses arranged by design level (algorithmic, register-transfer, gate, transistor, layout, devices): ECE 645 Computer Arithmetic; ECE 545 Digital System Design with VHDL; ECE 586 Digital Integrated Circuits; ECE 680 Physical VLSI Design; ECE 682 VLSI Test Concepts; ECE 681 VLSI Design for ASICs; ECE 684 MOS Device Electronics; ECE 584 Semiconductor Device Fundamentals]

DIGITAL SYSTEMS DESIGN

Concentration advisors: Kris Gaj, Jens-Peter Kaps, Ken Hintz

1. ECE 545 Digital System Design with VHDL: K. Gaj, project, FPGA design with VHDL, Aldec/Mentor Graphics, Xilinx/Altera

2. ECE 645 Computer Arithmetic: K. Gaj, project, FPGA design with VHDL or Verilog, Aldec/Mentor Graphics, Xilinx/Altera

3. ECE 681 VLSI Design for ASICs: N. Klimavicz, project/lab, back-end ASIC design with Synopsys tools

4. ECE 586 Digital Integrated Circuits: D. Ioannou, R. Mulpuri

5. ECE 682 VLSI Test Concepts: T. Storey

Grading Scheme

Homework - 10%
Project - 40%
Midterm Exam - 20%
Final Exam - 30%



Midterm exam 1
- 2 hours 30 minutes
- in class
- design-oriented
- open-book, open-notes
- practice exams will be available on the web

Tentative date: Monday, November 1st

Final exam
- 2 hours 45 minutes
- in class
- design-oriented
- open-book, open-notes
- practice exams will be available on the web

Date: Monday, December 20, 7:30-10:15pm

Project
- individual
- semester-long
- related to the research project conducted by the Cryptographic Engineering Research Group (CERG) at GMU, supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard

Background

Hash Function

[Figure: a message m of arbitrary length is input to the hash function h, which outputs a hash value h(m) of fixed length]

It is computationally infeasible to find two different messages m and m' such that h(m) = h(m').
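You can see the "arbitrary length in, fixed length out" property by hashing messages of very different sizes. The sketch below uses SHA-256 from Python's standard `hashlib` module as an example hash function h. It only demonstrates the fixed output length. Collision resistance is a computational-infeasibility claim and cannot be shown by running a few tests.

```python
import hashlib


def hash_value(message: bytes) -> str:
    """Return h(m) for message m, using SHA-256 as the example hash function."""
    return hashlib.sha256(message).hexdigest()


# Messages of arbitrary length (0 bytes, 3 bytes, 1 MB)...
messages = [b"", b"abc", b"x" * 1_000_000]

for m in messages:
    h = hash_value(m)
    # ...always produce a fixed-length hash value: 64 hex digits = 256 bits.
    print(f"{len(m):>8} bytes -> {len(h) * 4} bits: {h[:16]}...")
```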



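Main Application: Digital Signature

[Figure: a handwritten signature compared with a digital signature, the latter shown as a string of hexadecimal digits (A6E3891F2939E38C745B 25289896CA345BEF5349 245CBA653448E349EA47)]

Main goals:
- unique identification
- proof of agreement to the contents of the document

Typical Digital Signature Scheme

[Figure: Alice passes the message through a hash function and processes the resulting hash value with a public key cipher using Alice's private key. The result is the signature, which is sent along with the message. Bob passes the received message through the same hash function to obtain hash value 1, and processes the signature with a public key cipher using Alice's public key to obtain hash value 2. The signature is accepted (yes) if hash value 1 equals hash value 2, and rejected (no) otherwise.]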
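The scheme above can be sketched in a few lines of Python. This is a toy illustration only. It assumes textbook RSA with tiny hard-coded primes as the public key cipher, and SHA-256 reduced modulo the RSA modulus as the hash function. Real signature schemes use large keys and a standardized padding, and both are omitted here. The names `sign` and `verify` are illustrative.

```python
import hashlib

# Toy RSA key pair (the classic textbook example); far too small to be secure.
P, Q = 61, 53
N = P * Q          # modulus shared by both keys (3233)
E = 17             # Alice's public exponent: public key = (N, E)
D = 2753           # Alice's private exponent: E * D = 1 mod (P-1)*(Q-1)


def hash_value(message: bytes) -> int:
    """Hash function step: SHA-256, reduced mod N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: hash the message, then apply the public key cipher with her private key."""
    return pow(hash_value(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: compare hash value 1 (recomputed) with hash value 2 (from Alice's public key)."""
    hash_value_1 = hash_value(message)
    hash_value_2 = pow(signature, E, N)
    return hash_value_1 == hash_value_2


msg = b"I agree to the contents of this document."
sig = sign(msg)
print(verify(msg, sig))                # True
print(verify(msg, (sig + 1) % N))      # False: a modified signature is rejected
```

A modified signature is always rejected here because raising to the power E modulo N is a one-to-one mapping for a valid RSA key. A modified *message* is rejected only with high probability, since the toy hash has just 3233 possible values.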

Handwritten and Digital Signatures: Common Features

Both handwritten and digital signatures are:
1. Unique
2. Impossible to be forged
3. Impossible to be denied by the author
4. Easy to verify by an independent judge
5. Easy to generate

Handwritten and Digital Signatures: Differences

6. Handwritten signature: associated physically with the document. Digital signature: can be stored and transmitted independently of the document.
7. Handwritten signature: almost identical for all documents. Digital signature: a function of the document.
8. Handwritten signature: usually at the last page. Digital signature: covers the entire document.

Hash function algorithms

Customized (dedicated):
- MD2 (Rivest, 1988)
- MD4 (Rivest, 1990)
- MD5 (Rivest, 1990)
- SHA-0 (NSA, 1992)
- SHA-1 (NSA, 1995)
- RIPEMD (European RACE Integrity Primitives Evaluation Project, 1992)
- RIPEMD-160
- SHA-256, SHA-384, SHA-512 (NSA, 2000)

Based on block ciphers:
- MDC-2, MDC-4 (IBM, Brachtl, Meyer, Schilling, 1988)

Based on modular arithmetic:
- MASH-1 (1988-1996)

Attacks against dedicated hash functions known by 2004

- MD2: partially broken
- MD4: broken, H. Dobbertin, 1995 (one hour on PC, 20 free bytes at the start of the message)
- MD5: partially broken, collisions for the compression function, Dobbertin, 1996 (10 hours on PC)
- SHA-0: weakness discovered, 1995 NSA, 1998 France
- RIPEMD: reduced round version broken, Dobbertin 1995
- SHA-1, RIPEMD-160, SHA-256, SHA-384, SHA-512: no attacks listed



What was discovered in 2004-2005?

- MD4: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
- MD5: broken; Wang, Feng, Lai, Yu, Crypto 2004 (1 hr on a PC)
- SHA-0: attack with 2^40 operations, Crypto 2004
- SHA-1: attack with 2^63 operations; Wang, Yin, Yu, Aug 2005
- RIPEMD: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
- RIPEMD-160, SHA-256, SHA-384, SHA-512: no attacks listed

Cost of an attack requiring 2^63 operations (Schneier, 2005):

In hardware, a machine similar to the one used to break DES:
- Cost = $50,000-$70,000, Time: 18 days
- or Cost = $0.9-$1.26M, Time: 24 hours

In software, a computer network similar to distributed.net, which was used to break DES (~331,252 computers):
- Cost = ~$0, Time: 7 months
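A back-of-the-envelope calculation puts the 2^63 figure in perspective. The Python sketch below assumes the generic birthday bound of about 2^(n/2) hash evaluations for an n-bit hash, which is 2^80 for the 160-bit SHA-1. It also treats one "operation" as one hash evaluation, which is a simplification. Finally, it derives the operation rate implied by the 18-day hardware estimate above.

```python
SHA1_OUTPUT_BITS = 160
ATTACK_LOG2 = 63                       # 2^63 operations (Aug 2005 attack on SHA-1)
HW_ATTACK_SECONDS = 18 * 24 * 3600     # 18 days for the ~$50,000-$70,000 machine


def generic_collision_log2(output_bits: int) -> float:
    """Birthday bound: roughly 2^(n/2) hash evaluations to find a collision."""
    return output_bits / 2


generic_log2 = generic_collision_log2(SHA1_OUTPUT_BITS)   # 80.0
speedup = 2 ** (generic_log2 - ATTACK_LOG2)               # 2^17
ops_per_second = 2 ** ATTACK_LOG2 / HW_ATTACK_SECONDS     # rate implied by the estimate

print(f"generic attack: 2^{generic_log2:.0f}, new attack: 2^{ATTACK_LOG2}")
print(f"speedup: {speedup:.0f}x")
print(f"implied hardware rate: {ops_per_second:.2e} operations/s")
```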

Cryptographic Standards

So how have cryptographic standards been created so far?

National Security Agency (also known as "No Such Agency" or "Never Say Anything")
- Created in 1952 by President Truman
- Goals: designing strong ciphers (to protect U.S. communications); breaking ciphers (to listen to non-U.S. communications)
- Budget and number of employees kept secret
- Largest employer of mathematicians in the world
- Largest purchaser of computer hardware

NSA-developed Cryptographic Standards

[Figure: timeline, 1970-2010. Block ciphers: DES (Data Encryption Standard), 1977; Triple DES, 1999. Hash functions: SHA-0, 1993; SHA-1 (Secure Hash Algorithm), 1995; SHA-2. The hash function timeline also marks 2003 and 2005.]

Cryptographic Standard Contests (timeline, 1996-2012)
- AES: IX.1997-X.2000; 15 block ciphers, 1 winner
- NESSIE: I.2000-XII.2002
- CRYPTREC
- eSTREAM: XI.2004-V.2008; 34 stream ciphers, 4 SW + 4 HW winners
- SHA-3: X.2007-XII.2012; 51 hash functions, 1 winner



SHA-3 Contest - NIST Evaluation Criteria
- Security
- Software Efficiency
- Hardware Efficiency (FPGAs, ASICs)
- Simplicity
- Flexibility
- Licensing

Software or hardware?

[Figure: factors weighed when choosing between SOFTWARE and HARDWARE implementations that provide security of data during transmission: flexibility (new cryptoalgorithms, protection against new attacks); speed; random key generation; access control to keys; tamper resistance; low cost; resistance to side-channel attacks]

Primary efficiency indicators
- Software: speed, memory
- Hardware: speed, area

(Also shown: power consumption.)

Efficiency parameters

Latency: time to encrypt/decrypt a single block of data (input block Mi, output block Ci).

Throughput (= speed): number of bits encrypted/decrypted in a unit of time (input blocks Mi, Mi+1, Mi+2, ...; output blocks Ci, Ci+1, Ci+2, ...).

$$\text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$$
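The formula can be turned into a small Python calculator. The sketch assumes that latency is expressed as a number of clock cycles at a given clock frequency. The 128-bit block, 10-cycle latency, and 100 MHz clock are hypothetical values. They show how processing several blocks simultaneously raises throughput without changing latency.

```python
def latency_seconds(clock_cycles: int, f_clk_hz: float) -> float:
    """Latency of one block: number of clock cycles times the clock period."""
    return clock_cycles / f_clk_hz


def throughput_mbit_s(block_size_bits: int, blocks_in_parallel: int,
                      latency_s: float) -> float:
    """Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return block_size_bits * blocks_in_parallel / latency_s / 1e6


# Hypothetical 128-bit block cipher core: 10 cycles per block at 100 MHz.
lat = latency_seconds(10, 100e6)            # 100 ns
print(throughput_mbit_s(128, 1, lat))       # one block at a time (about 1280 Mbit/s)
print(throughput_mbit_s(128, 10, lat))      # 10 blocks in flight (about 12800 Mbit/s)
```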

Advanced Encryption Standard (AES) Contest, 1997-2001

- June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica. Round 1 criteria: security, software efficiency, flexibility
- August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish. Round 2 criteria: security, hardware efficiency
- October 2000: 1 winner: Rijndael (Belgium)

[Figure: Speed of the final AES candidates in Xilinx FPGAs. Speed [Mbit/s] (axis 0-500) for Serpent, Rijndael, Twofish, RC6, Mars. Source: K. Gaj, P. Chodowiec, AES3, April 2000]



[Figure: Survey filled by 167 participants of the Third AES Conference, April 2000: number of votes (axis 0-100) for Serpent, Rijndael, Twofish, RC6, Mars]

Results of the NSA group (ASICs) compared with GMU (FPGAs), speed [Mbit/s], AES3, April 2000:

Algorithm | NSA ASIC | GMU FPGA
Serpent | 202 | 431
Rijndael | 606 | 414
Twofish | 105 | 177
RC6 | 103 | 143
Mars | 57 | 61

[Figure: Efficiency in software on the NIST-specified platform (200 MHz Pentium Pro, Borland C++): speed [Mbit/s] (axis 0-30) of Serpent, Rijndael, Twofish, RC6, and Mars for 128-bit, 192-bit, and 256-bit keys]

[Figure: NIST Report: Security. The final candidates placed by security (adequate to high) versus complexity (simple to complex): Rijndael, MARS, Serpent, Twofish, RC6. Source: AES Final Report, October 2000]

NIST SHA-3 Contest - Timeline
- Oct. 2008: 51 candidates, Round 1
- July 2009: 14 candidates, Round 2
- End of 2010: 5-6 candidates, Round 3
- Mid 2012: 1-2 winners

GMU Team Goals
- Fair and comprehensive methodology for evaluation of hardware performance in FPGAs
- High-speed fully autonomous implementations of all 14 SHA-3 candidates & SHA-2: 256-bit & 512-bit variants, optimized for the maximum throughput to area ratio
- Open-source benchmarking tool supporting optimization of tool options and efficient generation of results for multiple FPGA families



Primary Designers of GMU Codes

Ekawat Homsirikamol (a.k.a. Ice), Marcin Rogawski

Developed optimized VHDL implementations of 14 Round 2 SHA-3 candidates + SHA-2 in two variants each (256 & 512-bit output), for some functions using several alternative architectures

Methodology

Comprehensive Evaluation
- two major vendors: Altera and Xilinx (~90% of the market)
- multiple high-performance and low-cost families

Technology | Altera low-cost | Altera high-performance | Xilinx low-cost | Xilinx high-performance
90 nm | Cyclone II | Stratix II | Spartan 3 | Virtex 4
65 nm | Cyclone III | Stratix III | - | Virtex 5

Uniform Evaluation
- Language: VHDL
- Tools: FPGA vendor tools
- Interface
- Performance Metrics
- Design Methodology
- Benchmarking

Why Does the Interface Matter?
- Pin limit: total number of i/o ports ≤ total number of FPGA i/o pins
- Support for the maximum throughput: time to load the next message block ≤ time to process the previous block
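Both conditions are simple inequalities that you can check early in the design. The Python sketch below assumes that one w-bit word is loaded per clock cycle and that loading and processing run on the same clock. The 512-bit block, 10-cycle processing time, and bus widths are hypothetical example values.

```python
import math


def fits_pin_limit(io_ports: int, fpga_io_pins: int) -> bool:
    """Pin limit: total number of i/o ports <= total number of FPGA i/o pins."""
    return io_ports <= fpga_io_pins


def load_cycles(block_bits: int, bus_width_w: int) -> int:
    """Clock cycles needed to load one message block through a w-bit input bus."""
    return math.ceil(block_bits / bus_width_w)


def supports_max_throughput(block_bits: int, bus_width_w: int,
                            process_cycles: int) -> bool:
    """Time to load the next block <= time to process the previous block."""
    return load_cycles(block_bits, bus_width_w) <= process_cycles


# Hypothetical core: 512-bit blocks, 10 clock cycles to process a block.
print(load_cycles(512, 64), supports_max_throughput(512, 64, 10))   # 8 True
print(load_cycles(512, 32), supports_max_throughput(512, 32, 10))   # 16 False
```

Widening the bus speeds up loading but consumes pins, so w is chosen as the smallest width that keeps loading no slower than processing.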

Interface: Two possible solutions

1. Length of the message communicated at the beginning (msg_bitlen)
+ easy to implement
+ passive source circuit
- area overhead for the counter of message bits

2. Dedicated end-of-message port (end_of_msg)
- more intelligent source circuit required
+ no need for internal message bit counter

[Figure: the SHA core receives the message together with either msg_bitlen or an end_of_msg signal; a zero_word signal is also shown]
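The internal message-bit counter of the first option can be sketched behaviourally in Python (rather than VHDL). The counter is loaded from msg_bitlen and decremented as w-bit words arrive, so the core knows which word is the last one and how many of its bits are valid. The function name and word width are hypothetical. Padding is algorithm-specific and is not modelled. With the second option this counter disappears, because the source asserts end_of_msg itself.

```python
from typing import List, Tuple


def mark_last_word(msg_bitlen: int, w: int) -> List[Tuple[int, bool]]:
    """Return (valid_bits, is_last) for each w-bit word of a message.

    Models option 1: msg_bitlen is loaded into an internal counter at the
    start of the message, and the counter is decremented as words arrive.
    """
    if msg_bitlen < 0 or w <= 0:
        raise ValueError("msg_bitlen must be >= 0 and w must be > 0")
    remaining = msg_bitlen            # counter register loaded from msg_bitlen
    words = []
    while True:
        valid = min(remaining, w)     # bits of this word that belong to the message
        remaining -= valid            # decrement the counter
        is_last = remaining == 0      # what option 2 would get from end_of_msg
        words.append((valid, is_last))
        if is_last:
            return words


print(mark_last_word(100, 64))   # [(64, False), (36, True)]
print(mark_last_word(128, 64))   # [(64, False), (64, True)]
```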



SHA Core: Interface & Typical Configuration
- SHA core is an active component; surrounding FIFOs are passive and widely available
- Input interface is separate from the output interface
- Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel

[Figure: Input FIFO -> SHA core -> Output FIFO. The input FIFO (ports din, dout, full, empty, write, read) is filled from ext_idata using fifoin_write and fifoin_full, and feeds the w-bit idata input of the SHA core; its handshake signals fifoin_empty and fifoin_read connect to the core's src_ready and src_read. The core's w-bit odata output feeds the output FIFO, whose fifoout_full and fifoout_write connect to the core's dst_ready and dst_write; results leave as ext_odata using fifoout_read and fifoout_empty. All blocks share clk and rst.]

SHA Core: Interface & Typical Configuration

[Figure: the same configuration, except that the external sides of the input and output FIFOs are clocked by a separate io_clk, while the SHA core uses clk]

Some functions may require a faster input/output clock in order to load input data at a faster rate.

Performance Metrics

Primary:
1. Throughput (single long message)
2. Area
3. Throughput / Area

Secondary:
- Hash time for short messages (up to 1000 bits)

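Performance Metrics - Area

We force the resource-utilization vectors to look as follows through the synthesis and implementation options: (Area, 0, 0). That is, no embedded memory blocks and no embedded multipliers/DSP units are used, so Area counts logic resources only.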

Choice of Optimization Target

Primary Optimization Target: Throughput to Area Ratio

Features:
- practical: good balance between speed and cost
- very reliable guide through the entire design process, facilitating the choice of high-level architecture, implementation of basic components, and choice of tool options
- leads to high-speed, close-to-maximum-throughput designs

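Our Design Flow

[Figure: The specification and interface lead to a datapath (block diagram) and a controller (ASM chart), built using a library of basic components and a controller template. Both are coded in VHDL. The formulas for throughput & hash time are combined with the maximum clock frequency and resource utilization reported by the tools. This yields Throughput, Area, Throughput/Area, and Hash Time for Short Messages.]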



Basic Operations of 14 SHA-3 Candidates

[Table: basic operations used by each of the 14 SHA-3 candidates]

NTT: Number Theoretic Transform; GF MUL: Galois Field multiplication; MUL: integer multiplication; mADDn: multioperand addition with n operands

ATHENa: Automated Tool for Hardware EvaluatioN

Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE FPGA platforms

Under development at George Mason University.

http://cryptography.gmu.edu/athena

Basic Dataflow of ATHENa

[Figure: The designer obtains interfaces and testbenches, and downloads scripts and configuration files from the ATHENa server. The designer's HDL, together with these scripts and configuration files, is run through FPGA synthesis and implementation with the FPGA tools. This produces a result summary and database entries. The database entries are submitted to the ATHENa database, which users can query for a ranking of designs.]

ATHENa inputs: synthesizable source files, configuration files, testbench, constraint files

ATHENa outputs: result summary (user-friendly), database entries (machine-friendly)

ATHENa Major Features (1)
- synthesis, implementation, and timing analysis in batch mode
- support for devices and tools of multiple FPGA vendors
- generation of results for multiple families of FPGAs of a given vendor
- automated choice of a best-matching device within a given family

ATHENa Major Features (2)
- automated verification of designs through simulation in batch mode
- support for multi-core processing
- automated extraction and tabulation of results
- several optimization strategies aimed at finding optimum options of tools, the best target clock frequency, and the best starting point of placement



Generation of Results Facilitated by ATHENa
- batch mode of FPGA tools
- ease of extraction and tabulation of results: Excel, CSV (available), LaTeX (coming soon)
- optimized choice of tool options

Relative Improvement of Results from Using ATHENa: Virtex 5, 256-bit Variants of Hash Functions

[Figure: ratios (axis 0-2.5) of Area, Throughput, and Throughput/Area obtained using ATHENa-suggested options vs. default options of FPGA tools, for Groestl, SHAvite-3, Luffa, Keccak, Hamsi, ECHO, Skein, Fugue, SHA-2, BMW, CubeHash, BLAKE, Shabal, SIMD, JH]

Results

[Figure: Throughput [Mbit/s], Virtex 5, 256-bit variants of algorithms (axis 0-16000), in the order shown: ECHO, Keccak, Groestl, Luffa, BMW, JH, CubeHash, Fugue, SHAvite-3, BLAKE, Skein, Hamsi, Shabal, SIMD, SHA-2]

[Figure: Throughput [Mbit/s] (axis 0-14000), in the order shown: Groestl, BMW, Luffa, Keccak, ECHO, SIMD, JH, SHAvite-3, BLAKE, CubeHash, Skein, Shabal, SHA-2, Hamsi, Fugue]



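Normalization & Compression of Results
- Absolute result: e.g., throughput in Mbit/s, area in CLB slices
- Normalized result:

$$\text{normalized\_result} = \frac{\text{result\_for\_SHA-3\_candidate}}{\text{result\_for\_SHA-2}}$$

- Overall normalized result: geometric mean of normalized results for all n investigated FPGA families:

$$\text{overall\_normalized\_result} = \left(\prod_{i=1}^{n} \text{normalized\_result}_i\right)^{1/n}$$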
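The compression step can be sketched in Python as follows. The family names and throughput values are hypothetical. A geometric mean suits ratios because a 2x advantage in one family and a 2x disadvantage in another cancel out (2 · 0.5 = 1), which an arithmetic mean of the same ratios (1.25) would not do.

```python
import math


def normalized_result(result_sha3_candidate: float, result_sha2: float) -> float:
    """Normalized result = result for SHA-3 candidate / result for SHA-2."""
    return result_sha3_candidate / result_sha2


def overall_normalized_result(results_by_family: dict) -> float:
    """Geometric mean of the normalized results over all investigated FPGA families.

    results_by_family maps a family name to (candidate_result, sha2_result).
    """
    ratios = [normalized_result(c, s) for c, s in results_by_family.values()]
    return math.prod(ratios) ** (1 / len(ratios))


# Hypothetical throughputs in Mbit/s for two FPGA families.
throughputs = {
    "Family A": (2000.0, 1000.0),   # normalized result 2.0
    "Family B": (4000.0, 500.0),    # normalized result 8.0
}
print(overall_normalized_result(throughputs))   # 4.0
```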

Normalized Throughput & Overall Normalized Throughput

[Figure: Overall Normalized Throughput: 256-bit variants of algorithms, normalized to SHA-256, averaged over 7 FPGA families (axis 0-8), in the order shown: Keccak, ECHO, Luffa, BMW, Groestl, JH, CubeHash, Fugue, SHAvite-3, BLAKE, Hamsi, Skein, Shabal, SIMD]

[Figure: Overall Normalized Throughput: 512-bit variants of algorithms, normalized to SHA-512, averaged over 7 FPGA families (axis 0-4), in the order shown: Groestl, Luffa, BMW, ECHO, Keccak, JH, SIMD, CubeHash, SHAvite-3, BLAKE, Skein, Shabal, Hamsi, Fugue]

[Figure: Area [CLB slices] (axis 0-10000), in the order shown: SHA-2, CubeHash, Hamsi, Fugue, JH, SHAvite-3, Luffa, Keccak, Shabal, Skein, Groestl, BLAKE, BMW, ECHO, SIMD]

[Figure: Area [CLB slices] (axis 0-18000), in the order shown: SHA-2, CubeHash, Fugue, JH, Keccak, Shabal, Skein, SHAvite-3, Luffa, Hamsi, Groestl, BLAKE, ECHO, BMW, SIMD]



[Figure: Overall Normalized Area: 256-bit variants of algorithms, normalized to SHA-256, averaged over 7 FPGA families (axis 0-30), in the order shown: CubeHash, Hamsi, BLAKE, Luffa, Shabal, JH, Keccak, SHAvite-3, Skein, Fugue, Groestl, BMW, SIMD, ECHO]

[Figure: Overall Normalized Area: 512-bit variants of algorithms, normalized to SHA-512, averaged over 7 FPGA families (axis 0-30), in the order shown: CubeHash, Fugue, Keccak, Shabal, JH, Skein, BLAKE, Hamsi, Luffa, SHAvite-3, Groestl, BMW, ECHO, SIMD]

[Figure: Overall Normalized Throughput/Area: 256-bit variants, normalized to SHA-256, averaged over 7 FPGA families (axis 0-2), in the order shown: Keccak, Luffa, CubeHash, Groestl, JH, Hamsi, BLAKE, Fugue, SHAvite-3, Shabal, Skein, BMW, ECHO, SIMD]

[Figure: Overall Normalized Throughput/Area: 512-bit variants, normalized to SHA-512, averaged over 7 FPGA families (axis 0-1.4), in the order shown: Keccak, CubeHash, Luffa, JH, Groestl, Shabal, BLAKE, Skein, SHAvite-3, Fugue, Hamsi, BMW, ECHO, SIMD]

[Figure: Throughput vs. Area, normalized to results for SHA-256 and averaged over 7 FPGA families, 256-bit variants; the best and worst regions of the plot are marked]

[Figure: Throughput vs. Area, normalized to results for SHA-512 and averaged over 7 FPGA families, 512-bit variants; the best and worst regions of the plot are marked]



Execution Time for Short Messages up to 1000 bits

[Figures: two charts of execution time for short messages of up to 1000 bits]

[Table: ranking of the 14 candidates (BLAKE, BMW, CubeHash, ECHO, Fugue, Groestl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, Skein) in terms of Thr/Area, Thr, Area, and short-message hash time, for 256-bit and 512-bit variants]

Summary of Results
- Throughput/Area & Throughput most crucial for high-speed implementations
- Area cannot be easily traded for Throughput

Best performers so far:
1-2. Keccak & Luffa
3. Groestl

Worst performers so far:
14. SIMD
13. ECHO
12. BMW

More About our Designs & Tools
- Cryptology ePrint Archive, 2010/445 (100+ pages): detailed hierarchical block diagrams; corresponding formulas for execution time and throughput
- FPL 2010 paper: ATHENa features; case studies
- ATHENa web site: most recent results; comparisons with results from other groups; optimum options of tools

Comparison with Other Groups

Comparison with Best Results Reported by Other Groups

Algorithm | Other groups: Area | Thr | Thr/Area | Source | GMU: Area | Thr | Thr/Area
BLAKE | 1660 | 2676 | 1.61 | Kobayashi et al. | 1871 | 2854 | 1.53
CubeHash | 590 | 2960 | 5.02 | Kobayashi et al. | 707 | 3445 | 4.87
ECHO | 9333 | 14860 | 1.59 | Lu et al. | 5445 | 13875 | 2.55
Groestl | 1722 | 10276 | 5.97 | Gauravaram et al. | 1884 | 8677 | 4.61
Hamsi | 718 | 1680 | 2.34 | Kobayashi et al. | 946 | 2646 | 2.80
Keccak | 1412 | 6900 | 4.89 | Bertoni et al. | 1229 | 10807 | 8.79
Luffa | 1048 | 6343 | 6.05 | Kobayashi et al. | 1154 | 8008 | 6.94
Shabal | 153 | 2051 | 13.41 | Detrey et al. | 1266 | 2624 | 2.07
Skein (estimated) | 1632 | 3535 | 2.17 | Tillich | 1463 | 2812 | 1.92

Best Overall Reported Results as of Aug. 6, 2010

Algorithm | Area | Thr | Thr/Area | Source
BLAKE | 1660 | 2676 | 1.61 | Kobayashi et al.
BMW | 4400 | 5577 | 1.27 | GMU
CubeHash | 590 | 2960 | 5.02 | Kobayashi et al.
ECHO | 5445 | 13875 | 2.55 | GMU
Fugue | 956 | 3151 | 3.30 | GMU
Groestl | 1722 | 10276 | 5.97 | Gauravaram et al.
Hamsi | 946 | 2646 | 2.80 | GMU
JH | 1108 | 3955 | 3.57 | GMU
Keccak | 1229 | 10807 | 8.79 | GMU
Luffa | 1154 | 8008 | 6.94 | GMU
Shabal | 153 | 2051 | 13.41 | Detrey et al.
SHAvite-3 | 1130 | 2887 | 2.55 | GMU
SIMD | 9288 | 2326 | 0.25 | GMU
Skein | 1632 | 3535 | 2.17 | Tillich et al.
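You can recompute the Thr/Area column from the Area and Thr columns, which is a quick consistency check on the table. It also ranks the designs by the primary optimization target. The Python sketch below simply copies the Area and Thr values from the table above, in the same units.

```python
# Best overall reported results as of Aug. 6, 2010: name -> (Area, Thr).
BEST_RESULTS = {
    "BLAKE": (1660, 2676), "BMW": (4400, 5577), "CubeHash": (590, 2960),
    "ECHO": (5445, 13875), "Fugue": (956, 3151), "Groestl": (1722, 10276),
    "Hamsi": (946, 2646), "JH": (1108, 3955), "Keccak": (1229, 10807),
    "Luffa": (1154, 8008), "Shabal": (153, 2051), "SHAvite-3": (1130, 2887),
    "SIMD": (9288, 2326), "Skein": (1632, 3535),
}


def throughput_to_area(name: str) -> float:
    """Thr/Area ratio for one algorithm."""
    area, thr = BEST_RESULTS[name]
    return thr / area


# Rank from the best to the worst Thr/Area ratio.
ranking = sorted(BEST_RESULTS, key=throughput_to_area, reverse=True)
for name in ranking:
    print(f"{name:<10} {throughput_to_area(name):6.2f}")
```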


Throughput vs. Area: Best Reported Results

[Figure: throughput vs. area for the best reported results, with the best and worst regions of the plot marked]

Your Project

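Analysis of Alternative Architectures - Unrolled

[Figure: the basic architecture iterates its round r times per block; the architecture unrolled 2x iterates r/2 times]

Analysis of Alternative Architectures - Folded

[Figure: Basic architecture: r times; Folded Vertically-2x (fv2): 2r times; Folded Horizontally-2x (fh2): 2r times]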
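A first-order comparison of these architectures follows from the iteration counts above. The Python sketch assumes that one iteration takes one clock cycle. It also assumes, unrealistically, that the clock frequency is the same for all variants. In practice, unrolling tends to lower the maximum clock frequency and folding may raise it, and both change area. That is why the throughput to area ratio must be measured after implementation. The round count, block size, and frequency below are hypothetical.

```python
from fractions import Fraction


def cycles_per_block(rounds_r: int, architecture: str, factor: int = 2) -> Fraction:
    """Clock cycles per message block, assuming one iteration per clock cycle.

    basic:    one round per cycle                 -> r cycles
    unrolled: 'factor' rounds per cycle           -> r / factor cycles
    folded:   1/'factor' of a round per cycle     -> factor * r cycles
    """
    if architecture == "basic":
        return Fraction(rounds_r)
    if architecture == "unrolled":
        return Fraction(rounds_r, factor)
    if architecture == "folded":
        return Fraction(rounds_r * factor)
    raise ValueError(f"unknown architecture: {architecture}")


def throughput_mbit_s(block_bits: int, f_clk_mhz: float, cycles: Fraction) -> float:
    """Throughput = block size * clock frequency / cycles per block."""
    return block_bits * f_clk_mhz / float(cycles)


# Hypothetical function: r = 10 rounds, 512-bit blocks, 200 MHz for every variant.
for arch in ("basic", "unrolled", "folded"):
    c = cycles_per_block(10, arch)
    print(arch, c, throughput_mbit_s(512, 200.0, c))
```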



[Figure: Preliminary results for CubeHash, Groestl, Keccak & Luffa in Virtex 5: normalized throughput (axis 0-8) vs. normalized area (axis 0-7) for alternative architectures of each function, labeled x1, x2, x4, fv2, fv3, fv4]

Your Project

14 SHA-3 candidates left in the contest

Given:
- specification of the function
- reference implementation in C
- interface
- testbench and test vectors
- GMU implementation of the basic version, including block diagrams, ASM charts, short description, formulas for execution time & throughput, source codes, and results for Xilinx and Altera FPGAs

Your Project

Develop:
- Block diagram
- ASM chart
- Formulas for execution time & throughput
- Synthesizable code in VHDL
- Results for multiple families of FPGAs from Xilinx and Altera

for at least one architecture from each of the following three classes of architectures:
- Unrolled architecture
- Folded architecture
- Architecture based on the use of embedded FPGA resources (BRAMs, multipliers, DSP units, etc.)

[256-bit only, 512-bit only, or both]

What is an FPGA?

[Figure: FPGA structure: an array of Configurable Logic Blocks surrounded by I/O Blocks, with columns of Block RAMs and Embedded Multipliers]

RAM Blocks and Multipliers in Xilinx FPGAs

Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

Using Embedded FPGA Resources

Resource vectors (logic resources, embedded memory, embedded multipliers/DSP units):
- Basic design: (1536, 0, 0); Your design: (768, 2, 4)
- Basic design: (3010, 0, 0); Your design: (1505, 32 kbit, 4)



Block RAM

[Figure: Spartan-3 dual-port block RAM with Port A and Port B]

- Most efficient memory implementation: dedicated blocks of memory
- Ideal for most memory requirements
- 4 to 104 memory blocks
- 18 kbits = 18,432 bits per block (16 k without parity bits)
- Use multiple blocks for larger memories
- Builds both single and true dual-port RAMs
- Synchronous write and read (different from distributed RAM)
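"Synchronous write and read" is the key difference from distributed RAM. Read data appears only after a clock edge, so a block RAM read costs one clock cycle of latency. The Python model below is a behavioural sketch, not Xilinx's primitive. It assumes a 1k x 18 configuration and write-first behaviour on each port (new data appears on the output during a write). It also ignores simultaneous accesses to the same address from both ports.

```python
class DualPortBlockRAM:
    """Cycle-level sketch of a dual-port block RAM with synchronous read and write.

    Assumes write-first behaviour: on a write, the output register shows the new data.
    """

    def __init__(self, depth: int = 1024, width: int = 18):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1
        self.dout = {"A": 0, "B": 0}    # registered outputs (DOA, DOB)

    def clock_edge(self, port: str, en: bool, we: bool, addr: int, din: int = 0) -> None:
        """Apply one rising clock edge on the given port."""
        if not en:
            return                      # a disabled port keeps its output
        if we:
            self.mem[addr] = din & self.mask
        self.dout[port] = self.mem[addr]


ram = DualPortBlockRAM()
ram.clock_edge("A", en=True, we=True, addr=5, din=7)   # write through port A
print(ram.dout["B"])                                   # 0: port B has not been clocked yet
ram.clock_edge("B", en=True, we=False, addr=5)         # read through port B
print(ram.dout["B"])                                   # 7: data appears after the clock edge
```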

Block RAM can have various configurations (port aspect ratios):
- 16k x 1: addresses 0 to 16,383
- 8k x 2: addresses 0 to 8,191
- 4k x 4: addresses 0 to 4,095
- 2k x (8+1): addresses 0 to 2,047
- 1024 x (16+2): addresses 0 to 1,023
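All five aspect ratios describe the same 18-kbit block; only the split between depth and width changes. The Python sketch below restates the configurations listed above and derives the address-bus width for each. It shows that the x1, x2, and x4 configurations use only the 16 kbits of data storage, while the x9 and x18 configurations also use the parity bits.

```python
BLOCK_BITS_WITH_PARITY = 18_432   # 18 kbits per block
# (depth, data bits, parity bits) for each port aspect ratio listed above
CONFIGS = [
    (16 * 1024, 1, 0),
    (8 * 1024, 2, 0),
    (4 * 1024, 4, 0),
    (2 * 1024, 8, 1),
    (1024, 16, 2),
]

for depth, data, parity in CONFIGS:
    width = data + parity
    addr_bits = depth.bit_length() - 1    # e.g. 1024 words -> ADDR[9:0], 10 bits
    print(f"{depth:>5} x {width:<2} address bits: {addr_bits:>2}, "
          f"capacity: {depth * width} bits")
```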

Dual-Port Bus Flexibility

[Figure: a dual-port block RAM configured 1k deep and 18 bits wide on both ports. Port A: CLKA, ENA, WEA, RSTA, ADDRA[9:0], DIA[17:0], DOA[17:0]. Port B: CLKB, ENB, WEB, RSTB, ADDRB[9:0], DIB[17:0], DOB[17:0].]

Embedded Multipliers in Spartan 3

18x18 bit signed multipliers with optional input/output registers

Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

Multiplier-Accumulator - MAC

Xilinx XtremeDSP
- Starting with the Virtex 4 family, Xilinx introduced the DSP48 block for high-speed DSP on FPGAs
- Essentially a multiply-accumulate core with many other features
- Now Spartan-3A and Virtex 5 also have DSP blocks
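The multiply-accumulate operation these blocks implement in hardware can be modelled in a few lines of Python. This is a behavioural sketch, not a model of the DSP48's full feature set. It assumes 18x18-bit signed operands, as in the embedded multipliers above, and a 48-bit two's complement accumulator that wraps on overflow.

```python
MUL_WIDTH = 18    # 18x18-bit signed multiplier operands
ACC_WIDTH = 48    # accumulator width assumed for the DSP48-style MAC


def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low 'bits' bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def check_operand(x: int) -> None:
    """Reject operands that do not fit an 18-bit signed multiplier input."""
    lo, hi = -(1 << (MUL_WIDTH - 1)), (1 << (MUL_WIDTH - 1)) - 1
    if not lo <= x <= hi:
        raise ValueError(f"{x} does not fit in {MUL_WIDTH}-bit signed")


def mac(pairs) -> int:
    """Multiply-accumulate: acc <- acc + a*b for each (a, b), with wrap-around."""
    acc = 0
    for a, b in pairs:
        check_operand(a)
        check_operand(b)
        acc = wrap_signed(acc + a * b, ACC_WIDTH)
    return acc


print(mac([(3, 4), (-2, 5)]))          # 3*4 + (-2)*5 = 2
print(mac([(-131072, -131072)]))       # largest 18x18 signed product: 2^34
```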



DSP48 Slice: Virtex 4

Simplified Form of DSP48

Xilinx FPGA Devices
Technology | Low-cost | High-performance
120/150 nm | - | Virtex 2, 2 Pro
90 nm | Spartan 3 | Virtex 4
65 nm | - | Virtex 5
45 nm | Spartan 6 | -
40 nm | - | Virtex 6

Altera FPGA Devices
Technology | Low-cost | Mid-range | High-performance
130 nm | Cyclone | - | Stratix
90 nm | Cyclone II | - | Stratix II
65 nm | Cyclone III | Arria I | Stratix III
40 nm | Cyclone IV | Arria II | Stratix IV

All Projects - Organization
- Projects divided into phases
- Deliverables for each phase submitted through Blackboard at selected checkpoints and evaluated by the instructor and/or TA
- Feedback provided to students on a best-effort basis
- Final report and codes submitted using Blackboard at the end of the semester

Honor Code Rules

All students are expected to write and debug their codes individually.

Students are encouraged to help and support each other in all problems related to:
- operation of the CAD tools,
- basic understanding of the problem.



Course Objectives

At the end of this course you should be able to:
- Code in VHDL for synthesis
- Decompose a digital system into a controller (FSM) and datapath, and code accordingly
- Write VHDL testbenches
- Synthesize and implement digital systems on FPGAs
- Effectively code digital systems for cryptography, signal processing, and microprocessor applications

This knowledge will come about through homework, exams, and an extensive project. The project in particular will help you know VHDL and the FPGA design flow from beginning to end.

Additional Skills Learned in the Project
- Reading & understanding the specification of a complex algorithm
- Design of new hardware architectures based on existing architectures (datapath & controller)
- Reading, understanding, and modifying existing VHDL code
- Using embedded resources of modern FPGAs
- Characterizing performance of your codes for multiple FPGA families

Project Task 1

Read the following chapters from the GMU technical report published at http://eprint.iacr.org/2010/445:
- Chapter 1: Introduction & Motivation
- Chapter 2: Methodology
- Chapter 3: Comprehensive Designs of SHA-3 Candidates: 3.1, 3.2 + subsection concerning your algorithm
- Chapter 4: Design Summary and Results

Download and get familiar with the package of the hash function assigned to you: http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/submissions_rnd2.html

Read carefully the specification of your algorithm.

Project Task 1 cont.

In one week:
- Meeting with the instructor devoted to fully understanding the GMU report, specification, block diagrams, interface, and timing formulas.

In two weeks:
- Draft block diagrams of the selected unrolled architecture and the selected folded architecture.
- Corresponding timing formulas for execution time & throughput.
