Efficient FPGA Modular Multiplication and Exponentiation Architectures Using Digit Serial Computation

Gustavo Sutter, Jean-Pierre Deschamps, José Luis Imaña

gustavo.sutter@uam.es, jeanpierre.deschamps@urv.cat, jluimana@dacya.ucm.es

Agenda

• Introduction

– Modular exponentiation

• Background

– Montgomery multiplication and exponentiation

• The proposed architecture

– Precomputing q, digit serial and carry save adder

• FPGA Results

– Multiplication and exponentiation

• Result comparison

– For multiplication and exponentiation

• Conclusions

Introduction

• Modular exponentiation underlies public-key cryptosystems.

• Montgomery's modular multiplication algorithm is normally used because it needs no trial division, and its critical path can be shortened with carry-save addition (CSA).

• In this paper, the Montgomery multiplication is optimized, and architectures are proposed for both the least-significant-bit (LSB) first and the most-significant-bit (MSB) first exponentiation algorithms.

Introduction (II)

• The architecture presented here has the following distinctive characteristics:

– A digit-serial approach to Montgomery multiplication.

– Conversion of the carry-save representation of intermediate products using carry-skip addition, which shortens the critical path at a small area-speed penalty.

– Precomputation of the quotient value in each Montgomery iteration, in order to raise the operating frequency.

Background: Montgomery's algorithm

• The Montgomery product computes Z = X·Y·R^(-1) mod M instead of Z = X·Y mod M. The drawback is the need to convert operands into and out of Montgomery's domain, which is almost negligible in some particular applications such as exponentiation.

Algorithm 1 – modified Montgomery product
p := 0;
for i in 0 .. k-1 loop
   q(i) := (p(0) + x(i)*y(0)) mod 2;
   p := (p + x(i)*y + q(i)*m)/2;
end loop;
if p >= m then z := p-m; else z := p; end if;
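Algorithm 1 can be checked in software. The following is a minimal Python sketch, assuming R = 2^k, an odd modulus m < 2^k and operands below m; the function name `montgomery_product` is illustrative and not taken from the slides.

```python
def montgomery_product(x: int, y: int, m: int, k: int) -> int:
    """Bit-serial Montgomery product: x * y * 2**(-k) mod m.

    Assumes m is odd, m < 2**k and 0 <= x, y < m.
    """
    p = 0
    for i in range(k):
        x_i = (x >> i) & 1
        # q(i) = (p(0) + x(i)*y(0)) mod 2: chosen so the sum below is even
        q_i = (p + x_i * y) & 1
        # the sum is a multiple of 2, so the right shift is an exact division
        p = (p + x_i * y + q_i * m) >> 1
    # p stays below 2*m, so a single conditional subtraction is enough
    return p - m if p >= m else p
```

The result z satisfies z·2^k ≡ x·y (mod m), which is the property the exponentiation algorithms rely on.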

Background: Montgomery's algorithm (II)

• In the previous algorithm, the main contribution to the delay is the carry propagation in the very large operand additions. This can be avoided by using carry-save adders (CSA), which keep p as a carry word pc and a sum word ps.

Algorithm 2 – Montgomery product, carry-save addition
pc := 0; ps := 0;
for i in 0 .. k-1 loop
   q(i) := (pc(0) + ps(0) + x(i)*y(0)) mod 2;
   (pc, ps) := (pc + ps + x(i)*y + q(i)*m)/2;
end loop;
p := pc + ps;
if p >= m then z := p-m; else z := p; end if;
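The carry-save variant can be modelled with bitwise operations. This is a sketch under the same assumptions as before (R = 2^k, odd m < 2^k, operands below m); the helper `csa` and the two-level reduction of the four operands are illustrative choices, not the exact adder arrangement of the hardware.

```python
def csa(a: int, b: int, c: int):
    """3-to-2 carry-save adder: returns (s, cy) with a + b + c == s + cy."""
    s = a ^ b ^ c                               # bitwise sum, no carry propagation
    cy = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, weighted by 2
    return s, cy


def montgomery_product_cs(x: int, y: int, m: int, k: int) -> int:
    """Montgomery product with p held as a carry word pc and a sum word ps."""
    pc = ps = 0
    for i in range(k):
        x_i = (x >> i) & 1
        q_i = (pc + ps + x_i * y) & 1           # only the lowest bits matter
        s1, c1 = csa(pc, ps, x_i * y)           # first reduction level
        s2, c2 = csa(s1, c1, q_i * m)           # second reduction level
        # s2 + c2 is even and c2 has bit 0 clear, so bit 0 of s2 is clear too:
        # both words can be shifted right without losing information
        ps, pc = s2 >> 1, c2 >> 1
    p = pc + ps                                 # the only carry-propagate addition
    return p - m if p >= m else p
```

Only the final `pc + ps` needs a full-width carry-propagate addition, which is the step the carry-skip adder is meant to accelerate.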

Background: The Exponentiation

• Modular exponentiation (Y^X mod M) is usually done with repeated modular multiplications (MSB-first or LSB-first).

• If the operands are in Montgomery's domain, then additional pre- and post-processing steps are needed.

Algorithm 3 – base 2 mod m exponentiation, MSB-first using Montgomery product
e := exp_k;
ty := mp(y, exp_2k);
for i in 1 .. ke loop
   e := mp(e, e);
   if x(ke-i) = 1 then e := mp(e, ty);
   end if;
end loop;
z := mp(e, 1);

Algorithm 4 – base 2 mod m exponentiation, LSB-first using Montgomery product
e := exp_k;
ty := mp(y, exp_2k);
for i in 0 .. ke-1 loop
   if x(i) = 1 then e := mp(e, ty);
   end if;
   ty := mp(ty, ty);
end loop;
z := mp(e, 1);
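Both exponentiation schemes can be sketched in Python as follows. The slides do not define exp_k and exp_2k; this sketch assumes they are 2^k mod m and 2^(2k) mod m, the usual constants for entering Montgomery's domain. The function names and the software computation of the constants with `pow` are illustrative. In both variants the accumulated result e is converted out of Montgomery's domain by a final mp(e, 1).

```python
def mp(a: int, b: int, m: int, k: int) -> int:
    """Bit-serial Montgomery product a * b * 2**(-k) mod m (m odd, a, b < m)."""
    p = 0
    for i in range(k):
        a_i = (a >> i) & 1
        q = (p + a_i * b) & 1
        p = (p + a_i * b + q * m) >> 1
    return p - m if p >= m else p


def mod_exp_msb(y: int, x: int, m: int, k: int, ke: int) -> int:
    """MSB-first: square every step, multiply when the exponent bit is 1."""
    exp_k, exp_2k = pow(2, k, m), pow(2, 2 * k, m)   # assumed meaning of the constants
    e = exp_k                       # 1 in Montgomery's domain
    ty = mp(y, exp_2k, m, k)        # y in Montgomery's domain
    for i in range(1, ke + 1):
        e = mp(e, e, m, k)
        if (x >> (ke - i)) & 1:
            e = mp(e, ty, m, k)
    return mp(e, 1, m, k)           # leave Montgomery's domain


def mod_exp_lsb(y: int, x: int, m: int, k: int, ke: int) -> int:
    """LSB-first: the multiply and the squaring of one step are independent."""
    exp_k, exp_2k = pow(2, k, m), pow(2, 2 * k, m)
    e = exp_k
    ty = mp(y, exp_2k, m, k)
    for i in range(ke):
        if (x >> i) & 1:
            e = mp(e, ty, m, k)
        ty = mp(ty, ty, m, k)
    return mp(e, 1, m, k)
```

For example, `mod_exp_msb(7, 45, 239, 8, 6)` and `mod_exp_lsb(7, 45, 239, 8, 6)` both agree with Python's built-in `pow(7, 45, 239)`.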

The Proposed Architecture: Modular Multiplication

• To speed up the carry-save Montgomery product, precompute the next quotient bit q(i+1) during iteration i and keep using carry-save adders.

Algorithm 6 – modified Montgomery product, carry-save addition, q precomputed
pc := 0; ps := 0;
q := x(0)*y(0);
for i in 0 .. k-1 loop
   qn := ((pc(1:0) + ps(1:0) + x(i)*y(1:0) + q*m(1:0))/2 + x(i+1)*y(0)) mod 2;
   (pc, ps) := (pc + ps + x(i)*y + q*m)/2;
   q := qn;
end loop;
p := pc + ps;
if p >= m then z := p-m; else z := p; end if;
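The precomputation can be checked against the direct formula. In this sketch the variable p stands for the sum pc + ps (the carry-save split is omitted for brevity), x(k) is taken as 0, and the name `montgomery_product_qpre` is illustrative. The `assert` inside the loop verifies that the quotient bit derived from the two low-order bits equals the one a non-precomputed iteration would obtain.

```python
def montgomery_product_qpre(x: int, y: int, m: int, k: int) -> int:
    """Montgomery product where q for step i+1 is derived during step i."""
    p = 0                                   # stands for pc + ps
    q = (x & 1) & (y & 1)                   # q(0) = x(0)*y(0), since p starts at 0
    for i in range(k):
        x_i = (x >> i) & 1
        x_next = (x >> (i + 1)) & 1         # reads as 0 for i = k-1 because x < 2**k
        # two low-order bits of every operand are enough to know bit 0 of the next p
        low = (p & 3) + x_i * (y & 3) + q * (m & 3)
        qn = ((low >> 1) + x_next * (y & 1)) & 1
        p = (p + x_i * y + q * m) >> 1
        assert qn == (p + x_next * y) & 1   # matches the non-precomputed quotient bit
        q = qn
    return p - m if p >= m else p
```

Because qn depends only on two-bit slices, it can be computed in parallel with the wide carry-save addition instead of after it.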

[Figure: bit-level Montgomery cell. Two rows of half adders (HA) and full adders (FA) combine pc(k..0), ps(k..0), x(i)·y(k-1..0) and q(i)·m (with m0 = 1) through the intermediate words bc(k+1..1) and bs(k+1..1) into new_pc(k..0) and new_ps(k..0). A separate "next q computation" block uses pc(1), ps(1), pc(0), ps(0), x(i) and q(i).]

Modular Multiplication (II)

• To optimize further:

– Use digit-serial computation.

– Use a carry-skip adder for the final addition.

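[Figure: digit-serial multiplier. A d-digit Montgomery cell takes q(i), pc, ps, m and the digit of x delivered by a k-bit shift-by-d register, and produces new_pc(k..0) and new_ps(k..0), which are stored in two (k+1)-bit registers plus a one-bit register. A "final additions" block converts the carry-save result.]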
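The slides do not spell out the digit-serial iteration. One plausible reading, assumed here, is that each clock cycle consumes d bits of x by chaining d bit-level steps, so a product takes k/d cycles instead of k. The sketch below models that reading; the function name and the cycle counter are illustrative, and p again stands for pc + ps.

```python
def montgomery_product_digit_serial(x: int, y: int, m: int, k: int, d: int):
    """Montgomery product consuming d bits of x per modelled clock cycle.

    Returns (z, cycles). Assumes m odd, m < 2**k, 0 <= x, y < m and d divides k.
    """
    if k % d != 0:
        raise ValueError("d must divide k in this sketch")
    p = 0
    cycles = 0
    for base in range(0, k, d):         # one outer iteration = one clock cycle
        for j in range(d):              # d chained bit cells (combinational in hardware)
            x_b = (x >> (base + j)) & 1
            q = (p + x_b * y) & 1
            p = (p + x_b * y + q * m) >> 1
        cycles += 1
    z = p - m if p >= m else p
    return z, cycles
```

With k = 8 and d = 4 the function reports 2 cycles. The longer combinational chain per cycle is consistent with the clock period growing with d in the implementation results.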

Modular Multiplication (III)

• The carry-skip adder is much faster than a carry-propagate adder, but its delay can still exceed the clock period of the datapath. The solution used is to wait w = T/ad cycles for this final step to finish.

TABLE I. DELAY IN NS AND AREA IN LUTS FOR CARRY SKIP COMPARED AGAINST RIPPLE CARRY ADDERS IN VIRTEX 5

Bits   Ripple-carry delay   Ripple-carry area   S=32 delay   S=32 area   S=64 delay   S=64 area   Speedup   Overhead
512    11.8                 512                 4.4          716         5.5          644         267%      40%
1024   26.8                 1024                5.3          1452        5.9          1332        505%      42%
2048   56.5                 2048                6.6          2924        6.3          2708        896%      32%
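The last two columns of Table I appear to be derived from the faster of the two carry-skip variants in each row. The short Python sketch below reproduces them under that assumption; the dictionary layout and function name are illustrative.

```python
# (ripple delay, ripple area, S=32 delay, S=32 area, S=64 delay, S=64 area)
# delays in ns, areas in LUTs, copied from Table I
TABLE_I = {
    512: (11.8, 512, 4.4, 716, 5.5, 644),
    1024: (26.8, 1024, 5.3, 1452, 5.9, 1332),
    2048: (56.5, 2048, 6.6, 2924, 6.3, 2708),
}


def speedup_and_overhead(bits: int):
    """Return (delay ratio, relative extra area) for the faster carry-skip adder."""
    rc_delay, rc_area, d32, a32, d64, a64 = TABLE_I[bits]
    delay, area = min((d32, a32), (d64, a64))   # tuple comparison picks the lower delay
    return rc_delay / delay, area / rc_area - 1.0
```

For 2048 bits this picks the S=64 adder and gives a ratio of about 8.97 with roughly 32% more LUTs, in line with the tabulated 896% and 32%.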

Modular Exponentiation

• We use the traditional MSB-first and LSB-first algorithms.

– MSB-first performs about 1.5 Montgomery products (MP) per exponent bit on average, and 2 in the worst case.

– LSB-first also performs at most two Montgomery products per exponent bit, but the two are independent and can be executed in parallel, so each bit costs the time of a single product.

– The constants exp_k and exp_2k needed by both algorithms are computed using an SRT reducer.
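The product counts above can be made concrete with a small Python sketch. It counts sequential Montgomery-product time slots for a given exponent, ignoring the pre- and post-processing products; the function name is illustrative, and the LSB-first figure assumes two multipliers working in parallel.

```python
def mp_time_slots(x: int, ke: int):
    """Sequential MP time slots for a ke-bit exponent x: (msb_first, lsb_first)."""
    ones = bin(x & ((1 << ke) - 1)).count("1")
    msb_first = ke + ones   # one squaring per bit, plus one multiply per 1-bit
    lsb_first = ke          # multiply and squaring of the same bit overlap
    return msb_first, lsb_first
```

For the 4-bit exponent 1011 this gives 7 slots for MSB-first and 4 for LSB-first. A random exponent has about half its bits set, which yields the average of 1.5 products per bit quoted for MSB-first.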

FPGA Implementation Results

• The design entry is behavioral VHDL, except for the FPGA carry-skip adder.

TABLE II. VIRTEX 5 IMPLEMENTATION RESULTS OF PROPOSED DIGIT SERIAL MONTGOMERY'S MULTIPLIERS

k      d   FF      6-LUTs   Cycles   w (cycles)   Period (ns)   Time (ns)
512    1   2581    4130     512      4            1.7           920.5
512    2   2583    6178     256      3            2.6           663.8
512    4   2584    10276    128      2            4.5           585.0
512    8   2584    18494    64       1            8.4           549.3
1024   1   5142    8227     1024     4            1.8           1936.8
1024   2   5144    12323    512      3            2.6           1319.9
1024   4   5145    20527    256      2            4.5           1161.0
1024   8   5145    36937    128      1            8.5           1090.1
2048   1   10263   16417    2048     5            1.8           3867.9
2048   2   10265   24613    1024     4            2.5           2634.8
2048   4   10266   41007    512      2            4.5           2313.0

FPGA Implementation Results (II)

TABLE III. VIRTEX 5 IMPLEMENTATION OF EXPONENTIATIONS

k = ke   Method   d   FF      LUTs    Period (ns)   Time (ms)   Throughput (kb/s)
512      MSB      1   4144    5696    1.8           0.72        713.6
512      MSB      2   4145    7745    2.5           0.50        1023.6
512      MSB      4   4145    11845   4.5           0.45        1133.0
512      MSB      8   4145    20041   8.5           0.43        1199.6
512      LSB      2   6728    13923   2.5           0.33        1535.4
1024     MSB      1   8242    11330   1.9           2.98        343.2
1024     MSB      2   8243    15427   2.6           2.03        503.6
1024     MSB      4   8243    23623   4.5           1.79        572.5
1024     MSB      8   8243    40011   8.4           1.68        608.7
1024     LSB      2   13387   27750   2.6           1.38        744.6
2048     MSB      1   16436   22595   1.9           12.00       170.7
2048     MSB      2   16437   30790   2.5           7.91        259.0
2048     MSB      4   16437   47176   4.5           7.12        287.8
2048     LSB      1   26699   39012   2.5           10.53       194.6

Performance Comparison: Modular Multipliers

• The multipliers were reimplemented in Virtex 2 devices using Xilinx ISE 10.1.03.

TABLE V. COMPARISON OF MODULAR MULTIPLIERS IN FPGAS

k     Circuit        Device     Slices   T (ns)   Time (µs)   Throughput (Mb/s)   AxD
512   [9]            Virtex E   2972     10.5     16.17       31.7                48.1
512   [3] (5 to 2)   Virtex 2   5170     7.9      4.06        126.2               21.0
512   [3] (4 to 2)   Virtex 2   5782     8.2      4.21        121.6               24.4
512   [6]            Virtex 2   2902     8.2      4.26        120.3               12.3
512   [4]            Virtex 2   4029     4.5      2.33        220.2               9.4
512   Prop D=1       Virtex 2   2469     3.6      1.89        270.5               4.7
512   Prop D=2       Virtex 2   3497     4.8      1.25        409.3               4.4
512   Prop D=4       Virtex 2   5538     8.6      1.13        452.2               6.3
512   Prop D=8       Virtex 2   9446     15.6     1.03        497.4               9.7
512   Prop D=4       Virtex 5   2936     4.5      0.59        862.0               -

Modular Multipliers (II)

TABLE V. COMPARISON OF MODULAR MULTIPLIERS IN FPGAS (CONTINUED)

k      Circuit        Device     Slices   T (ns)   Time (µs)   Throughput (Mb/s)   AxD
1024   [9]            Virtex E   5706     10.5     32.17       31.8                183.6
1024   [3] (5 to 2)   Virtex 2   10332    9.8      10.09       101.5               104.2
1024   [3] (4 to 2)   Virtex 2   11520    9.0      9.22        111.1               106.2
1024   [6]            Virtex 2   4512     8.8      9.03        113.4               40.7
1024   [4]            Virtex 2   8000     4.5      4.63        221.1               37.1
1024   Prop D=1       Virtex 2   4923     3.7      3.88        262.7               19.2
1024   Prop D=2       Virtex 2   6982     4.8      2.48        410.8               17.4
1024   Prop D=4       Virtex 2   11079    8.4      2.19        471.7               24.1
1024   Prop D=8       Virtex 2   19247    15.5     2.02        508.2               38.8
1024   Prop D=4       Virtex 5   5702     4.5      1.18        868.5               -
2048   [3] (5 to 2)   Virtex 2   20986    11.1     22.76       90.0                477.5
2048   [3] (4 to 2)   Virtex 2   23108    11.0     22.59       90.6                522.1
2048   Prop D=1       Virtex 2   9831     3.8      7.79        263.0               76.6
2048   Prop D=2       Virtex 2   13954    4.8      4.94        414.8               68.9
2048   Prop D=4       Virtex 2   22201    8.4      4.34        471.3               95.7
2048   Prop D=2       Virtex 5   6837     2.56     2.63        777.3               -
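The derived columns of Table V can be recomputed from the slice count and the multiplication time. The sketch below assumes the time column is in microseconds and that AxD is slices times microseconds divided by 1000, which is what the tabulated values suggest; the function names are illustrative. Small differences from the table come from the rounding of the printed times.

```python
def throughput_mbps(k_bits: int, time_us: float) -> float:
    """Operand bits processed per microsecond, which equals Mb/s."""
    return k_bits / time_us


def area_delay(slices: int, time_us: float) -> float:
    """Area-delay product, scaled by 1/1000 as in the AxD column (assumed)."""
    return slices * time_us / 1000.0
```

For the 512-bit proposed multiplier with D=1 on Virtex 2 (2469 slices, 1.89 µs) this gives about 270.9 Mb/s and an area-delay product of about 4.67, against the tabulated 270.5 and 4.7.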

Modular Exponentiators

TABLE VII. COMPARISON FOR 1024 BITS EXPONENTIATORS.

Ref            Method   FPGA       Area (slices)   Period (ns)   w   Cycles (x1000)   Time (ms)   Throughput (kb/s)
[10] (r2)      LSB      XC4K       4865            19.2          -   2122             40.74       25.1
[10] (r16)     LSB      XC4K       6683            21.9          -   546              11.95       85.7
[3] (4 to 2)   MSB      Virtex 2   26136           10.3          -   1054             10.85       94.3
[4]            LSB      Virtex 2   12537           6.6           -   1579             10.35       98.9
Prop D=2       LSB      Virtex 2   9298            4.8           6   798              3.83        267.3
Prop D=4       LSB      Virtex 2   13346           8.4           3   399              3.35        305.5
Prop D=2       MSB      Virtex 2   16280           4.8           6   532              2.55        401.0
Prop D=4       LSB      Virtex 5   6217            4.5           2   397              1.79        572.5
Prop D=2       MSB      Virtex 5   7303            2.6           3   529              1.38        744.6

Conclusions

• The key to fast exponentiation is an efficient multiplication. Montgomery's multiplication is widely used because it avoids trial division.

• The distinctive characteristics of the present work are:

– Precomputation of the quotient value (q) in each Montgomery iteration, in order to raise the operating frequency.

– A digit-serial approach to Montgomery's multiplication.

– Intermediate exponentiation values kept in binary format instead of carry-save.

– Final conversion of the carry-save representation of each intermediate MP using carry-skip addition.

Conclusions (II)

• The comparisons show that, to the authors' knowledge, the proposed architecture outperforms all previously published results in both throughput and area-delay product.

• For 512-, 1024- and 2048-bit multipliers, the proposed design roughly doubles the throughput of the fastest previously reported result. For 1024-bit exponentiation on FPGA, it also shows a factor-of-two improvement with similar or smaller area.

Questions…
