ECE 646 Cryptography and Computer Network Security
“Area efficient Hardware Implementation of IDEA”
12/12/03
Outline
• Specifications
• IDEA Overview
• Building Blocks
• Architecture/Design
• Verification/Synthesis
• Results/Conclusion
• Q&A
Specifications
1. Design Entry
• RTL design in Verilog HDL
• Verification: Verilog HDL, C, DAI SignalScan
• Logic Synthesis: Synopsys Design Analyzer
• LSI_10K Library
2. I/O Specification
Inputs:
1. A 64-bit plaintext block
2. A 128-bit key
Output:
1. A 64-bit ciphertext
Specifications
[Block diagram: a 64-bit plaintext and a 128-bit key enter the IDEA block, which outputs a 64-bit ciphertext.]
Specifications
[Gantt chart: the tasks below plotted against the weeks of Oct 1, Oct 8, Oct 15, Oct 22, Oct 29, Nov 5, Nov 12, Nov 19, Nov 26, Dec 3, Dec 10]
• Define Project - Finalize Specs.
• Specification - Report Submit
• Implementation of Subblocks
• Verification of Subblocks: Testbench + Simulation
• Implementation of Top Module
• Verification of Top Module: Testbench + Simulation
• Synthesis Script Development + Logic Synthesis
• Optimization by applying different methods
• Project Report
• Project Presentation
IDEA OVERVIEW
• Three algebraic operations are applied
- XOR
- Addition modulo 2^16
- Multiplication modulo 2^16 + 1
• All operations operate on 16-bit sub-blocks
• Eight rounds in total + 1 output transformation
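As a software reference for the three operations, here is a minimal Python sketch on 16-bit words. The function names are illustrative and are not taken from the project's Verilog. It assumes the usual IDEA convention that, for multiplication only, the all-zero word stands for 2^16.

```python
MASK16 = 0xFFFF
MOD_ADD = 1 << 16          # 2^16
MOD_MUL = (1 << 16) + 1    # 2^16 + 1, a prime

def xor16(a, b):
    """Bitwise XOR of two 16-bit words."""
    return (a ^ b) & MASK16

def add16(a, b):
    """Addition modulo 2^16."""
    return (a + b) % MOD_ADD

def mul16(a, b):
    """Multiplication modulo 2^16 + 1, where the word 0 encodes 2^16."""
    a = a or MOD_ADD       # map 0 to 2^16
    b = b or MOD_ADD
    return ((a * b) % MOD_MUL) & MASK16   # a result of 2^16 maps back to 0
```

Because 2^16 + 1 is prime and zero is re-read as 2^16, every 16-bit word has a multiplicative inverse, which is what makes decryption possible.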
IDEA OVERVIEW
• Hardware implementations are faster than their software counterparts
• Employ "parallelism"
• A bit-parallel approach is faster than a bit-serial approach
• Area constraints will have higher priority than timing
• 4 modular multiplications per round
• Multiplication will take the largest real estate: the major bottleneck
BUILDING BLOCKS
Modulo (2^n + 1) Multiplication Options:
• Look-up tables (A)
• (n+1) x (n+1) bit multiplier (B)
• Modulo (2^n + 1) adders (C)
• 2^n adders using bit-pair recoding (D)
BUILDING BLOCKS
Option | Area | Throughput | Delay | Regularity
A | Very large for n > 8 | Limited by the access times of LUTs (on or off chip) | Limited by the access times of LUTs (on or off chip) | Regular layout for the ROM design
B | Slightly larger than A (one large multiplier + 2 adders) | Higher than A | Large; critical path includes a multiplier and two adders | Relatively irregular; different types of cells are being used
C | Much larger than B* | Lower than A and B* | Moderate; critical path includes a full adder and carry-select half-adders | Relatively regular; only full adder and half-adder cells are laid out
D | Much larger than B | Highest | Moderate | Relatively regular
BUILDING BLOCKS
Diminished-one representation and Wallace multiplier:

// the diminished-one representations of the inputs
assign AD = A - 1;
assign BD = B - 1;

// module instantiation
// Wallace multiplier
walmul_16 w1(.A(AD), .B(BD), .C(CD));

// CD + AD + BD + 1 = (AD + 1)(BD + 1), i.e. the product of the original operands
assign result = CD + AD + BD + 1;

always @ (result)
begin
  if (result_low <= result_high)
    C <= (result_low - result_high) + 1;
  else
    C <= (result_low - result_high);
end

ab mod (2^n + 1) = ((ab mod 2^n) - (ab div 2^n)) mod (2^n + 1)
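The identity above can be checked in software. The following Python sketch mirrors the Verilog fragment for n = 16 and compares it with a direct modular multiplication. It assumes that `result_low` and `result_high` are bits 15:0 and 31:16 of `result`, and that the word 0 encodes 2^16; neither assumption is stated on the slide. The Python names are illustrative.

```python
MASK16 = 0xFFFF

def mul_direct(a, b):
    """Reference: multiply modulo 2^16 + 1, with the word 0 encoding 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & MASK16

def mul_low_high(a, b):
    """Model of the slide's datapath: diminished-one inputs, then low minus high."""
    ad = (a - 1) & MASK16            # diminished-one; the word 0 becomes 0xFFFF
    bd = (b - 1) & MASK16
    cd = ad * bd                     # stands in for the Wallace multiplier output
    result = cd + ad + bd + 1        # equals (ad + 1) * (bd + 1)
    low = result & MASK16            # assumed result[15:0]
    high = (result >> 16) & MASK16   # assumed result[31:16]
    if low <= high:
        # The subtraction wrapped around, so add 1 to correct for the modulus
        # 2^16 + 1. The case low == high only occurs for 2^16 * 2^16, which
        # must give 1.
        return (low - high + 1) & MASK16
    return (low - high) & MASK16
```

A quick check is `mul_low_high(0, 0)`, which models 2^16 x 2^16 and should agree with `mul_direct(0, 0)`.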
ARCHITECTURE/DESIGN
[Block diagram of the top level. Labels: IDEA_ROUND module, SUBKEY GENERATOR module (96-bit output), SELECT, PLAINTEXT, IN MUX, four 16-bit data paths.]
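For reference, the standard IDEA key schedule that a subkey generator has to reproduce can be sketched in Python as follows. This is a software model under the standard algorithm, not a description of the project's SUBKEY GENERATOR module; the function names are hypothetical. Six 16-bit subkeys per round give the 96 bits shown in the diagram.

```python
def expand_key(key):
    """Return the 52 16-bit IDEA encryption subkeys for a 128-bit integer key."""
    mask128 = (1 << 128) - 1
    subkeys = []
    while len(subkeys) < 52:
        # Take eight 16-bit words, most significant first.
        for i in range(8):
            subkeys.append((key >> (112 - 16 * i)) & 0xFFFF)
        # Rotate the 128-bit key left by 25 bits for the next batch.
        key = ((key << 25) | (key >> 103)) & mask128
    return subkeys[:52]

def split_round_keys(subkeys):
    """Group subkeys as 8 rounds of 6 words (96 bits) plus 4 output words."""
    rounds = [subkeys[6 * r:6 * r + 6] for r in range(8)]
    final = subkeys[48:52]
    return rounds, final
```

With the key from the simulation slide, `expand_key(0x001700180019001A001B001C001D001E)[:8]` is simply the eight key words 0x0017 to 0x001E.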
[Block diagram of the IDEA_ROUND module. Labels: INPUT, subkeys Z1 to Z6, pipeline registers PIPE #1 to PIPE #4, CLK, outputs round_n_1 to round_n_4.]
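The computation performed by one round can be sketched in Python as below. This follows the standard IDEA round with subkeys Z1 to Z6; how the project assigns these operations to the four pipeline stages is not recoverable from the slide, so the sketch is purely combinational. The function names are illustrative.

```python
def mul16(a, b):
    """Multiply modulo 2^16 + 1, with the word 0 encoding 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF

def add16(a, b):
    """Add modulo 2^16."""
    return (a + b) & 0xFFFF

def idea_round(x, z):
    """One IDEA round: x is four 16-bit words, z is six 16-bit subkeys."""
    x1, x2, x3, x4 = x
    z1, z2, z3, z4, z5, z6 = z
    # Key mixing: two multiplications and two additions.
    y1, y2, y3, y4 = mul16(x1, z1), add16(x2, z2), add16(x3, z3), mul16(x4, z4)
    # Multiply-add structure: the remaining two multiplications.
    t0 = mul16(y1 ^ y3, z5)
    t1 = mul16(add16(y2 ^ y4, t0), z6)
    t2 = add16(t0, t1)
    # The two middle words swap positions on the way out.
    return (y1 ^ t1, y3 ^ t1, y2 ^ t2, y4 ^ t2)
```

The four `mul16` calls are the four modular multiplications per round that dominate the area.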
SIMULATION RESULTS
PLAINTEXT: 000B_000C_000D_000E
KEY: 0017_0018_0019_001A_001B_001C_001D_001E
[Waveform screenshot: SUBKEYS]
SIMULATION RESULTS
CIPHERTEXT: BB19_9265_37A1_BAB9
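A software model is a convenient cross-check for simulation output like this. The Python sketch below implements standard IDEA encryption (8 rounds plus the output transformation) on integers. It is an independent reference under the standard algorithm, not the project's C model, and its names are hypothetical. It has been checked only against the widely published IDEA test vector noted after the code, not against the slide's ciphertext.

```python
def _mul(a, b):
    """Multiply modulo 2^16 + 1, with the word 0 encoding 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF

def _add(a, b):
    """Add modulo 2^16."""
    return (a + b) & 0xFFFF

def expand_key(key):
    """52 subkeys: take eight words, rotate the key left by 25 bits, repeat."""
    subkeys = []
    while len(subkeys) < 52:
        subkeys += [(key >> (112 - 16 * i)) & 0xFFFF for i in range(8)]
        key = ((key << 25) | (key >> 103)) & ((1 << 128) - 1)
    return subkeys[:52]

def idea_encrypt(block, key):
    """Encrypt a 64-bit integer block under a 128-bit integer key."""
    z = expand_key(key)
    x1, x2, x3, x4 = [(block >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    for r in range(8):
        z1, z2, z3, z4, z5, z6 = z[6 * r:6 * r + 6]
        y1, y2 = _mul(x1, z1), _add(x2, z2)
        y3, y4 = _add(x3, z3), _mul(x4, z4)
        t0 = _mul(y1 ^ y3, z5)
        t1 = _mul(_add(y2 ^ y4, t0), z6)
        t2 = _add(t0, t1)
        x1, x2, x3, x4 = y1 ^ t1, y3 ^ t1, y2 ^ t2, y4 ^ t2
    # Output transformation: undo the last swap and mix in four more subkeys.
    z1, z2, z3, z4 = z[48:52]
    out = (_mul(x1, z1), _add(x3, z2), _add(x2, z3), _mul(x4, z4))
    return (out[0] << 48) | (out[1] << 32) | (out[2] << 16) | out[3]

if __name__ == "__main__":
    # Slide's simulation inputs; compare the printed value with the waveform.
    ct = idea_encrypt(0x000B000C000D000E, 0x001700180019001A001B001C001D001E)
    print(f"{ct:016X}")
```

The model reproduces the commonly cited vector: key 0001_0002_0003_0004_0005_0006_0007_0008 with plaintext 0000_0001_0002_0003 gives 11FB_ED2B_0198_6DE5.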
SYNTHESIS HIERARCHY
idea_top
  idea_control
  idea_round
    mod_mul
      wal_mul
        carry_save_add
    mod_add
    mod_xor
  subkey_gen
SYNTHESIS
read -format db "idea_control.db"
read -format db "idea_round.db"
read -format db "subkey_gen.db"
link

set_dont_touch {idea_control,idea_round,subkey_gen}
read -format verilog "idea_top.v" > idea_top.lint
set_wire_load_model -name "50x50" -library "lsi_10k"
uniquify
current_design idea_top

set_max_area 0
create_clock "clk" -period 10
set_clock_skew -ideal -uncertainty 0.33 clk
set_dont_touch_network find(clock,"clk")
set_input_delay -clock clk 5 all_inputs()
set_output_delay -clock clk 5 all_outputs()
set_load 5 * load_of("lsi_10k/FD2/D") all_outputs()
set_drive drive_of("lsi_10k/FD2/Q") all_inputs()
set_operating_conditions -library "lsi_10k" "WCCOM"

compile -map_effort high
write -format db -hierarchy -output idea_top.db
write -format verilog -hierarchy -output idea_top_gate.v
check_design > idea_top.chk
report_area > idea_top.area
report_timing -path full -delay max -max_paths 5 -nworst 1 > idea_top.timing
report_timing -delay min >> idea_top.timing
report_constraint -all_violators -verbose >> idea_top.const
RESULTS/CONCLUSION
• Carry_save_add : 30 lines
• Walmul_16 : 128 lines
• Mod_add : 32 lines
• Mod_mul : 56 lines
• Mod_xor : 18 lines
• Subkey Generator : 92 lines
• Idea Control Logic : 142 lines
• Idea round module : 228 lines
• Idea top module : 52 lines
Total: 728 lines of synthesizable Verilog code
RESULTS/CONCLUSION
Module | Worst path (largest delay) | Best path (smallest delay) | Non-combinational area | Combinational area
Carry_save_add | 6.81 ns | 6.38 ns | - | 306
Wallace_mul | 16.54 ns | 10.82 ns | - | 5,682
Mod_mul | 26.31 ns | 8.48 ns | - | 7,941
Mod_add | 13.38 ns | 6.59 ns | - | 475
Mod_xor | 5.92 ns | 5.84 ns | - | 64
Idea_round | 28.24 ns | 0.76 ns | 3,240 | 32,636
Subkey_gen | 1.06 ns | 0.88 ns | 4,064 | 1,111
Idea_control | 6.28 ns | 1.48 ns | 78 | 48
Idea_top | 29.46 ns | 0.77 ns | 7,382 | 33,795
RESULTS/CONCLUSION
• A portable synthesizable RTL code in Verilog (FPGA, ASIC)
• Verified full functionality
• Limited logic synthesis library LSI_10K
• 728 lines of code, testbench, synthesis scripts
• Area = 41,172 x (2-input NAND gate) x (gate area in technology X)
  Gate area in 0.6 um technology = 20 um x 20 um
  Total ≈ 4 mm x 4 mm = 16 mm^2
• Throughput = 64 bits / ((8 x 4 + 1) cycles x 25 ns) ≈ 77.5 Mbit/s
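The two estimates above are straightforward arithmetic, reproduced in the Python sketch below. The reading of "8 x 4 + 1" as 8 rounds of 4 pipeline stages plus one cycle for the output transformation is an assumption, as are the variable names.

```python
GATE_COUNT = 41_172      # 2-input NAND equivalents, from the slide
GATE_SIDE_UM = 20.0      # gate footprint of 20 um x 20 um in 0.6 um technology

area_um2 = GATE_COUNT * GATE_SIDE_UM ** 2
area_mm2 = area_um2 / 1e6            # 1 mm^2 = 1e6 um^2
die_side_mm = area_mm2 ** 0.5        # side of an equivalent square die

BLOCK_BITS = 64
CYCLES = 8 * 4 + 1                   # assumed: 8 rounds x 4 stages + 1 output cycle
PERIOD_NS = 25.0

# bits per nanosecond equals Gbit/s, so multiply by 1000 for Mbit/s
throughput_mbps = BLOCK_BITS / (CYCLES * PERIOD_NS) * 1e3

print(f"area ~ {area_mm2:.2f} mm^2 (about {die_side_mm:.2f} mm per side)")
print(f"throughput ~ {throughput_mbps:.2f} Mbit/s")
```

The unrounded figures are slightly above the slide's 16 mm^2 and 77.5 Mbit/s, which are the same results rounded down.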