6.1 ALU Blocks and Control

Contents

1. Adder

2. Multiplier

3. Datapath Generation

1. Adder

Full Adder

Boolean equation

$CARRY = A B + B C + C A = A B + C (A + B)$

$SUM = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C = A B C + \overline{CARRY} (A + B + C)$

SUM is the odd parity of A, B and C.

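Which is better?

Boolean Equation 1 :

$CARRY = A B + C (A + B)$

$SUM = A B C + \overline{CARRY} (A + B + C)$

Boolean Equation 2 :

$SUM = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C$

$CARRY = A B C + \overline{SUM} (A + B + C)$

Equation 1 is better: CARRY evaluation is more urgent since CARRY is in the critical path of a ripple carry adder, and Equation 1 produces CARRY first.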
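The two Boolean forms can be checked against ordinary binary addition. The following is a minimal sketch; the function names `full_adder_eq1` and `full_adder_eq2` are illustrative and not from the slides.

```python
from itertools import product

def full_adder_eq1(a, b, c):
    """Equation 1: CARRY first, then SUM from the complemented CARRY."""
    carry = (a & b) | (c & (a | b))
    s = (a & b & c) | ((1 - carry) & (a | b | c))
    return s, carry

def full_adder_eq2(a, b, c):
    """Equation 2: SUM first (odd parity), then CARRY from the complemented SUM."""
    s = a ^ b ^ c
    carry = (a & b & c) | ((1 - s) & (a | b | c))
    return s, carry

# Both forms must agree with a + b + c for all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    total = a + b + c
    assert full_adder_eq1(a, b, c) == (total & 1, total >> 1)
    assert full_adder_eq2(a, b, c) == (total & 1, total >> 1)
```

Both forms are logically equivalent; they differ only in which output is available first, which is what matters for the ripple carry path.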

Alternating Complementary Form

At odd stages:

$CARRY = A B + C (A + B)$

$SUM = A B C + \overline{CARRY} (A + B + C)$

At even stages:

$\overline{CARRY} = (\bar{A} + \bar{B}) (\bar{C} + \bar{A} \bar{B})$

$\overline{SUM} = (\bar{A} + \bar{B} + \bar{C}) (CARRY + \bar{A} \bar{B} \bar{C})$

Dynamic Serial Adder

$a = a_{n-1} \dots a_0$

$b = b_{n-1} \dots b_0$

$s = s_{n-1} \dots s_0$

$CARRY(t+1) = A(t) B(t) + C(t) [A(t) + B(t)]$

$SUM(t+1) = A(t) B(t) C(t) + \overline{CARRY(t+1)} [A(t) + B(t) + C(t)]$
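A bit-serial adder reuses one full adder and feeds the carry back for the next bit. The sketch below models that behaviour in software, assuming the operands arrive LSB first and the carry is cleared before the first bit; `serial_add` is an illustrative name.

```python
def serial_add(a, b, n):
    """Add two n-bit unsigned integers one bit per step, LSB first."""
    carry = 0      # assumption: the carry is reset before the first bit
    result = 0
    for t in range(n):
        a_t = (a >> t) & 1
        b_t = (b >> t) & 1
        # CARRY gate: A B + C (A + B)
        next_carry = (a_t & b_t) | (carry & (a_t | b_t))
        # SUM gate: A B C + not(CARRY) (A + B + C)
        s_t = (a_t & b_t & carry) | ((1 - next_carry) & (a_t | b_t | carry))
        result |= s_t << t
        carry = next_carry   # fed back as C for the next step
    return result, carry
```

The function returns the n-bit sum together with the final carry-out.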

Dynamic Configuration

[Figure: dynamic serial adder circuit. Labels: CARRY gate with optional precharge device, CK, C (CARRY), SUM gate with optional precharge device, set/reset circuit.]

$CARRY = A B + C [A + B]$

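Full Adder Truth Table

Row     0 1 2 3 4 5 6 7
A       0 0 0 0 1 1 1 1
B       0 0 1 1 0 0 1 1
C       0 1 0 1 0 1 0 1
CARRY   0 0 0 1 0 1 1 1
SUM     0 1 1 0 1 0 0 1

Rows 0 and 7, 1 and 6, 2 and 5, 3 and 4 are mutually complementary.

F_C : on-terms of CARRY

F_S : on-terms of SUM

Conjugate symmetry: inverting the inputs inverts the outputs.

$SUM = F_S(A, B, C)$

$CARRY = F_C(A, B, C)$

$\overline{SUM} = F_S(\bar{A}, \bar{B}, \bar{C})$

$\overline{CARRY} = F_C(\bar{A}, \bar{B}, \bar{C})$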
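The conjugate symmetry property is what allows the alternating complementary form: a stage fed with complemented inputs produces complemented outputs. A short sketch that checks this property exhaustively (the helper names are illustrative):

```python
from itertools import product

def f_carry(a, b, c):
    return (a & b) | (b & c) | (c & a)

def f_sum(a, b, c):
    return a ^ b ^ c

def is_self_dual(f):
    """True if complementing all three inputs complements the output."""
    return all(f(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
               for a, b, c in product((0, 1), repeat=3))
```

Both `f_carry` and `f_sum` pass the check, whereas a plain 3-input AND does not.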

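Another Configuration of Carry & Sum Logic

[Figure: CARRY stage (PROPAGATE and GENERATE paths) followed by SUM stage.]

$CARRY(t+1) = F_C(A, B, C) = A B + B C + C A = A B + C (A + B)$

$SUM(t+1) = F_S(A, B, C) = A B C + A \bar{B} \bar{C} + \bar{A} B \bar{C} + \bar{A} \bar{B} C = A B C + \overline{CARRY} (A + B + C)$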

Dynamic full adder using np CMOS logic style

Layout of the dynamic full adder

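Looking at the FA Truth Table

With $P = A \oplus B$:

$SUM = P \oplus C = P \bar{C} + \bar{P} C$

$CARRY = P C + \bar{P} B$

SUM = C when $A \oplus B = 0$; SUM = $\bar{C}$ when $A \oplus B = 1$

CARRY = A (or B) when $A \oplus B = 0$; CARRY = C when $A \oplus B = 1$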
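Read this way, the full adder is a pair of 2:1 selections controlled by A XOR B, which is what a transmission-gate design exploits. A minimal sketch, with `full_adder_mux` as an illustrative name:

```python
def full_adder_mux(a, b, c):
    p = a ^ b                    # propagate signal P = A xor B
    s = (1 - c) if p else c      # SUM = not C when P = 1, else C
    carry = c if p else a        # CARRY = C when P = 1, else A (which equals B)
    return s, carry
```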

Transmission Gate Implementation

[Figure: transmission-gate full adder, with signals A, B, CARRY and $P = A \oplus B$.]

CLA (Carry Lookahead Adder)

$C_i = G_i + P_i C_{i-1}$, where $G_i = A_i B_i$ and $P_i = A_i \oplus B_i$

$C_i = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \dots + P_i P_{i-1} \cdots P_2 P_1 C_0$

Available for (# of inputs $\le$ 4)
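The expanded lookahead expression can be evaluated directly as a flat sum of products, without rippling through earlier carries. The sketch below does this for an arbitrary width, assuming $P_i = A_i \oplus B_i$; `cla_carries` is an illustrative name, and list index 0 holds bit 1 (the LSB).

```python
def cla_carries(a_bits, b_bits, c0):
    """Return [C1, ..., Cn] from the expanded lookahead expression."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = []
    for i in range(len(g)):
        c = 0
        prod = 1                 # running product P_i P_{i-1} ...
        for k in range(i, -1, -1):
            c |= prod & g[k]     # term P_i ... P_{k+1} G_k
            prod &= p[k]
        c |= prod & c0           # last term P_i ... P_1 C_0
        carries.append(c)
    return carries
```

The number of product terms grows with i, which is why the flat form is limited to a small number of bits in practice.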

Carry bypass structure - basic concept

(N=16)-bit carry bypass adder (each stage: M bits)

Worst case delay:

$t_p = t_{setup} + M \cdot t_{carry} + (N/M - 1) \cdot t_{bypass} + M \cdot t_{carry} + t_{sum}$

t_setup : time to create G and P signals

t_carry : propagation delay through a single bit

t_bypass : propagation delay through MUX

t_sum : time to generate sum
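The worst-case expression is easy to evaluate for different stage sizes. A small sketch that encodes the formula exactly as stated above, assuming N is a multiple of M; the function name and the unit delays in the usage line are illustrative.

```python
def carry_bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of an n-bit carry bypass adder with m-bit stages."""
    stages = n // m   # assumes n is a multiple of m
    return t_setup + m * t_carry + (stages - 1) * t_bypass + m * t_carry + t_sum

# Example with all component delays set to one unit: N = 16, M = 4
print(carry_bypass_delay(16, 4, 1, 1, 1, 1))
```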

Combining 4 Domino Carry Lookahead Blocks

Manchester Carry Chain (4-bit)

Limit the chain to 4 stages: in the worst case there are 6 series transistors to ground.

[Figure: Manchester carry chain with inputs G1 P1, G2 P2, G3 P3, G4 P4 and carries C0 to C4, placed between the GP block and the sum block.]

$C_1 = G_1 + P_1 C_0$

Improving Worst Case Carry Prop. Time

[Figure: Manchester carry chain clocked by CK, with pass transistors controlled by P1, P2, P3, P4.]

Faster pass transistor chain due to lower parasitic C loading

Manchester CC Adder Floorplan

Dual CC scheme: one carry chain for carry propagation, the other for off-loading the first carry chain from the SUM block.

[Figure: floorplan blocks SUM and GENERATE.]

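CSA (Carry Select Adder)

[Figure: 8-bit carry select adder. The low block adds A0 ~ A3 and B0 ~ B3 to give S0 ~ S3. Two high blocks add A4 ~ A7 and B4 ~ B7, one for each carry-in case, giving $S_4^0$ ~ $S_7^0$ with $C_8^0$ and $S_4^1$ ~ $S_7^1$ with $C_8^1$.]

$C_8 = \bar{C_4} C_8^0 + C_4 C_8^1 = C_8^0 + C_4 C_8^1$ (since $C_8^0 \overline{C_8^1} = 0$ always)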
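The simplification of the carry selection rests on the fact that a block can never produce a carry for carry-in 0 without also producing one for carry-in 1. The sketch below checks both that fact and the simplified selection against a true 2:1 MUX for every operand pair of a block; the function name is illustrative.

```python
def select_matches_simplified(width=4):
    """Check C_out = C0 + C_in * C1 against a real MUX for all operand pairs."""
    for a in range(1 << width):
        for b in range(1 << width):
            c0 = (a + b) >> width        # block carry-out assuming carry-in = 0
            c1 = (a + b + 1) >> width    # block carry-out assuming carry-in = 1
            if c0 and not c1:            # would contradict C0 * not(C1) = 0
                return False
            for c_in in (0, 1):
                mux = c1 if c_in else c0
                if mux != (c0 | (c_in & c1)):
                    return False
    return True
```

The simplified form replaces a full multiplexer in the carry path with a single AND-OR gate.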

Realization of MUX with restoring logic

Note) Realization of MUX with pass-transistor gates: threshold voltage loss per stage

Vdd → Vdd - Vt → Vdd - 2Vt

Carry Selection

Use restoring logic for the critical path.

For carry propagation, use restoring logic in the alternating pattern.

[Figure: carry selection chain over A0 ~ A3, B0 ~ B3 and S0 ~ S3, with signals $C_8^0$, $C_8$, $C_{12}^0$ and $C_{12}$.]

Number of bits for each stage

ex1) 32-bit case : 4, 4, 5, 6, 7, 6 (or 4, 4, 5, 6, 6, 7)

ex2) 64-bit case : 4, 4, 5, 6, 7, 8, 9, 10, 11

Minimization of Carry Propagation Path Delay

Carry select scheme (prepare a result for each case, Cin=1 and Cin=0)

Simplify the carry selection using the relation between $C_i^0$ and $C_i^1$

Take complemented carries, alternating between the even and odd stages

Adjust each block size in consideration of the delay of the carry select logic, so that the carry propagation delay of each block equals the carry propagation delay to the block

e.g. for a 32-bit path: 4 4 5 6 6 7

16-bit Linear CSA (Carry Select Adder)

$t_{add} = t_{setup} + M \cdot t_{carry} + (N/M) \cdot t_{mux} + t_{sum}$

M : # of bits/stage, N : total # of bits

Square Root CSA

$t_{add} = t_{setup} + M \cdot t_{carry} + \sqrt{2N} \cdot t_{mux} + t_{sum}$

$N = M + (M+1) + \dots + (M+P-1) = MP + P(P-1)/2 = P^2/2 + P(M - 1/2) \approx P^2/2$, so $P \approx \sqrt{2N}$

(figure: 9 stages)
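The two delay expressions can be compared numerically. The following sketch encodes them as written, along with the bit count of a square-root arrangement with P stages; the function names and the unit delays used in the comparison are illustrative assumptions.

```python
import math

def linear_csa_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """Linear carry select adder: N/M equal stages of M bits."""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum

def sqrt_csa_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """Square-root carry select adder: stage widths M, M+1, ..."""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum

def sqrt_csa_bits(m, p):
    """Total bits N covered by P stages of widths M, M+1, ..., M+P-1."""
    return m * p + p * (p - 1) // 2

# With unit delays, the MUX term grows as N/M in one case and sqrt(2N) in the other.
for n in (16, 32, 64):
    print(n, linear_csa_delay(n, 4, 1, 1, 1, 1), sqrt_csa_delay(n, 2, 1, 1, 1, 1))
```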

Assumed MUX delay is comparable to 1-stage carry prop delay

12 ~ 6 (?) : number of clock cycles for this signal to be obtained

Propagation Delay of Linear and Square Root CSA and linear RCA

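Carry Skip Adder

A compromise between the ripple carry adder and the CLA adder

$P_{0,3} = p_0 p_1 p_2 p_3$

$G_{0,3} = g_3 + g_2 p_3 + g_1 p_3 p_2 + g_0 p_3 p_2 p_1$

[Figure: 16-bit carry skip adder built from 4-bit blocks with inputs a0 b0 to a15 b15, block carries c4, c8, c12, and group signals P4,7 G4,7, P8,11 G8,11, P12,15 G12,15.]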

Worst case delay

$p_i$'s and $g_i$'s are computed from $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$.

Initially, c4, c8 and c12 are cleared.

After 4 clock cycles (at T0+4Tc), the G-values are calculated as the cout of each block assuming ci=0 (the P-values are also calculated by then).

At this time (at T0+4Tc), the true cout of the first stage, c4, is obtained.

After one, two and three further clock cycles respectively, assuming the delay of each AOI gate is Tc, the true values of c8, c12 and c16 are obtained.

Sum and cout of the last block are obtained at (T0+4Tc+2Tc+4Tc).
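The block-level behaviour described here can be modelled functionally: each block computes a group generate (its carry-out for carry-in 0) and a group propagate, and one AND-OR step per block then resolves the true block carries. A minimal sketch, with `carry_skip_add` as an illustrative name and the block arithmetic done with integer addition rather than gate by gate:

```python
def carry_skip_add(a, b, n=16, m=4):
    """Add two n-bit unsigned integers using m-bit blocks with carry skip."""
    mask = (1 << m) - 1
    c = 0          # carry into the first block
    total = 0
    for lo in range(0, n, m):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        g_grp = (a_blk + b_blk) >> m             # block carry-out assuming carry-in = 0
        p_grp = int((a_blk ^ b_blk) == mask)     # all p_i = a_i xor b_i are 1
        total |= ((a_blk + b_blk + c) & mask) << lo
        c = g_grp | (p_grp & c)                  # AOI-style step: c_out = G + P * c_in
    return total, c
```

Because the group G and P values of all blocks are available at the same time, only the single AND-OR step per block remains on the carry path between blocks.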

Comparison of Carry Select & Carry Skip Adder

A 32-bit Carry Select Adder

Stage #      1 2 3 4 5 6
bits/stage   4 4 5 6 7 6
inc. delay   4 1 1 1 1 1

A 32-bit Carry Skip Adder

Stage #      1 2 3 4 5 6
bits/stage   4 5 6 7 8 2
inc. delay   4 1 1 1 1 2

Area is given relative to the RCA area (the carry skip adder adds P-logic); speed is given in multiples of k2 (multiplexer delays).

32 bit: 9 k2 (k2 = delay due to 1-bit addition or MUX)

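Conditional Sum Adder

[Figure: conditional sum adder. Each bit position produces a sum/carry pair for both carry-in cases ($S_0^0, C_1^0$; $S_0^1, C_1^1$; $S_1^0, C_2^0$; $S_1^1, C_2^1$; $S_2^0, C_3^0$; $S_2^1, C_3^1$). A triple 2-input MUX controlled by C1 selects S1, S2 and C3 from the (C1=0) or (C1=1) candidates.]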

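Carry Lookahead Tree Adder

The previous CLA implementation is not very adequate because of fan-in and fan-out problems and irregularity, despite the small number (5) of logic levels. Make it regular, using $\log_2 n$ logic levels.

[Figure: tree over inputs a3 b3, a2 b2, a1 b1, a0 b0 producing g3 p3 to g0 p0, then G2,3 P2,3 and G0,1 P0,1, then G0,3 P0,3. Each node combines (Gi,j, Pi,j) and (Gj+1,k, Pj+1,k) into (Gi,k, Pi,k).]

$P_{i,k} = P_{i,j} \cdot P_{j+1,k}$

$G_{i,k} = G_{j+1,k} + P_{j+1,k} \cdot G_{i,j}$

[ 1st Part ]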

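Carry Lookahead Tree Adder (cont’)

$C_{j+1} = G_{i,j} + P_{i,j} C_i$

[Figure: inverse tree distributing the carries. Each node takes Ci with (Gi,j, Pi,j) and produces Ci and Cj+1. Inputs a3 b3 to a0 b0, carries C3 to C0, sums S3 to S0.]

[ 2nd Part ]

[ Complete CLA Tree Adder ]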
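The two parts of the tree adder map onto two small functions: one that merges adjacent (G, P) spans, and one that turns a span and its incoming carry into an outgoing carry. The sketch below is a software model under stated assumptions: it recomputes each span recursively instead of sharing nodes as a real tree would, and all names are illustrative.

```python
def combine(low, high):
    """Merge (G, P) of bits i..j (low) with (G, P) of bits j+1..k (high)."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return (g_hi | (p_hi & g_lo), p_lo & p_hi)

def tree_add(a, b, n, c0=0):
    """n-bit addition using group generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    def span(i, k):
        # (G_{i,k}, P_{i,k}) built by recursive halving, about log2(k - i + 1) levels deep
        if i == k:
            return (g[i], p[i])
        j = (i + k) // 2
        return combine(span(i, j), span(j + 1, k))

    carries = [c0]
    for j in range(n):
        g0j, p0j = span(0, j)
        carries.append(g0j | (p0j & c0))   # C_{j+1} = G_{0,j} + P_{0,j} C_0
    s = 0
    for i in range(n):
        s |= (p[i] ^ carries[i]) << i      # S_i = p_i xor C_i
    return s, carries[n]
```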

Carry Save Adder

Ripple Carry Adder

Carry Lookahead Adder

CSA (Conditional Sum Adder)

CSA (Carry Skip Adder)

CSA (Carry Save Adder)

Carry Propagate Adder


Carry Save Adder is used wherever a large number of operands have to be added.
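A carry save step reduces three operands to two (a sum word and a carry word) without propagating any carry, so many operands can be reduced quickly and only one carry-propagate addition is needed at the end. A minimal sketch for non-negative integers; the function names are illustrative.

```python
def carry_save_step(x, y, z):
    """Reduce three operands to a (sum, carry) pair with no carry propagation."""
    s = x ^ y ^ z                                # bitwise sum of each column
    c = ((x & y) | (y & z) | (z & x)) << 1       # bitwise carry, shifted to the next column
    return s, c

def add_many(operands):
    """Add any number of non-negative integers using carry save reduction."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_step(ops.pop(), ops.pop(), ops.pop())
        ops.extend((s, c))
    return sum(ops)   # final carry-propagate addition of the last two words
```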

[Figure: rows of full adders (F.A) forming the CSA stages, with inputs ai, bi, ci.]

[Figure: accumulating form. A row of full adders takes the operand together with the previous-cycle sum and previous-cycle carry, held in the sum F/F and carry F/F.]

2. Multiplier

Add-and-Shift Algorithm

[Figure: multiplication procedure by the pencil-and-paper method and by the add-and-shift algorithm, with the multiplier and multiplicand labeled.]

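The Serial-Parallel Multiplier

If $A = (a_{n-1}, \dots, a_1, a_0)$ and $B = (b_{n-1}, \dots, b_1, b_0)$,

the product $A \cdot B$ is expressed as

$A \cdot B = A \cdot 2^{n-1} b_{n-1} + \dots + A \cdot 2^1 b_1 + A \cdot 2^0 b_0$

[Figure: serial-parallel multiplier with inputs a3 a2 a1 a0, delay elements D, and a serial output.]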
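The product expression is a sum of shifted copies of A, one per set bit of B. The add-and-shift procedure can be sketched as follows for unsigned operands; `add_shift_multiply` is an illustrative name.

```python
def add_shift_multiply(a, b, n):
    """Multiply unsigned a by the n-bit unsigned b, one multiplier bit per step."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:          # multiplier bit b_i
            product += a << i     # add the multiplicand weighted by 2^i
    return product
```

For example, `add_shift_multiply(13, 11, 4)` adds 13, 26 and 104.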

4x4 array multiplier

$t_{mult} = [(M-1) + (N-1)] \cdot t_{carry} + (N-1) \cdot t_{sum} + t_{and}$

Both t_carry and t_sum are important: sum and carry generation times need to be similar.

Carry-save Multiplier(CSM)

Rectangular floorplan of CSM

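The Modified Booth Algorithm (cont’)

Booth Encoder Table

[Table/figure: Booth encoder. For each bit group, the table gives the factor that A is multiplied by. The encoder forms a select signal from $b_{2k}$ and $b_{2k-1}$, and negative $= b_{2k+1}$.]

Booth Multiplication Example

Operation:

Initial 0

Add -A

2-bit Shift

Add 2A

2-bit Shift

Add -A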

The Modified Booth Algorithm

Let’s consider a number $B = (b_{n-1}, b_{n-2}, \dots, b_1, b_0)$ written in 2’s complement:

$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$

B may be rewritten as follows:

$B = \sum_{k=0}^{n/2-1} (b_{2k-1} + b_{2k} - 2 b_{2k+1}) 2^{2k}$ (assume $b_{-1} = 0$)

In this equation, each term in brackets is in the set {-2, -1, 0, 1, 2}.

An n-bit multiplier generates exactly n/2 partial products.

Example (n = 6)

$B = (b_{-1} + b_0 - 2 b_1) 2^0 + (b_1 + b_2 - 2 b_3) 2^2 + (b_3 + b_4 - 2 b_5) 2^4$

$= -2^5 b_5 + 2^4 b_4 + 2^3 b_3 + 2^2 b_2 + 2 b_1 + b_0$
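The recoding can be tried out in a few lines. The sketch below extracts the bracketed terms as digits in {-2, ..., 2} from an n-bit two's-complement value (n even) and uses them to form the product from n/2 partial products; the function names are illustrative.

```python
def booth_digits(b, n):
    """Radix-4 Booth digits of the n-bit two's-complement integer b (n even), LSB group first."""
    bits = [(b >> i) & 1 for i in range(n)]   # works for negative Python ints as well
    digits = []
    prev = 0                                  # b_{-1} = 0
    for k in range(n // 2):
        digits.append(prev + bits[2 * k] - 2 * bits[2 * k + 1])
        prev = bits[2 * k + 1]
    return digits

def booth_multiply(a, b, n):
    """Multiply a by the n-bit two's-complement b using n/2 partial products."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_digits(b, n)))
```

Each digit selects one of 0, ±A or ±2A, so a partial product needs only a shift and an optional negation.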

Parallel Multiplier

A multiplier has two basic operations:

The generation of partial products

The summation of partial products

A parallel multiplier avoids the overhead that is due to the separate controls of these two operations.

The gain in speed is obtained at the expense of extra hardware.

A parallel multiplier can be implemented such that it supports a high rate of pipelining.

The Braun Multiplier

A straightforward implementation

Each full adder cell takes one bit of the new partial product ($a_i \cdot b_j$), one bit of the previous partial product, and a carry in.

In the first four rows there is no horizontal carry propagation (using carry-save adder)

The Braun Multiplier (cont’)

[Figure: array of full adders (F.A) with inputs a3 a2 a1 a0 and product outputs p7 p6 p5 p4.]

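Baugh-Wooley Multiplier

Modified in order to allow multiplication of signed numbers

Let’s consider two numbers A and B (2’s complement numbers):

$A = (a_{n-1} \dots a_0) = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$

$B = (b_{n-1} \dots b_0) = -b_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j$

$a_{n-1}$ : sign bit in 2’s complement

$A = \sum_{i=0}^{n-2} a_i 2^i$, when $a_{n-1} = 0$

$A = -2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$, when $a_{n-1} = 1$

The product A.B is

$A \cdot B = a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - a_{n-1} \sum_{j=0}^{n-2} b_j 2^{n-1+j} - b_{n-1} \sum_{i=0}^{n-2} a_i 2^{n-1+i}$

$= -2^{2n-1} + (\bar{a}_{n-1} + \bar{b}_{n-1} + a_{n-1} b_{n-1}) 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} + (a_{n-1} + b_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2} b_{n-1} \bar{a}_i 2^{n-1+i} + \sum_{j=0}^{n-2} a_{n-1} \bar{b}_j 2^{n-1+j}$

because: $-b_{n-1} \sum_{i=0}^{n-2} a_i 2^{n-1+i} = b_{n-1} \left( -2^{2n-2} + 2^{n-1} + \sum_{i=0}^{n-2} \bar{a}_i 2^{n-1+i} \right)$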
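The point of the Baugh-Wooley rearrangement is that every subtracted partial-product term is replaced by an added term with complemented bits, leaving a single constant to subtract. The sketch below evaluates one such all-positive form and can be compared with ordinary signed multiplication. It assumes the standard two's-complement weights, and `baugh_wooley_product` is an illustrative name.

```python
def baugh_wooley_product(a, b, n):
    """Evaluate a Baugh-Wooley style sum for n-bit two's-complement a and b."""
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> i) & 1 for i in range(n)]
    an, bn = abit[n - 1], bbit[n - 1]          # sign bits
    total = -(1 << (2 * n - 1))                # the only subtracted constant
    total += ((1 - an) + (1 - bn) + an * bn) << (2 * n - 2)
    total += sum(abit[i] * bbit[j] << (i + j)
                 for i in range(n - 1) for j in range(n - 1))
    total += (an + bn) << (n - 1)
    total += sum(bn * (1 - abit[i]) << (n - 1 + i) for i in range(n - 1))
    total += sum(an * (1 - bbit[j]) << (n - 1 + j) for j in range(n - 1))
    return total
```

All array terms are now non-negative, so the same adder cells used for an unsigned array multiplier can sum them.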

Baugh-Wooley Multiplier (cont’)

[Figure: array of full adders (F.A) with inputs a3 a2 a1 a0, the a3 b3 term, and product outputs p7 p6 p5 p4.]

Wallace Tree Multipliers

Full adder vs Wallace tree

Useful whenever a large number of operands are to be added.

Completion time in a Braun or Baugh-Wooley multiplier using a ripple carry adder: proportional to twice the number of bits, 2n

Using Wallace trees: proportional to $\log_2(n)$

[Figure: a full adder takes three inputs of weight $2^0$ and produces outputs of weight $2^1$ and $2^0$; a Wallace tree takes n inputs of weight $2^0$.]

Recursive Decomposition of the Multiplication

Partitioning two operands

$A = 2^p A_H + A_L$

$B = 2^p B_H + B_L$

$A \cdot B = 2^{2p} A_H B_H + 2^p (A_H B_L + A_L B_H) + A_L B_L$

The four terms ($A_H \cdot B_H$, $A_H \cdot B_L$, $A_L \cdot B_H$, $A_L \cdot B_L$) are computed using four p-bit multipliers.

The results are collected through a Wallace tree.
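The decomposition can be checked with a direct transcription for unsigned operands. In this sketch the four half-width products use Python's built-in multiplication in place of the p-bit multipliers, and plain addition in place of the Wallace tree; `split_multiply` is an illustrative name.

```python
def split_multiply(a, b, p):
    """Multiply two unsigned 2p-bit integers from four p-bit by p-bit products."""
    mask = (1 << p) - 1
    a_h, a_l = a >> p, a & mask
    b_h, b_l = b >> p, b & mask
    return ((a_h * b_h) << (2 * p)) + ((a_h * b_l + a_l * b_h) << p) + a_l * b_l
```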

Recursive Decomposition of the Multiplication (cont’)

[Figure: operands AH AL and BH BL feed four multipliers (AL X BL, AH X BL, AL X BH, AH X BH); the four partial products are aligned and summed by 4 X W3 Wallace-tree blocks.]

Aligning the four partial products

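Booth’s Algorithm Array Multiplication

Another approach to the design of a parallel multiplier for two’s complement operands

The basic cells in row i perform an add, subtract or transfer-only operation.

CASS (Controlled Add/Subtract/Shift) Cell

Inputs: $P_{in}$ (partial product), a, $c_{in}$, and controls H, D

$P_{out} = P_{in} \oplus (a \cdot H) \oplus (c_{in} \cdot H)$

If H = 0, $P_{out} = P_{in}$ (transfer)

If H = 1, $P_{out} = P_{in} \oplus a \oplus c_{in}$

$c_{out} = (P_{in} \oplus D)(a + c_{in}) + a \cdot c_{in}$

If D = 0, $c_{out} = P_{in} (a + c_{in}) + a \cdot c_{in}$ (add)

If D = 1, $c_{out} = \overline{P_{in}} (a + c_{in}) + a \cdot c_{in}$ (subtract)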
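The CASS cell behaves as a full adder, a full subtractor or a pass-through depending on its two controls. A small model of one cell, assuming H enables the operation and D selects subtract; `cass_cell` is an illustrative name.

```python
def cass_cell(p_in, a, c_in, h, d):
    """One controlled add/subtract/shift cell; returns (p_out, c_out)."""
    p_out = p_in ^ (a & h) ^ (c_in & h)               # H = 0 passes p_in through
    c_out = ((p_in ^ d) & (a | c_in)) | (a & c_in)    # carry (D = 0) or borrow (D = 1)
    return p_out, c_out
```

With H = 1 and D = 0 the outputs satisfy p_in + a + c_in = p_out + 2 c_out; with H = 1 and D = 1 they satisfy p_in - a - c_in = p_out - 2 c_out.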

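Booth’s Algorithm Array Multiplication (cont’)

[Figure: array of CASS cells with multiplicand inputs a3 a2 a1 a0, initial partial product 0 0 0 0, and outputs P5 P4 P3 P2 P1 P0. Each row is controlled by the multiplier bit pair Xi Xi-1, which selects Shift, Subtract or Add.]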

Generalized block diagram of an array multiplier

Q. Why use an array multiplier if it requires as many addition steps?

A1) An array multiplier is a combinational circuit, where the signals flow without being clocked.

Multi-pass array multiplier: normally uses a clock, but the cycle time for passing through k arrays is < kTc.

A2) Some speed-up schemes are possible, e.g. E/O array, Wallace tree.

Even-Odd Array

Wallace-tree Multiplier

6 x 6 Wallace-tree Multiplier Example

$Delay \propto \log_2 n$ (n : width of the Wallace tree)

e.g. For 32-bit, the number of adders necessary for each stage is

32 - 22 - 16 - 12 - 8 - 6 - 4 - 3 - 2

Total delay = 9 x adder delay

[Figure: datapath and its elements in bit-slice organization, showing MEMORY, DATAPATH and CONTROL blocks.]

3. Datapath Generation

Two layout strategies for bit-slice datapath

Layout of 4-bit DP using layout strategy II (feedthrough)

1-D placement vs. 2-D placement

1-D placement vs. 2-D placement(Cont’)

Datapath Layout Flow

circuit design floorplan : block ordering, bus track assignment

schematic drawing : tr. sizing

layout cell drawing : leaf cell layout

layout assemble : leaf cell integration (routing)

DRC / LVS : design rule check, layout vs. schematic

back-annotation simulation with the exact capacitance

Flow: RTL description → Floorplan → Schematic Drawing → Cell Drawing → Layout Assemble → DRC / LVS → Back-Annotation → Datapath Layout

Datapath Design Case (ACCENT HK386)

real mode support of x86 instruction set

enhanced (pipelined) datapath

problems & practices of general DP layout

Datapath structure

3 major blocks:

ALU, register file (32-bit)

barrel shifter (40-bit)

segment/effective address (32-bit)

Track capacity

[Figure: cell cross-section with VSS, VDD, TRACK(6), control and clock lines over N-well and P-well; layers metal1 and metal2.]

6 vertical wires/track in metal 1; metal 3 reserved for power and ground (P & G) routing

Power grid: from the bottom and left (chip edges), considering IR drop

Cell Structure

Initial cell template decision:

N-well on the left, P-well on the right

data flow vertical

control flow horizontal

Similar cell structure as VTI

Cell width: 80 for PMOS, 70 for NMOS

[Figure: cell width dimensions 25 10 35 45 10 25 across N-well and P-well.]

Cell Structure (cont’)

A power line passes through every cell.

power line width: 10 (2 contacts)

power line location: 25 to the inside from the boundary

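Accent Cell Layout Flow (a student's lament)

Block Spec.

Schematic

Assumed a capacitance at first and ran the simulation; finished TR sizing quickly.

Thought that, since the cap values were not accurate, optimization was unnecessary and meeting the spec was enough.

Since the accurate cap is known only after the whole design is assembled, put the work aside for a long time.

If the layout is modified after assembly, it has to be assembled all over again, which is an enormous amount of grunt work.

Data flow

Control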

Cell Design(I) Using 45 degree line for cell design

Cell Design(II) needless effort to reduce cell size

ugly poly; current crowding

Data flow

Critical path used for transistor sizing in relevant datapath element

Track assignment needs to be done before the cell layout (not after).

Assemble

Data flow

The Value of Grades

College grades and success in society have little correlation, and this is actually not surprising: the factors behind success in society and the criteria for college grades are often quite different.
