A CORDIC Arithmetic Processor Chip

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-29, NO. 2, FEBRUARY 1980

A CORDIC Arithmetic Processor ChipGENE L. HAVILAND AND AL A. TUSZYNSKI

Abstract-A monolithic processor computes products, quotients,and several common transcendental functions. The algorithms arebased on the well-known principles of "CORDIC," but recourse to asubtle novel corollary results in a scale factor of unity. Compared toolder machines, the overhead burden is significantly reduced. Also,expansion of the functional repertoire beyond the circular domain, i.e.,addition to the menu of hyperbolic and linear operations, is a rela-tively trivial matter, in terms of both hardware cost and executiontime. A bulk CMOS technology with conservative layout rules is usedfor the sake of high reliability, low-power consumption, and good cyclespeed.

INTRODUCTIONTHOUGH the concept of CORDIC Arithmetic is said to beTquite old [1], [41, its implementations and applicationscontinue to evolve. The acronym comes from Volder's Coor-dinate Rotations Digital Computer [1], developed in 1959 forair navigation and control instrumentation. An avuncular idea,particularly effective in decimal radix computations, was pre-sented by Meggit in 1962 [2], under the label of "pseudo-division and pseudomultiplication." In 1971, Walther [3]generalized elegantly the mathematics of CORDIC's, showingthat the implementation of a wide range of transcendentalfunctions can be fully represented by a single set of iterativeequations.. Cochran [4] benchmarked, about the same time,various algorithms and found that CORDIC techniques surpassalternative methods in scientific calculator applications.The pertinent effort of the Naval Ocean Systems Center

(NOSC) culminates in the CORDIC Arithmetic Processor Chip(CAP Chip) of Fig. 1 that simplifies the architecture, booststhe speed, and reduces the power consumption of monolithicarithmetic modules. All computations are based on the ex-ecution of either

xi+, =xi +yi2-' (la)or

X(i+),2 =(1 + 72 x(i+ ), . (lb)While the first of these equations represents the regularCORDIC iterations, the second [5], [61 forces the scale fac-tors of circular and hyperbolic functions to unity. ROM in-structions govern the selection of either (la) or (lb), but the option is executed by the sign bit of one of the operands.This paper begins with multiplication and division, because

Manuscript received January 4, 1979; revised June 26, 1979.The authors are with the Microelectronic Circuit Design Branch,

Naval Ocean Systems Center, San Diego, CA 92152.

(a)

I BL)B Yi (APe In

Sign Bit(b)

Fig. 1. (a) The CAP chip. (b) Block diagram of the CAP chip.

the pertinent algorithm compares well, in its own right, withalternative techniques, especially in digital filter applications[71, [8], [17]. Moreover, the said algorithm is simple andtransparent enough to project the feedback principle as thefundamental and common link of the CORDIC [1], [3],

U.S. Government work not protected by U.S. copyright

68

HAVILAND AND TUSZYNSKI: CORDIC ARITHMETIC PROCESSOR CHIP

yes

Legend:

Decision node

Output the product or quotient

ROM Instruction

, I75- Z, 8000+.50 +

..25 + *Z2 .3000

0-

-.25 +

-.50 +

* Z3 .0500+ Z5z -.0125*Z4 =-.0750

+.75 +

+.50 +

*.25 -

0-

-.254

-.504

Z1 .1000Z5:.0375

-X -Z6 .00625* Z4 .0250Z3 .1500

* Z2 -.4000

a) Z0:.8 b) ZO =.1Fig. 2. (a) The multiply-divide algorithm. (b) Reduction to zero of operand z.

[4], the Meggitt [2] and the Chen [16] procedures. Also,once established, it can be easily expanded to trigonometricand hyperbolic functions.

A DIGITAL FEEDBACK LOOPTake three numbers, xo, yo, and zo, zo being restricted to

the range

O < z0 < 1. (2)Perform the following iterations:

Yj+j1 =yi+xo i2-' (3a)=Yo +Xo (5j2-') for i= 1 throughn (3b)

but

Xi+i =Xi=xo.

(3e)(3f)

To the Si operator assign the values of either plus one or minusone, depending on the polarity of zi. In other words, let

+ I+ if zzi>ot-1 .if zi


A 1st Ouadrant

Bot B,0 BIo Bi,, Bot eIB

Columns

2through9 -b

Fig. 3. Schematic diagram of the 2'i scaler.

What we have here is an arrangement which amounts to anautonomous feedback loop of attractive simplicity. The zero-seeking mechanism of the loop is controlled entirely by thesign bit of z; the sign bit determines e5i and that operator imple-ments, in turn, the crucial add-subtract option of (3) and (4).Equation (5) implies that a sufficiently high i, say i = n, will

justify the approximationZn+ 1 = Z 0(6a)

and will, therefore, lead to the expressionn

E(5i2-') -zo (6b)i=1

and hence, by substitution into (3b), to the product equationYn+i =Y 'Yo +xOZO- (6c)It goes without saying that we could have reduced to zero

the operand y rather than z. Exercising that option, one ar-rives at

Yn+1 =Y z0 (7a)or

xo[ (i2-')]--yo (7b)and, therefore, at the quotient equation

zI=ZO +-*20(7c)xO

Equations (6) and (7) demonstrate that the digital feedbackalgorithm, defined by (2)-(4), leads to practical implementa-tions of functions germane to multiplication and division [3].Compared to alternative techniques [9], digital feedback looksgood in division and, as we shall soon see, it becomes evenmore attractive when circular and hyperbolic functions areconsidered.

THE CHIPBlock diagram particulars and layout details of the CAP chip

are shown in Fig. l(a) and (b), respectively. There are butthree major circuit blocks: the all important 2-i scaler [1 ], a12-bit two's complement adder, and a 24-bit accumulator ofthe shift-register variety. The narrow block at the top of thechip is the "i" counter, called the "sequencer." The I/O buf-fers are distributed around the periphery of the chip, but allmultiplexers are merged with the appertaining functionalblocks.The 24-bit data are processed in two 12-bit steps. The lower

byte of the word held by the accumulator is released into theadder-subtractor by the local clock, an intermediate step ofaddition or subtraction is performed, and the result is returnedto the accumulator. The upper byte is subjected to similartreatment, beginning with the release of data that now in-cludes the carry generated by the lower byte, and terminatingwith the acceptance of the result by the accumulator.The scaler takes up a large part of the chip's surface and a

sizable fraction of the cycle time. This is both understandableand acceptable considering its function, namely, the two'scomplement multiplication of every 8i and every xo8i by 2-i.The scaler is indeed the centerpiece of CORDIC hardware;the present implementation is distinctly faster than its shift-register counterparts. The circuitry is really quite simple,owing to the highly efficacious transmission gates of theCMOS technology [12]. A matrix of such gates, arranged asshown in Fig. 3, propagates the sign bit while it shifts the databy "i" bits. The signal flow matrix of the scaler is square(Fig. 4) with 24 columns for bit locations and as many rowsfor cycle numbers. However, since there is some redundancein Fig. 4, the physical matrix need only be half as large as isits model, and that is why we have in Fig. 3 a matrix of 12rows for 12 exponents and 24 columns for as many bits.The transmission gates are driven by a sequencer with out-

puts A, B, and C (Fig. 5). Output A enables either the upperor the lower byte and output C picks the first or the second

70


Bit ICycla 23 22 21 20 19 18 17 16 is 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

00 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 0001 23 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 06 07 06 05 04 03 02 0102 23 23 23 22 21 20 19 18 17 16 10 14 13 12 11 10 09 08 07 06 05 04 03 0203 23 23 23 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 0o 04 0304 23 23 23 23 23 22 21 20 19 18 17 16 19 14 13 12 11 10 09 08 07 06 05 0405 23 23 23 20 19 18 17 16 19 14 08 07 06 05Quadrant 11 Quadrant06 23 23 23 21 20 19 18 17 16 Is 09 08 07 0607 23 23 23 23 23 23 23 23 22 21 20 19 18 17 16 19 14 13 12 11 10 09 09 0708 23 23 23 23 23 23 23 23 23 22 21 20 19 18 17 16 19 14 13 12 11 10 09 0809 23 23 23 23 23 23 23 23 23 23 22 21 20 19 19 17 16 15 14 13 12 11 10 0910 23 23 23 23 23 23 23 23 23 23 23 22 21 20 19 18 17 16 19 14 13 12 11 1011 23 23 23 23 23 23 23 23 23 23 23 23 22 21 20 19 18 17 16 15 14 13 12 1112:23 23--23- 23 23 23......23 23......23 23 23 23 23.2221-20-19-16-1?-12 23 23 23 23 23 23 23 23 23 23 23 23 23 22 21 20 19 1917 16 1914 13 12

1 23 23 23 23 23 23 23 23 23 23 23 23 23 23 22 21 20 19 18 17 16 15 14 1314 23 23 23 1415 23 23 23 1616 23 Quadrant 111: 23 23 Quadrant IV: 1617 23 Same an Row 11 23 23 Same as 1718 23 of Ouadrant 1 23 23 Quadrant li 1819 23 23 23 1920 23 23 23 2021 23 23 23 2122 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 2223 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23

Fig. 4. Shifting of xi andyi.

Ci =AjB, ++AOBO(A,+B1) ++C1 (A,+Bl) (AO+Bo)

\ C

Schematic Diagram of'ceis A0, B and C9ST . StW

CLK . CloekICKL: InshlbltelokSaD . SI.90 Cytn Data (12 Sit)

Fig. 5. The sequencer.

quadrant, while outputs B select one out of the 12 pertinentcolumns. Only regular CORDIC cycles are counted by the se-quencer. Clock signals which pace the "double cycle" and the"scale factor" operations are inhibited by status bits outputtedby the instruction ROM (Fig. 10).The adder has a configuration which resembles conventional

look ahead logic, but its circuitry is unique. Selected frag-ments of our "dynamic CMOS" circuits are shown in Fig. 6(a)and (b). Whereas there are 2n transistors in a conventionaln-input CMOS -gate, the complementary CMOS configurationof Fig. 6(a) has only n + 1. The resultant savings in surface

Fig. 6. (a) Dynamic CMOS, carry 1. (b) Fragment of the adder.

area are most welcome in the CARRY module which has a totalof 98 ports in the C12 gate. The precharge clocks 01 and 02are, of course, synchronized with the clock which controls thetiming of the lower and upper byte add-subtract operations.The adder gate logic utilizes a combination of a XOR-AND-

OR element and a HALF-ADDER. Various gate configurations,including the conventional CMOS NOR and the Floating XOR[13], are employed, but the whole thing adds up to only 26transistors. The chip layout for this section of logic is shownin Fig. 6(b). 10 000 p2 of surface area are consumed if metal-gate bulk-CMOS with 8 ,um spacing is employed.

Bit

SBD

ST

CLK

C,O

71


OutPLut x',y',zFig. 7. The generalized CORDIC cycle.

GENERAL CORDIC EQUATIONSWritten in conventional format, the CORDIC equations look

as follows:xi+1=Ixi + 6i2'iy (9a)yj+j1= yj- 6j2-'xj (9b)

and

Zi+l=zi-5ioi. (9C)The CAP chip executes the above and two supplementary

sets of equations:

X(i+ 1), 2 = X(i+ 1), 1 + Si2 y(i+ 1), 1Y(i+l),2 =Y(i+l),i - 8i2 x(i+l),1Z(i+1),2 = Z(i+I),l - Sioi

and

X(i+ 1), 2 = (1 + y2 ') x(i+ 1), 1

(lOa)(lOb)(lOc)

Y(i+ 1), 2 = ( + 721)y(i+ 1),Z(i+ 1), 2 = Z(i+1), 1 + 0.

(1lb)(lIc)

The flow diagram of the generalized instruction cycle is givenin Fig. 7. Equations (10) and (11) represent the "doublecycle" and the "scaling factor K" operations, respectively.The raison d'etre of these operations is explained below.Let us focus our attention on the variable "i" in (9). We

have already come across the relationship

Oi = 2-iand will yet tackle the functions

Oi = arctan (2-i)and

Oi = arctanh (2-').

(12a)

(12b)

(12c)Implied in (6), as well as in the concept of feedback itself, is(11 a) the convergence relationship:

72


TABLE ITHE K SCALING FACTOR FOR TRIGONOMETRIC FUNCTIONS

"oi"l COSINE0 .70710678118651 .89442719099992 .97014250014533 .99227787671374 .99805257848295 .99951207608716 .99987795203477 .99996948381888 .99999237069289 .9999980926568

10 .999999523163211 .999999880790712 .999999970197713 .999999992549414 .999999998137415 .999999999534316 .999999999883617 .999999999970918 .999999999992719 .999999999998220 .999999999999521 .999999999999922 1.000000000000023 1.000000000000024 1.0000000000000

PRODUCT.7071067811865.6324555320337.6135719910779.6088339125177.6076482562562.6073517701413.6072776440935.6072591122989.6072544793325.6072533210899.6072530315291.6072529591389.6072529410414.6072529365170.6072529353859.6072529351031.6072529350324.6072529350147.6072529350103.6072529350092.6072529350089.6072529350089.6072529350088.6072529350088.6072529350088

CORRECTION.5000000000000.7500000000000.8750000000000.9375000000000.9687500000000.9843750000000.9921875000000.9960937500000.9980468750000.9990234375000.9995117187500.9997558593750.9998779296875.9999389648438.9999694824219.9999847412109.9999923706055.9999961853027.9999980926514.9999990463257.9999995231628

Required Product of Cosines =K Multiplier for Cosines =Error at 24 cycles of 40 bits =

MULTIPLIER1.0000000000000.7500000000000.6562500000000.6152343750000.6152343750000.6152343750000.6104278564453.6080433726311.6080433726311.60744958.02750.6074495802750.6073012771548.6073012771548.6072642104265.6072642104265.6072549443100.6072549443100.6072549443100.6072537860631.6072532069407.6072529173798

.6072529350088

.6072529173798

.0000000176290

DIFFERENCE.3927470649912.1427470649912.0489970649912.0079814399912.0079814399912.0079814399912.0031749214365.0007904376222.0007904376222.0001966452662.0001966452662.0000483421460.0000483421460.0000112754176.0000112754176.0000020093011.0000020093011.0000020093011.0000008510542.0000002719319

-.0000000176290

n

Zo - E (5OA) < On. (13)i=1

This inequality is fulfilled by (12a) and (12b) but not (12c),for the simple reason that2-(i+l) = ()(2-), i.e., term i+l=(-)oftermi (14a)

24[arctanh (2-1) - E arctanh(2-i) >> arctanh (224), (ISa)

i=2

but24

arctanhi(2- ) - 1: arctanh (2-i)i=2

andarctan [2-(i+ 1)]I > ( 2 ) arctan (2-i),

i.e., term i + > 2 of term i,but

+ arctanh(2-k)1< arctanh (224),k

(14b) ifk=3,4,7, 12, 13, 18, 19, and 21.

arctanh [2(i+ 1)]


TABLE IICOMPUTATION AND LISTING OF DOUBLE PASS CYCLES FOR HYPERBOLIC FUNCTIONS

"i" 81Arctanh(2

24 .000000059604623 .000000119209322 .000000238418621 .000000476837220 .000000953674319 .000001907348618 .000003814697317 .000007629394516 .000015258789115 .000030517578114 .000061035156313 .000122070313112 .000244140629911 .000488281288810 .00097656281049 .00195312748358 .00390626986847 .00781265895156 .01562627175215 .03126017849074 .06258157147703 .12565721414052 .25541281188301 .5493061443341

24(6 k)

k=i+1

.0000000596046

.0000001788139

.0000004172325

.0000008940697

.0000018477440

.0000037550926

.0000075697899

.0000151991844

.0000304579735

.0000609755516

.0001220107079

.0002440810210

.0004882216509

.0009765029397

.0019530657501

.0039061932337

.0078124631021

.0156251220536

.0312513938057

.0625115722963

.1250931437733

.2507503579138

.5061631697968

Difference

.0000000596046

.0000000596046

.0000000596046

.0000000596046

.0000000596046

.0000000596046

.0000000596046

.0000000596046

.0000000596046

.0000000596047

.0000000596047

.0000000596052

.0000000596088

.0000000596379

.0000000598707

.0000000617334

.0000000766347

.0000001958495

.0000011496984

.0000087846850

.0000699991807

.0005640703671

.0046624539692

.0431429745373

DOUBLEBi l PASS

CYCLES1 02 03 04 05 16 07 18 19 0

10 011 012 013 114 015 116 017 118 019 120 121 122 123 024 1

THETA

.5493061443341

.2554128118830

.1256572141405

.0625815714770

.0312601784907

.0156262717521

.0078126589515

.0039062698684

.0019531274835

.0009765628104

.0004882812888

.0002441406299

.0001220703131

.0000610351563

.0000305175781

.0000152587891

.0000076293945

.0000038146973

.0000019073486

.0000009536743

.0000004768372

.0000002384186

.0000001192093

.0000000596046

REMAINDER

.0431429745373

.0431429745373

.0431429745373

.0431429745373

.0118827960466

.0118827960466

.0040701370951

.0001638672267

.0001638672267

.0001638672267

.0001638672267

.0001638672267

.0000417969136

.0000417969136

.0000112793354

.0000112793354

.0000036499409

.0000036499409

.0000017425923

.0000007889180

.0000003120808

.0000000736622

.0000000736622

.0000000140576

TABLE IIISUM OF ARCTANH (2') INCLUDING DOUBLE PASS CYCLES

Doublet"i"if Pass

CyclesArctanh (2' )

24z (ek)

k=i+1

24 0 .000000059604624 1 .000000059604623 0 .000000119209322 0 .000000238418622 1 .000000238418621 0 .000000476837221 1 .000000476837220 0 .000000953674320 1 .000000953674319 0 .000001907348619 1 .000001907348618 0 .000093814697317 0 .000007629394517 1 .000007629394516 0 .000015258789115 0 .000030517578115 1 .000030517578114 0 .000061035156313 0 .000122070313113 1 .000122070313112 0 .000244140629911 .0 .000488281288810 0 .00097656281049 0 .00195312748358 0 .00390626986848 1 .00390626986847 0 .00781265895157 1 .00781265895156 0 .01562627175215 0 .03126017849075 1 .03126017649074 0 .06258157147703 0 .12565721414052 0 .25541281188301 0 .5493061443341

.0000000596046

.0000001192093

.0000002384186

.0000004768372

.0000007152557

.0000011920929

.0000016689301

.0000026226044

.0000035762787

.0000054836273

.0000073909760

.0000112056732

.0000188350677

.0000264644623

.0000417232513

.0000722408295

.0001027584076

.0001637935639

.0002858638770

.0004679341902

.0006520748200

.0011403561088

.0021169189192

.0040700464028

.0079763162712

.0118825861396

.0196952450911

.0275079040426

.0431341757947

.0743943542854

.1056545327760

.1682361042530

.2938933183935

.5493061302765

.0000000596046

-.0000000000000

-.0000000000000

-.0000002384186-.0000002384186-.0000007152557-.0000007152557-. 0000016689301- .0000016689301-.0000035762787- .0000035762787-.0000035762787-.0000112056732-.0000112056732-.0000112056732-.0000417232513-.0000417232513- .0000417232508-.0001637935639- .0001537935603- .0001637935312-.0001637932984-.0001637914357-.0001637765344-.0040700454028-.0040699271880-.0118825861396- .0118816322906-.0118739973040-.0431341757947- .0430729612990-.0425788901126- .0384805065105.0000000140576

separate, though identical, manipulations of x and y. For ex- The role of the scale factors in the realization of circular andample, setting the gammas in (11) to plus 1 for i = 2, 4, one hyperbolic functions will be discussed in a later section.multiplies both output variables by 1.32812:

CIRCULAR FUNCTIONSx* =(1 +2-2)(1 +2-4)x

= 1.32812x

and

y*=.(l +2-2)(l +2-4)y= 1.32812y.

Prominent among the functions used in servo control is the(1 6a) resolver operation defined by (17) [10]. The search for

"solid-state" resolver hardware is lively and likely to continuefor some time to come. Mathematically, however, one dealswith the old and commonplace rotation of axes depicted inFig. 8. When a pair of rectangular axes is rotated anticlock-

(16b) wise by an angle 0, then the coordinates of a point P transform

24ei

kE (Ok)k=i+1

74


xo

(a)

-oPR

I I o(b)

Fig. 8. (a) "Rotation," given xo, yo, and 00 fmd x' and y'. (b) "Vec-toring," given xo and yo fmd R and 0'.

from xo, yo to x, y in accordance with the equations:x=xo cos0 +Yo sinO (17a)y =

-xo sin 0 +yO cbos . (17b)Interesting, from the viewpoint of implementation, is the

fragmentation property of 0: Theta can be split up into anarbitrary number of other angles. For example, if

0 = 01 + 02, (18a)then

xl =xo cosO +yO sin 01 (18b)YI = -xo sin 01 +yo cos 01

and consequently,x=xl cos02 +Yl sin 02= X0 (COS 01 COS 02 - sin 01 sin 02)+yo (sin 01 cos 02 + sin 02 COS 01)

=X0 cos(0I +02)+yO sin(01 +02). (19c)Interpretation of this result in terms of multiple fragmen-tations leads to a set of recursive formulas, which read asfollows:

xi+, =x,cos80ii+yisinSiOi (20a)Yi+l = -xisinSiOi+yicosSiOi (20b)

and

Zi+ I = Zi- i0. (20c)There are no restrictions on the various 0's, other than thoseconsidered in (10)-(14), in connection with the double cycleoperation. For that matter, (20c) is exactly the same as (9c),though there are significant discrepancies between the other

members of sets (9) and (20). These discrepancies will nowbe eliminated as much as possible, for the sake of hardwaresimplicity.

First off, one can factorize cos i80i in (20a) and (20b):xi+1 = cos SiOi(xi +yi tan SiOi) (21a)

= cos 0i(xi + 6iyi tan 01) (21b)and

yi+ = cos Oi(yi- 8ixi tan Oi). (21c)Next, one can make the arbitrary, but highly convenient, sub-stitution

Oi= arctan (2-1) (22)in order to arrive at

xi+,1 = cos Oi(xi +8i 2-'yi) (23a)Yi+l =cos Oi(yi - S62-'xi) (23b)Zi 1 = Zi 8Si arctan (2-'). (23c)

Finally, one can compare the end results (x**, y, z*) of itera-tions (23) with the end results (x', y', z') of iterations (9) andconclude that

x* =x'Li cos (arctan 21)] (24a).i=O

= Xt [ (+2-2i)- 1/2]2b=xL (1 +2211/](24b)i=o

=Knx'. (24c)That spells out the overall dependence between the two setsof numbers as

(24d'(18c) x* =K,x'and

(19a) y*=KnY'but

(19b) z*=z'.

(24e)

(24f)Since it depends on "n" only, the scale factor K,, is a ma-

chine constant. Consequently, given x' andy', one can realizex* and y* by many simple methods, including ROM look-uptables and regular combinatorial logic, but the scaling factorKtechnique, spelled out in (24) and Fig. 7, is particularly attrac-tive because it offers a host of advantages such as speed, realestate economy, and conceptual simplicity.

INITIALIZATIONWhile the inverse tangent of 2-1 is only 260, the processor

must accommodate angles as large as 1800. This does notpresent any great difficulty, but for the sake of compatibilitywith other functions, it is convenient to implement the rangeextension in two special "initialization" cycles (Fig. 9). Thefirst shifts 0 by 900:

75

n


Img.

2 4 Real

(a)iImg.

Real

(b)Fig. 9. (a) Initialization with 00 = 1500. (b) Initialization with

00 = -15.

HYPERBOLIC FUNCTIONSWhen they are written in vector format, (17) read as follows:

Ix*l |cos0 sinG xoY -sinG CosO |YO (29)

The coefficient matrix is orthogonal, and so are the threegermane matrices shown below:

cosO -sinGA1 = Isin 0 cosO

| cosO -jsin0-jsin0 cosOcosO -jsin0

-jsin0 cosO

(30)

(31)

(32)

Any one of these matrices can be used in expressions equiv-alent to (29) but, naturally enough, a unique geometrical in-terpretation must be associated with any particular matrix.For example, taking A3 and relating it to the imaginary angle

0 =jO, (33)x = xo cos (90) + yo sin 6(900)= 6Yo

y =-xo sin 6(900) + YO cos (900)

= SxOand

z = zO - 6(900).

The second cycle executes the 450 shift,x x cos (450) + y sin 6 (450)

=-/ (x+6y)1

y= (y - sx)

andz=z- 6(450).

(25a) one gets the hyperbolic relationship:(25b) |X* | cos jo -j sin jP xo(25c) |Y* -jisin jp cosj| Yo(25d)

(25e)

(26a)

(26b)

cosh4 sinh | xosinh cosh | Yo

1 tanh xo=coshf tanh1 1 Yo

(34a)

(34b)

(34c)

The last of these equations leads directly to the iterativeformulas

xi+I x1i+(65j2-')yj (35a)(26c) Yi+iIYi + (6i2-')xi

Zi+ Zi- 6itanhl(2-')(26d)

The 1 /IV2 multiplier in (26) had been actually anticipated andincorporated into the geometric K, when (24c) was written as

n

Kn = E (1 + 2-2i)-1/2.i=o

Equation (26) can be normalized, therefore, to readx=x+(6)y

y=y- (6)x

andZ=z- (6)(450).

(27)

(35b)(35c)

and to the scale factorn

Kn = IH cosh (2-')i=l

n

= Hl (1 - 2-2i)-1/2.i=l

(36a)

(36b)

The hyperbolic routine does not require initialization, but(28a) it does call for the "double cycle" operations of (10). Tables

II and III, drawn for n = 24, show 9 double cycles and 11 scale(28b) factor operations; the former produce an overall shift in 0 of

0(max) = zo(max) = 1.09(28c) while the latter generate the scale factor

(37)

00

76


K24 = 1.205. (38)Operations which reduce z to zero (rotations) implement the

transformationx*"xO cosh a+yo sinha

andy* =xo sinha+yo cosha

whereac o

(39a)

(39b)

(39c)or alternatively, generate the functions sinh ca, cosh oa, et, eC,etc., when appropriate values are assigned to the input vari-ables x0 and yo [2]. Vector operations, on the other hand,produce

(40a)z* = zo + arctanh (xo/yo)and

X* = (x2 - y2)-1/2 (40b)as well- as perspicuous mutations of these functions.

OVERVIEWFigs. 10 and 11 show a complete set-up for the execution

of rotation and vectoring operations in the linear, circular,and hyperbolic modes. There are six modules, namely, threeCAP chips, a 16 X 512 ROM, an ROM ADDRESS counter, aclock, and an I/O box. The I/O box loads the data into theprocessor and returns the results to the bus; it also sets thetwo most significant bits of the ROM address. These two bits(A9 and A8) select one of the three sectors of the ROM begin-ning at address zero for linear operations, address 128 forcircular functions, and 256 for hyperbolics. The counter,which generates the other 7 address bits, is first reset and thenallowed to advance one bit per block cycle. The operation ofthe CAP chips is under the control of the instruction sectionof the ROM: Status bits from the ROM cause the executionof either a regular CORDIC cycle, or a "double" cycle, or ascaling factor K operation; they also signal arrival at the "last"cycle, that is, completion of the computation. The sign bit ofeither y or z controls the add/subtract options of all threechips.The external instruction which activates the processor must

include three function selection bits: one for either rotation orvectoring and two for either linear or trigonometric or hyper-bolic functions. These three bits take care of the two deci-sions which start-off the signal flow diagram of Fig. 11. Oncethe function to be executed has been identified, the operationof the calculator is paced along by the local clock. Linearprocessing is purely "CORDIC," but the trigonometric rou-tines add scaling factorK cycles to the menu, while hyperbolicalgorithms use both the double cycle and scaling factor K sup-plements. The execution time of circular functions is slightlylonger than that of linear functions, and the execution timeof hyperbolic functions is longer still, but the implementation

Fig. 10. System organization.

of all three classes of functions is equally simple. Simplicity,of both architecture and circuitry, may indeed be the moststriking and important feature of the CAP chip implementa-tion of the CORDIC concept. Where reliability is at a pre-mium, nothing scores higher than well-founded simplicity.

CONCLUSIONWhereas the performance of a chip depends on its architec-

ture, circuit design, and processing, one may want to separateprocessing from the other two factors when attempting toassess the quality of a device. It is understood that perfor-mance is always technology limited. A faster technology willinvariably bring about higher speed and, possibly, reducepower dissipation at the same time. For a given circuit sche-matic, conversion from conservatively laid out metal-gateCMOS to tightly spaced poly-gate SOS will produce spectac-ular improvement. For that reason, speed alone is hardly asatisfactory measure of circuit design quality; to compare dif-ferent embodiments of an idea, one must speak of minimumcycle times, expressed in multiples (n) of "typical" gate de-lays. The propagation delay of a gate sums up the quality ofthe technology, while "n" gives an estimate of the combinedquality of the architecture and the circuit design. Naturally,one needs a definition of the "typical" gate. Physical dimen-sions present no difficulty-one simply picks a "minimumsize" device-but the typical configuration may be open todispute. We use an inverter with a fan out of three.SPICE analysis [15] of the CAP chip suggests n = 13 as the

minimum cycle time of a two-byte (24-bit) operation. The"gate" delay is roughly 100 ns. More than half of the overalldelay is attributable to the 2-i scaler. This is understandable,

77


Fig. 11. Flow diagram of the CORDIC system.

considering the size of the structure in Fig. 3. Next in order ofnuisance ratings comes the carry circuit C12, whose layout isshown in Fig. 6(b). Taken together, the scaler and the carrydetermine, just about, the effective "n" of the system. Fur-ther improvements in "n" will have to come either from modi-fications of these elements, or from conversion to single-byteoperation. The former approach must await inventive contri-butions, but the latter is feasible right now. The entire systemcan be implemented in single-byte format by recourse tothree micron layout rules; an "n" of 7.5 can thus be realizedwithout changes in circuitry. Furthermore, even an early vin-tage edition of submicron CLOSED COSMOS [14] will accom-modate a complete single-byte system on just one chip. Whatwe have then, in addition to a system which executes tran-scendental functions in 40 Ms, is a good candidate for sub-micron phototyping.

ACKNOWLEDGMENTThe authors acknowledge their indebtedness to C. A. Bass,

Associate Head for Systems, Systems/Aerospace SurveillanceDepartment, Naval Ocean Systems Center, San Diego, CA.

REFERENCES[1] J. E. Volder, "The CORDIC trigonometric computing technique,"

IRE Trans. Electron. Comput., vol. EC-8, pp. 330-334, Sept.1959.

[2] J. E. Meggitt, "Pseudo division and pseudo multiplication pro-cesses," IBM J., pp. 210-226, Apr. 1962.

[3] J. S. Walther, "A unified algorithm for elementary functions," in1971 Proc. Joint Spring Comput. Confi, pp. 379-385, 1971.

[4] D. S. Cochran, "Algorithms and accuracy in the HP-35," Hewlett-Packard J., pp. 10-11, June 1972.

[5] C. A. Bass, "A flow-diagram technique for time-varying coordi-nate systems," NRL Rep. 7199, June 26, 1970.

[6] C. A. Bass and J. F. Kohne, Jr., "Computer program design spec-ification for the AN/WSC-2 digital control system," NELC Tech.Note #2953, NOSC, San Diego, CA, May 30, 1975.

[7] A. Habibi and P. A. Wintz, "Fast multipliers," IEEE Trans.Comput., vol. C-19, pp. 153-157, Feb. 1970.

[8] M. Davis and G. Bioul, "Fast parallel multiplication," PhilipsRes. Rep., vol. 32, no. 1, pp. 44-70, 1977.

[9] H. Schmid. Decimal Computation. New York: Wiley, 1974.[10] S. A. Davis and B. K. Ledgerwood, Electromechanical Com-

ponents for Servomechanisms. New York: McGraw-Hill, 1961.[11] D. G. Steer and S. R. Penstone, "Digital hardware for sine-cosine

function," IEEE Trans. Comput., vol. C-26, pp. 1283-1286, Dec.1977.

[12] E. A. Torrero, "Focus on IC analog switches and multiplexers,"Electron. Des., pp. 64-72, Sept. 1975.

78


[13] RCA Part #CD4030.[14] A. G. R. DingwaU and R. E. Sticker, "C2L: A new high-speed

high-density bulk CMOS technology," IEEE J. Solid-StateCircuits, voL SC-12, pp. 344-349, Aug. 1977.

[15] L. W. Nagel, "Spice 2: A computer program to simulate semicon-ductor circuits," Univ. of California, Berkeley, ELR, 1975.

[16] T. C. Chen, "Automatic computation of exponentials, logarithms,ratios and square roots," IBM J. Res. Develop., pp. 380-388,July 1972.

[17] W. H. Specker, "A class of algorithms for In x, exp x, sin x,cos x, tan-I x," IEEE Trans. Comput., voL C-19, pp. 153-157,Feb. 1970.

Gene L Haviland received the B.S.E.E. degree from the CaliforniaPolytechnic University, Pomona, in 1962.In 1962, he joined the Applied Research Laboratory, Guidance and

Control Division, Litton, IN, where he was actively involved in circuitdesign and hybrid computer development for inertial navigation sys-

tems. In 1967 he joined the SemiconductorDivision of Union Carbide and subsequentlybecame responsible for linear IC design anddevelopment. Since 1972 he has headed theMicroelectronic Circuit Design Branch, NavalOcean Systems Center, San Diego, CA, re-sponsible for design techniques based onlarge-scale integration primarily in the area ofdigital systems.

Al A. Tuszynsld received the B.S. degree fromthe University of London, London, England,the M.S. and D.Eng. Sc. degrees from the NewJersey Institute of Technology, Newark, NJ.After 20 years with various system and semi-

conductor companies, he joined the Depart-ment of Electrical Engineering, University ofMinnesota, Minneapolis, in 1970, to exploreand teach techniques for the design and test-ing of large monolithic circuits. His consultingassociation with the Microelectronics Division

of the Naval Ocean Systems Center, San Diego, CA, dates back to 1974.

Dedicated LSI for a Microprocessor-ControlledHand-Carried OCR System

MICHEL C. RAHIER, MEMBER, IEEE, AND PAUL G. A. JESPERS, SENIOR MEMBER, IEEE

Absthact-The binary picture processing and recognizing stages of anoptical character recognition (OCR) system have been designed usingboth flexibility of available microprocessors and speed of peripheralcuswtom-designed integrated circuits. A dedicated large-scale integrated(LSI) processor performs edge detection and thinning of a 32 X 24 digitized one-piece pattern. The output signal-a set of 3 bit vectors de-scribing the skeletonized character contour-feeds a microprocessorwhich controls the character recognition algorithm including patternsegmentation, filtering, feature extraction, and classification decision.This low-cost equipment is especiaUy suitable for hand-carried OCRsystems where well-formed printed alphanumerics are to be read. How-ever, continously deformed patterns like carefully handprinted charac-ters are recognized as well. A system reading speed of 100 characters/s(or 30 cm/s) can be achieved.

Manuscript received May 29, 1979; revised September 11, 1979. Thiswork was supported by an IRSIA fellowship. This work was presentedat the 1979 International Solid-State Circuits Conference, Philadelphia,PA.The authors are with the Microelectronics Laboratory, Catholic

University of Louvain, Louvain-la-Neuve, Belgium.

I. INTRODUCTIONTHE steadily growing use of computers in industrial and

business environments involves a huge need for data-entry devices with document direct-reading capability withoutuse of tedious high-cost keyboarding operations.Currently available optical character recognition (OCR)

equipments can roughly be divided into three main classesaccording to their complexity level. In the first one arbitrarycontrolled source documents, using bar codes or magnetic-inksupports [1], lead to fairly simple recognition schemes andrelated hardwares, but the algorithms can generally not beextended for real-world character reading. At the other end ofthe available OCR systems, large machines have been built inan attempt to recognize handwriting. For example, automaticmail sorting [2] was successfully achieved with handwrittenpostal zip-code classification even though the large variety ofcharacter shapes encountered and the random backgroundnoise often mixed with the numerals. This generally results in

0018-9340/80/0200-0079$00.75 X 1980 IEEE

79

Documents

A CORDIC Arithmetic Processor Chip