
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-25, NO. 10, OCTOBER 1976

On the Computational Cost of Approximating and Recognizing Noise-Perturbed Straight Lines and Quadratic Arcs in the Plane

DAVID B. COOPER, MEMBER, IEEE, AND NESE YALABIK, MEMBER, IEEE

Abstract-Approximation of noisy data in the plane by straight line or elliptic or single-branch hyperbolic curve segments arises in pattern recognition, data compaction, and other problems. A number of questions concerning the efficient search for and approximation of data by such curves are examined. Recursive least-squares linear curve fitting is applied to the original or to transformed data in the plane to provide computationally simple algorithms for sequentially searching out appropriate data points and fitting straight line segments or quadratic arcs to the data found. The error minimized by the algorithms is interpreted. Central processing unit (CPU) times for estimating parameters for fitting straight lines and quadratic curves are determined and compared. CPU time for data search is also determined for the case of straight line fitting. Quadratic curve fitting is shown to require about six times as much CPU time as does straight line fitting. Curves relating CPU time and fitting error are determined for straight line fitting. A number of the preceding results are extended by using maximum likelihood curve estimation, and the modification of the classical least-squares curve fit resulting from the use of an a priori probability density function for the unknown curve parameters is determined.

The problem of deciding with controlled probabilities of error whether a data subset represents a straight line segment or a quadratic arc is examined. Exact or approximate solutions are given for a number of cases, and results are also developed for sequential decision making for achieving specified recognition error with a minimum number of data points.

A brief sketch is also provided of a formalism for incorporating noisy curve fitting in the recognition of complicated line drawings. It is argued that maximum likelihood recognition is appropriate, and an approach to the decomposition of noisy line drawings and the formation of the likelihood function is sketched.

Index Terms-Computational cost, curve-fitting error measures, pattern recognition, recursive least-squares and maximum likelihood curve fitting, sequential curve recognition.

INTRODUCTION

ONE of the fundamental approaches to pattern recognition, in general, and to recognition of line drawings, in particular, is a statistical theoretic approach. Though other, usually heuristic, approaches are also taken, if one is interested in classification with controlled probability of error or with minimum computation cost, then

Manuscript received March 21, 1975; revised December 15, 1975. This work was funded by the National Aeronautics and Space Administration under Grant NSG-5036.
The authors are with the Division of Engineering, Brown University, Providence, RI 02912.

a probabilistic approach is necessary. The question of minimizing central processing unit (CPU) time for recognition is, of course, of great importance if one wishes to treat the recognition of highly complicated patterns and is a subject of growing interest [1]-[6]. In this paper we focus on a subproblem of the more general problem of optimally recognizing complicated line drawings, namely, the problem of recognizing underlying straight lines and quadratic arcs in line drawings and of approximating very noisy data by such curves. These methods are useful in other applications as well, e.g., picture data compression, contour line representation in maps, etc. We present a number of specific algorithms and discuss extensions and interpretations for application to more complicated situations where exact analysis is impossible, but theory for the simpler situations can be a useful guide.

We use a model for data generation in which there is an underlying picture, in two dimensions, consisting of straight lines and quadratic arcs, and a datum is a noisy perturbation of a point on one of these lines. A probabilistic description of the data is then given by a distribution for the underlying straight lines and quadratic arcs and for the noisy perturbations of these curves. For a portion of this paper we assume the two-dimensional field of the picture is divided into cells, and the data available to the computer are a rectangular array of 1's and 0's (a 1 if an appreciable portion of a cell is filled by the picture).

For the most part, we deal with data generated as a perturbation of a single underlying straight line or of an elliptic or hyperbolic arc. The points of a noisy curve are modeled as independent identically distributed Gaussian perturbations, usually in the perpendicular direction, of the points in the underlying parameterized curve. Denote a curve parameter vector by a. We assume knowledge of an a priori probability density function (pdf) p(a) for this vector, which represents the processor's knowledge of the structure of the underlying curve. Let w = (x_1,y_1,x_2,y_2,...,x_m,y_m) denote a data vector consisting of a subset of m data points constituting a noisy curve, with (x_i,y_i) the ith data point. Let f(w | a) be the conditional likelihood of the data given a. Then the joint pdf of (a,w) is

p(a)f(w | a).   (1)

We concentrate on two problems. Assume the processor


has been directed to and found a portion of a noisy curve near an end. 1) Recursive estimation is explored where, beginning with a few data points, the parameter vector resulting in a best curve fit to this data is determined and the resulting curve is then used to seek out additional nearby data which are appropriate to the curve. These recursions are repeated until the fit is completed. Since the appropriate data are generally a portion of a picture consisting of many lines and quadratic arcs, a recursive procedure appears to be necessary to segment the data. (The problem of intelligent data search arises in a variety of contexts, e.g., in ballistic missile decoy tracking [7].) Some results relating errors and CPU time are also presented. 2) Results are obtained on decision making with controlled probabilities of error and minimum CPU time as to which of two or more classes of straight lines and quadratic arcs is the curve underlying the data. We treat a simple version of this problem and then discuss extensions to other problems of interest.

At the end of the paper, we briefly formulate aspects of an approach to the modeling and recognition of pictures consisting of noisy curves. The ideas and results of the body of this paper constitute important ingredients for such an approach.

MODELS

Since we take f(w | a) to be Gaussian, if p(a) is also Gaussian then Bayesian estimation and decision making can be used in the preceding problems 1) and 2). However, since the p(a) used may be otherwise, such as a function restricted to constant values over a few simply described regions, for the most part we use maximum likelihood functions (mlf), as shown in (2), for estimation and decision making instead:

max_a p(a)f(w | a).   (2)

We first derive results for p(a) removed from (2), and then point out the result of including p(a). Omission of p(a) reduces the curve fitting problem to that of least-square-error curve fitting, i.e., f(w | a) is (2πσ²)^{-m/2} exp[-(1/2σ²)q(w,a)], where q(w,a) is the sum of the squares of the distances of the m data points to the curve having parameter vector a. Then max_a f(w | a) is realized for q(w,a) a minimum, which occurs for the a resulting in a least-square-error curve fit to the data.

We briefly describe the two curve fitting models on

which this paper is based. Both are linear regression models. 1) First, we are interested in the least squares fit of a straight line to n data points. The most reasonable error measure is undoubtedly the perpendicular distance from a data point to the line. However, results are simpler to obtain and essentially as satisfactory if we use the parameterization y = c_0 + c_1x and error measure y_i - (c_0 + c_1x_i) for a given data point (x_i,y_i), or the parameterization x = c_0 + c_1y and analogous associated error measure, depending on whether the slope of the fitted line has magnitude less than or greater than 1. 2) The second model of interest is the fitting of data by a quadratic arc. A portion of an ellipse is most useful for our purposes. The parameterization x(t) = x_0 + a cos t, y(t) = y_0 + b cos t + c sin t has many desirable features, and variations have been used successfully in some problems, e.g., [8]. However, difficulties requiring nonlinear programming may arise in relating values of t and data points, especially if the data are very noisy and if only a subset of the available data are used. An alternative procedure is to ask for the coefficients for which the sum of the perpendicular distances squared from n data points to the curve

c_1x² + c_2xy + c_3y² + c_4x + c_5y - 1 = 0   (3)

is a minimum. The solution can be found, but the required computation is considerably greater than if an alternative error measure, which we now describe, is used.¹ Let c denote the vector (c_1,c_2,...,c_5)^T and z denote the five-component vector (x²,xy,y²,x,y)^T. Let z_i denote the z vector for the point (x_i,y_i). Then we seek the c vector for which

Σ_{i=1}^n (c^T z_i - 1)²   (4)

is a minimum. The resulting curve can be any one of an ellipse, a hyperbola, or a parabola. The chance of the last of these occurring for noisy data is essentially zero. We would like to obtain an ellipse, and in the section on experimental results we discuss ways of coaxing the algorithm to generate an ellipse or at least a useful hyperbola. The formulation can of course be used for a straight line, for curves of higher order, or for other parameterizations as well.

An interpretation of (4) is the following. Consider the ellipse and data point v_d = (x_d,y_d)^T in Fig. 1. Let v_0 = (x_0,y_0)^T denote the ellipse center and v_e = (x_e,y_e)^T denote the intersection of the ellipse with the straight line from v_0 to v_d. c^T z - 1 can be rewritten as

(v - v_0)^T C (v - v_0) - v_0^T C v_0 - 1   (5)

where

C = [ c_1     c_2/2
      c_2/2   c_3   ]      and      v_0 = -(1/2) C^{-1} (c_4, c_5)^T.

Note that the presence of v_0^T C v_0 in (5) restricts the ellipses that can be realized with this parameterization. A means of circumventing this difficulty is discussed in the section on experimental results. Upon defining u_e to be v_e - v_0

'A "T" superscript for a vector denotes the transpose of the vector.


Fig. 1. Illustration of the parameters in (6).

and δ to be a scalar such that v_d - v_0 = u_e + δu_e, (5) becomes (v_e - v_0)^T C (v_e - v_0) - v_0^T C v_0 - 1 + δ² u_e^T C u_e + 2δ u_e^T C u_e, or δ(δ + 2) u_e^T C u_e. Finally, since u_e^T C u_e - v_0^T C v_0 - 1 = 0, (5) can be written as

δ(δ + 2)ρ   (6)

where ρ is the constant 1 + v_0^T C v_0. Note that δ is the ratio ||v_d - v_e|| / ||u_e|| and, hence, is the ratio of the distance of the data point to the ellipse, along the line from the data point to the ellipse center, divided by the distance from the ellipse to the ellipse center along this line (see Fig. 1). Since ||u_e|| is a maximum in the direction of the ellipse major axis and a minimum in the direction of the ellipse minor axis, it can be seen that if two data points are equidistant from the ellipse, but one lies along the ellipse major axis and the other along the ellipse minor axis, then the contribution to (4) of the data point lying along the ellipse minor axis will be greater, assuming δ << 1. Also, for |δ| fixed, (6) has larger magnitude for δ > 0 than for δ < 0. The difference in magnitudes is negligible for δ << 1.
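To make the algebraic-distance measure (4) concrete, the following sketch fits the conic of (3) to a set of points by minimizing Σ(c^T z_i - 1)² as an ordinary linear least squares problem. Python and NumPy are not used in the paper and are assumed here purely for illustration; the function names are hypothetical.

```python
import numpy as np

def conic_z(x, y):
    """Five-component z vector (x^2, xy, y^2, x, y) of (3) for one point."""
    return np.array([x * x, x * y, y * y, x, y])

def fit_conic(points):
    """Minimize sum_i (c^T z_i - 1)^2 of (4) by linear least squares.

    points: iterable of (x, y) pairs; returns c = (c1, ..., c5) of
    c1 x^2 + c2 xy + c3 y^2 + c4 x + c5 y - 1 = 0.
    """
    A = np.array([conic_z(x, y) for x, y in points])   # n x 5 design matrix
    b = np.ones(len(A))                                # right-hand side of "= 1"
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c

# Example: noisy samples of the ellipse x^2/9 + y^2/4 = 1.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2 * np.pi, 40, endpoint=False)
pts = np.column_stack([3 * np.cos(t), 2 * np.sin(t)]) + 0.05 * rng.standard_normal((40, 2))
print(fit_conic(pts))   # roughly (1/9, 0, 1/4, 0, 0)
```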

RECURSIVE ESTIMATION

Estimation of the parameters of the models y = c_0 + c_1x and c_1x² + c_2xy + c_3y² + c_4x + c_5y - 1 = 0 is efficiently realized through use of recursive least-squares estimation. Specifically, letting z = (1,x)^T and b_n = (y_1,y_2,...,y_n)^T, or z = (x²,xy,y²,x,y)^T and b_n = (1,1,...,1)^T, n 1's, depending on whether we are dealing with the first or the second model, respectively, and A_n = [z_1 z_2 ... z_n]^T, the problem of determining the optimum c is that of determining the c which minimizes ||A_n c - b_n||². The solution from [9] or [10, p. 75] is

c^n = A_n^+ b_n   (7)

where A_n^+ denotes the pseudoinverse (A_n^T A_n)^{-1} A_n^T. However, from [11] we see that (A_n^T A_n)^{-1} can be computed recursively. Let B_n denote A_n^T A_n. Then

B_{n+1}^{-1} = (Σ_{i=1}^{n+1} z_i z_i^T)^{-1} = B_n^{-1} - B_n^{-1} z_{n+1} z_{n+1}^T B_n^{-1} / (1 + z_{n+1}^T B_n^{-1} z_{n+1}).   (8)

Since Z_{n+1} = Σ_{i=1}^{n+1} z_i b_i, we see that the optimum c^{n+1} is given by

c^{n+1} = B_{n+1}^{-1} Z_{n+1}
        = [B_n^{-1} - B_n^{-1} z_{n+1} z_{n+1}^T B_n^{-1} / (1 + z_{n+1}^T B_n^{-1} z_{n+1})] Z_{n+1}.   (9)

Hence, B_{n+1}^{-1} and Z_{n+1} are efficiently computed recursively, and c^{n+1} is computed in terms of these functions and the latest data, i.e., b_{n+1} and z_{n+1}.

The error ε_n = ||A_n c^n - b_n||² can also be computed efficiently. Since

ε_{n+1} = Σ_{k=1}^{n+1} b_k² - 2 (Σ_{k=1}^{n+1} b_k z_k^T) c^{n+1} + Σ_{k=1}^{n+1} (z_k^T c^{n+1})²,

it is trivial to show that

ε_{n+1} = ||b_{n+1}||² - Z_{n+1}^T B_{n+1}^{-1} Z_{n+1} = ||b_{n+1}||² - Z_{n+1}^T c^{n+1}   (10)

where ||b_{n+1}||², Z_{n+1}, and B_{n+1}^{-1} are all trivially recursively updatable, so that ε_{n+1} is computed in terms of ||b_n||², Z_n, B_n^{-1}, b_{n+1}, and z_{n+1}. Note that almost all of the computational effort for c^{n+1} and ε_{n+1} is accounted for in the computation of B_{n+1}^{-1} and then B_{n+1}^{-1} Z_{n+1}. Since this computation is needed for the determination of both c^{n+1} and ε_{n+1}, it follows that once one computes one of these estimates, the other is computed cheaply.
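A compact sketch of the recursion (8)-(10), again in Python with NumPy (an illustrative choice, not something used in the paper), is given below. It maintains B_n^{-1}, Z_n, and ||b_n||² and updates the straight-line fit c^n and the error ε_n one data point at a time; the class and method names are hypothetical, and the diffuse initialization of B^{-1} is a convenience rather than the paper's procedure.

```python
import numpy as np

class RecursiveLeastSquares:
    """Recursive least-squares fit per (8)-(10): c^n = B_n^{-1} Z_n."""

    def __init__(self, dim):
        self.Binv = 1e6 * np.eye(dim)  # large initial B^{-1} (diffuse start)
        self.Z = np.zeros(dim)         # Z_n = sum_i z_i b_i
        self.b_sq = 0.0                # ||b_n||^2

    def update(self, z, b):
        """Incorporate one observation (z_{n+1}, b_{n+1}); return (c, eps)."""
        Bz = self.Binv @ z
        denom = 1.0 + z @ Bz
        self.Binv -= np.outer(Bz, Bz) / denom      # rank-one update, as in (8)
        self.Z += b * z
        self.b_sq += b * b
        c = self.Binv @ self.Z                      # as in (9)
        eps = self.b_sq - self.Z @ c                # as in (10)
        return c, eps

# Straight-line model: z = (1, x)^T, b = y; fit y = 2 + 0.5 x from noisy samples.
rng = np.random.default_rng(1)
rls = RecursiveLeastSquares(2)
for x in np.linspace(0.0, 10.0, 30):
    y = 2.0 + 0.5 * x + 0.1 * rng.standard_normal()
    c, eps = rls.update(np.array([1.0, x]), y)
print(c, eps)   # c close to (2, 0.5); eps approximates the residual sum of squares
```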

USE OF a priori INFORMATION IN PARAMETER ESTIMATION

Assume the data are that for an underlying straight line, we are more or less free to choose the x_i's, and the associated


Fig. 2. Fitting cost (CPU time) and fitting error as functions of the number n of data points used for straight line fitting to noisy data.

y_i's are random variables. Then (2) becomes

max_γ p(γ)(2πσ²)^{-n/2} exp[-(1/2σ²) Σ_{i=1}^n (y_i - γ_0 - γ_1 x_i)²]   (11)

and can be reformed as

max_γ p(γ) exp{-(1/2σ²)(c^n - γ)^T [ n               Σ_{i=1}^n x_i
                                     Σ_{i=1}^n x_i   Σ_{i=1}^n x_i² ] (c^n - γ)}

       · (2πσ²)^{-n/2} exp[-(1/2σ²) Σ_{i=1}^n (y_i - c_0^n - c_1^n x_i)²]   (12)

where [c_0^n, c_1^n]^T is c^n. The second line of (12) involves only the sum of squared errors in the least squares line fit to the data. Hence, the maximization in (12) involves the maximization of the first line only, i.e., the best γ is determined by p(γ) and c^n, and is seen to be simple to compute. In (12), γ can be augmented to include x_1 and x_n as random variables.

If the data x_1,y_1,...,x_n,y_n are those for a quadratic curve, f(x_1,y_1,...,x_n,y_n | c) is not as simple a function of c. Assume |δ| << 1 and that the data can be recentered in a computationally reasonable way so that v_0 ≈ 0 and ρ ≈ 1. Then (6) ≈ 2δ, and a maximum likelihood decomposition similar to (12) is easily derived upon letting the angle θ_i with respect to the x-axis play a role analogous to that of x_i for the straight line. We model the data point (x_i,y_i) as a roughly Gaussian random variable N(||v_{e_i}||, σ²||v_{e_i}||²/4) from the origin along the ray at angle θ_i. (Then δ_i is N(0, σ²/4).) More generally, for v_0 not quite 0 and ρ not quite 1, assume (x_i,y_i)^T - v_0 is generated as an observation of a Gaussian random variable N(0, σ²||u_{e_i}||²/4ρ²) along a ray through v_0 and at angle roughly θ_i. This pdf at point z_i is given by (13):

(2π var)^{-1/2} exp[-(1/(2 var))(c^T z_i - 1)²]   (13)

var = σ²||u_{e_i}||²/4ρ²   (14)

where ||u_{e_i}||²/ρ² is well approximated by ||(x_i,y_i)||² / ((x_i,y_i) C (x_i,y_i)^T). An alternative model resulting in (13) even when ρ ≠ 1 is to treat (x_i,y_i)^T - v_0 as a two-dimensional random vector with the angle a random variable, perhaps uniformly distributed over an interval, and the amplitude, given the angle, distributed as in (13).
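As a small illustration of (11)-(12) for the straight-line case, the following sketch maximizes the first line of (12) for a Gaussian prior p(γ), which amounts to a standard Bayesian combination of the prior with the least squares estimate c^n. Python and NumPy are assumed only for illustration, and the prior mean, covariance, and function name below are hypothetical choices, not values from the paper.

```python
import numpy as np

def map_line_fit(x, y, sigma2, gamma_prior, P_prior):
    """Maximize the first line of (12) for a Gaussian prior p(gamma).

    x, y: data; sigma2: noise variance; gamma_prior, P_prior: prior mean
    and covariance of gamma = (gamma_0, gamma_1).
    """
    A = np.column_stack([np.ones_like(x), x])        # rows are z_i^T = (1, x_i)
    M = A.T @ A                                      # the 2x2 matrix appearing in (12)
    c_n = np.linalg.solve(M, A.T @ y)                # least squares estimate c^n
    # Combine the quadratic forms of prior and likelihood (both Gaussian in gamma).
    info = np.linalg.inv(P_prior) + M / sigma2
    rhs = np.linalg.inv(P_prior) @ gamma_prior + (M / sigma2) @ c_n
    return np.linalg.solve(info, rhs)

x = np.linspace(0.0, 5.0, 20)
y = 1.0 + 0.8 * x + 0.2 * np.random.default_rng(2).standard_normal(20)
print(map_line_fit(x, y, 0.04, np.array([0.0, 1.0]), np.eye(2)))
```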

COMPUTATION COSTS

Since recursive estimation schemes are under consideration, we estimate the CPU time for one such recursion. We deal with four costs, these being:

1) t_c, cost of computing c^{n+1};
2) t_e, additional cost of computing ε_{n+1};
3) t_p, cost of predicting the best initial search point for new data point search;
4) t_s(n + 1), new data point expected search cost given the first search point.



TABLE I
Assembly Language Instructions and Average Realization Times for IBM 360/67

Instruction                        Mnemonic   Average Realization Time
                                              (Approximation to the Closest Microsecond)
Add memory to register             A          1
Add register to register           AR         1
Branch^a                           B          1
Compare                            C          1
Compare logical                    CLI        1
Divide register to register        DR         9
Load from memory to register       L          1
Load address                       LA         1
Load halfword                      LH         1
Load multiple                      LM         1 + R/2^b
Load register to register          LR         1
Multiply memory to register        M          5
Multiply register to register      MR         4
Move                               MVI        1
And                                N          2
Shift left double logical          SLDL       2
Subtract register to register      SR         1
Shift right double logical         SRDL       2
Store from register to memory      ST         1
Store multiple                     STM        1 + R/2^b

^a Various conditional branch instructions used.
^b R is the number of registers stored or loaded.

All costs are estimated by determining the numbers of the various types of arithmetic, Boolean, and memory management operations and their associated CPU realization times. CPU realization times are for assembly language programs and are taken from [12]. The instruction list and associated realization times are given in Table I. Table II contains a list of the number of the various instructions determining the four aforementioned costs and the resulting costs. A short explanation of the details of these calculations is the following. The inversion of B_{n+1} is carried out by first computing B_n^{-1} z_{n+1}. This vector is then used in the computation of (8). Since z_{n+1} = (1,x_{n+1})^T when fitting a straight line, only two multiplications are required in computing B_n^{-1} z_{n+1} in this case. Since B_n^{-1} z_{n+1} z_{n+1}^T B_n^{-1} is symmetric, only the diagonal and half the off-diagonal entries must be calculated. Finally, it is not necessary to divide each component of B_n^{-1} z_{n+1} z_{n+1}^T B_n^{-1} by 1 + z_{n+1}^T B_n^{-1} z_{n+1}. Dividing the vector B_n^{-1} z_{n+1} by 1 + z_{n+1}^T B_n^{-1} z_{n+1} and then computing [(1 + z_{n+1}^T B_n^{-1} z_{n+1})^{-1} B_n^{-1} z_{n+1}] z_{n+1}^T B_n^{-1} is more efficient.

Now consider costs 3) and 4). For model 1 (the straight

line model), a choice is made for x_{n+1} and then a search made for a data point having this x_{n+1}. If the data are complex, consisting of more than one noisy line and arc, the data point of interest is that closest to the line determined by c^n. If there is only one line and no arcs present, the only consideration is a least cost search for the data point, and such a search, for data generated by Gaussian perturbations of a straight line, involves a good initial estimate ŷ_{n+1} followed by successive examination of cells above and below those last explored. The necessary estimate in the first case, which is also the minimum mean-squared error estimate to be used in the second case, is

ŷ_{n+1} = z_{n+1}^T c^n.   (15)

In practice, x_{n+1} would probably be chosen such that the differences x_{n+1} - x_n are equal and this constant chosen at the start of the search. For model 2 (the quadratic arc model), the preceding considerations also apply. The initial choice we consider in the search for a z_{n+1} is a point ẑ_{n+1} on the curve

ẑ_{n+1}^T c^n = 1.   (16)

Such a point might be found by choosing an x̂_{n+1} or a ŷ_{n+1} and then solving (16) to determine the other and, hence, ẑ_{n+1}. The search for z_{n+1} is then probably most cheaply carried out by determining and then searching in the direction perpendicular to the curve at ẑ_{n+1}. Alternative less accurate choices for ẑ_{n+1} and ŷ_{n+1} might be considered, although the saving in costs for computing these estimates


TABLE II
Numbers of Major Operations and Total CPU Times Per Recursion for Computing c^n, ε_n, ŷ_{n+1} for Straight Line and Quadratic Arc

                 Quadratic Arc                                      Straight Line

c^n              86AR + 5DR + 74LR + 2A + 80M + 83SRDL              6AR + 2DR + L + 15LR + M + 2SLDL + 5SR
                 + 10MR + others = 880 μs                           + 11SRDL = 118 μs

ε_n              5M + others = 46 μs                                A + AR + L + 4LR + 3MR + SR + 3SRDL + ST
                                                                    = 27 μs

ŷ_{n+1}          DR + 5M + others = 58 μs                           AR + LA + LR + M + SRDL + ST = 11 μs

Search i cells                                                      AR + B + CLI + 2LA + 2LH + 4LR + 3MVI + SLDL
                                                                    + 2N + (i - 1)(4B + C + 2CLI + 2LA + LR + 2MVI)
                                                                    = 21 + (i - 1)12 μs

For generality, only fixed-point operations were used; noninteger parameters were represented by multiplication by 2^16 and then truncation. The experimentally determined sum of the CPU times for computing one recursion for c^n, ε_n, and ŷ_{n+1} is roughly 200 μs for straight lines, which is almost 35 percent greater than the above estimates. This modest discrepancy is probably due to our having used quantized estimated execution times and to other factors such as interruptions, etc.

is small compared with the total computational cost of one recursion and, hence, not a significant consideration.

For the first model, the search for (x_{n+1},y_{n+1}) is the sequence (x_{n+1}, ŷ_{n+1} + Δ), (x_{n+1}, ŷ_{n+1} - Δ), (x_{n+1}, ŷ_{n+1} + 2Δ), etc., where Δ is the quantization interval. Under the assumption that y_{n+1} - z_{n+1}^T c is a Gaussian random variable, with c = (c_0,c_1)^T the true parameter vector, the specified search is of lowest cost. Assuming the variance σ² of y_{n+1} - (c_0 + c_1 x_{n+1}) given x_{n+1} is much greater than Δ², an estimate of the expectation of the number of iterations to finding y_{n+1} is obtained as follows. y_{n+1} and ŷ_{n+1}

= z_{n+1}^T c^n are independent Gaussian random variables with common expectation z_{n+1}^T c. The variance of y_{n+1} is, of course, σ², and that of ŷ_{n+1} is var c_0^n + (x_{n+1})² var c_1^n [13, p. 298]. Note that by c_0^n and c_1^n we mean the first and second components of c^n. Hence,

var ŷ_{n+1} = σ²[n^{-1} + (x_{n+1} - x̄_n)² / Σ_{i=1}^n (x_i - x̄_n)²]

and

σ_{n+1}² = var(y_{n+1} - ŷ_{n+1}) = var y_{n+1} + var ŷ_{n+1}
         = σ²[1 + n^{-1} + (x_{n+1} - x̄_n)² / Σ_{i=1}^n (x_i - x̄_n)²].

(x̄_n denotes n^{-1} Σ_{i=1}^n x_i.) The expected number of iterations is

P{y_{n+1} - ŷ_{n+1} = 0} + 2 P{|y_{n+1} - ŷ_{n+1}| = Δ} + 3 P{|y_{n+1} - ŷ_{n+1}| = 2Δ} + ···

which is roughly

2 Σ_{i=0}^∞ 2iΔ(2πσ_{n+1}²)^{-1/2} exp[-(1/2σ_{n+1}²) i²Δ²] ≈ 4(2π)^{-1/2} σ_{n+1}/Δ   (17)

for σ/Δ >> 1. Note that σ/Δ >> 1 implies σ_{n+1}/Δ >> 1, which then implies that the probabilities of the events y_{n+1} = ŷ_{n+1} + iΔ and y_{n+1} = ŷ_{n+1} - iΔ are each approximately Δ(2πσ_{n+1}²)^{-1/2} exp[-(1/2σ_{n+1}²) i²Δ²], and the sum in (17) can be approximated by an integration. When x_i - x_{i-1} is a constant for all i ≤ n,

σ_{n+1}² = σ²(n³ + 12n² + 20n + 4)/(n³ + 8n²).   (18)

Then, the expected number of iterations is roughly

4(2π)^{-1/2}(σ/Δ)[(n³ + 12n² + 20n + 4)/(n³ + 8n²)]^{1/2}.   (19)

As n → ∞, (19) converges to 4(2π)^{-1/2}(σ/Δ) and is indeed close to this value for n > 10.

From Table II note that the sum of costs 1)-3) is roughly six times as great for a quadratic arc as for a straight line.
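A quick numerical check of (18)-(19) can be scripted; the sketch below (Python, assumed here only for illustration) evaluates the expected number of search iterations for a few values of n and a chosen ratio σ/Δ, and shows the approach to the limiting value 4(2π)^{-1/2}(σ/Δ).

```python
import math

def expected_iterations(n, sigma_over_delta):
    """Expected search iterations per (19) for equally spaced x increments."""
    ratio = (n**3 + 12 * n**2 + 20 * n + 4) / (n**3 + 8 * n**2)   # (18) divided by sigma^2
    return 4 / math.sqrt(2 * math.pi) * sigma_over_delta * math.sqrt(ratio)

limit = 4 / math.sqrt(2 * math.pi) * 3.0     # n -> infinity value for sigma/Delta = 3
for n in (4, 10, 50):
    print(n, round(expected_iterations(n, 3.0), 3), "limit:", round(limit, 3))
```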

VARIABILITY IN STRAIGHT LINE APPROXIMATION AND COMPUTATION TIME WITH THE DATA SUBSET USED

We see that fitting cost is a function of n, the number of data points used. How does approximation variability


vary with fitting cost? Assume the y_i are perturbations of some line c_0 + c_1x. In [6], the measure of variability used is the square root of the sum of the variances of the y components of the fitted line end points. Here, we comment instead on the variability of the likelihood function (12) with n. (For a more direct evaluation of the effect of n on curve classification error, the section on stopping behavior is more illuminating.) Expression (20a), the exponent in the first line of (12), is a quadratic form which has a chi-square distribution with two degrees of freedom when γ is the true underlying parameter vector, γ_t:

σ^{-2}(c^n - γ)^T [ n               Σ_{i=1}^n x_i
                    Σ_{i=1}^n x_i   Σ_{i=1}^n x_i² ] (c^n - γ).   (20a)

Equation (20b), the exponent in the second line of (12), is a quadratic form which is distributed as chi square with n - 2 degrees of freedom:

σ^{-2} Σ_{i=1}^n (y_i - c_0^n - c_1^n x_i)².   (20b)

The two quadratic forms are statistically independent. Also, c_0^n, c_1^n, and the y_i are all Gaussian random variables. If p(γ) is a broad function of γ encompassing γ_t, then the maximized first line of (12) is simply p(c^n). If p(γ) is narrow and concentrated about γ_t and (12) is maximized, then γ in (20a) is γ_t and (20a) is essentially chi square with two degrees of freedom. In all cases, if p(γ) is Gaussian, then (12) will be Gaussian. In the two extreme cases just mentioned and also in intermediate cases, we see that under the true hypothesis the interesting dependence of (12) on n takes place in ε_n. Since ε_n is also a useful measure of goodness of curve fit, we take

Std dev [(ε_n)^{1/2}] / E[(ε_n)^{1/2}]   (21)

as our measure of variability. The expected CPU time for fitting a curve is

n(t_c + t_e + t_p) + Σ_{i=1}^n t_s(i).   (22)

Typical curves of (21) and (22) as functions of n are shown in Fig. 2.
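Combining the straight-line timings of Table II with the search estimate (19) gives a rough evaluation of (22). The sketch below (Python, assumed for illustration) treats the expected number of cells examined per search as (19) and uses the 21 + (i - 1)12 μs search cost; plugging the expectation of i directly into that linear cost is a simplification of my own, not a step taken in the paper.

```python
import math

T_C, T_E, T_P = 118.0, 27.0, 11.0      # per-recursion costs from Table II (microseconds)

def search_cost(i_cells):
    """Search cost in microseconds for examining i cells (Table II)."""
    return 21.0 + (i_cells - 1.0) * 12.0

def expected_cells(n, sigma_over_delta):
    """Expected number of cells examined, per (19)."""
    ratio = (n**3 + 12 * n**2 + 20 * n + 4) / (n**3 + 8 * n**2)
    return 4 / math.sqrt(2 * math.pi) * sigma_over_delta * math.sqrt(ratio)

def expected_fit_time(n, sigma_over_delta):
    """Rough expected CPU time (22) for a straight-line fit to n points (microseconds)."""
    search = sum(search_cost(expected_cells(i, sigma_over_delta)) for i in range(2, n + 1))
    return n * (T_C + T_E + T_P) + search

print(expected_fit_time(20, 3.0))
```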

STOPPING PROCEDURES

Assume that a straight line or quadratic curve is to be fit to data entering a local region. Our interest is to choose an appropriate subset of the data associated with the underlying curve in order to be able to make at reasonable cost a decision with predetermined probability of error as to whether the underlying line is a straight line or a quadratic arc. We are not concerned here with determining where the curve ends and a new one begins. In order to obtain some insight into the problem, rather than working with curves in two-space we deal with the simpler problem of testing the hypothesis H_0 that the data are a noisy perturbation of a linear function of the variable x against the alternative H_1 that the data are a noisy perturbation of a quadratic or unknown nonlinear function of the variable x. An understanding of this problem should provide a rough understanding of the reliability of decisions concerning underlying straight line and quadratic arc hypotheses modeled as (4).

The x values x_1, x_2, ···, at which data points are sought are chosen first, based on prior knowledge of the lengths of lines or curves likely to be encountered. In practice, equal increments x_i - x_{i-1} would probably be used. In this section we assume the increment fixed and determine the structure of optimum or near optimum decision rules. An appropriate choice for the x increments can then be easily determined. We assume we have an a priori joint distribution for whether the underlying curve is a line or a quadratic and for the associated unknown parameters. Denote this density p(H,c), where H denotes the hypothesis and takes values H_0 and H_1 for line and quadratic, respectively, and c is the parameter vector for the curve type under the associated hypothesis. The joint likelihood of H, c and the acquired data y_1, y_2, ···, y_n to which we are interested in fitting a curve is

p(H,c)f(y_1,···,y_n | H,c).   (23)

We consider two versions of a simple decision problem because these lead to results which are particularly insightful. The two versions represent two different assumptions of prior structural knowledge of the underlying curves in x.

Problem 1): We assume the underlying curve is y = c_0 + c_1x under H_0 and is y = c_0 + c_1x + c_2x² with c_2 known under H_1. c_0 and c_1 are different under H_0 and H_1, are unknown, and may enjoy large ranges of possible values. The assumption of c_2 known is made in order that we be able to devise a test to distinguish between linear functions of x and functions of x which have at least some minimum quadratic content of interest.

Problem 2): A linear function is assumed under H_0, and no assumption is made concerning the form of the underlying curve under H_1. Hence, the assumptions range from somewhat restrictive in 1) to very unrestrictive in 2). An in-between case involving hypotheses as in 1), but with c_2 unknown, can be treated, but it seems to offer no additional insight and, therefore, is not considered.

The decision procedure then for 1) for fixed n is: decide

H_1  if  max_β p(H_1,β)f(y_1,···,y_n | H_1,β) / max_γ p(H_0,γ)f(y_1,···,y_n | H_0,γ) ≥ 1,   (24)

H_0 otherwise.

Equivalently, log_e of the left side of (24) can be compared with 0. (β and γ are parameter vectors appropriate to the hypotheses with which they are associated.) We begin by neglecting the prior probabilities and hence basing the decision on

max_β f(y_1,···,y_n | H_1,β) / max_γ f(y_1,···,y_n | H_0,γ).   (25)

Since a sequential likelihood ratio test in some cases enjoys the property that decisions with specified probabilities of incorrectly recognizing the true hypothesis can be made using a minimum expected number of data points, we focus on this approach. Here, upper and lower thresholds are chosen, and the decisions "quadratic arc" or "straight line" are made at the first value of n at which (25) crosses the upper or lower threshold, respectively. For 1) we show that the test reduces to a Wald sequential probability ratio test (SPRT) [13, pp. 365-382], [14], having the desired property that the decision for accepting H_0 or H_1 is made with predetermined probabilities of incorrect decision under H_0 and H_1, set by the designer, and this performance achieved by processing the minimum expected number of data points. The design and behavior of optimum decision making for a fixed number of data points is obvious from these results. The parameter vector c^n giving the best linear fit to the data has previously been found to be (9), i.e.,

c^n = B_n^{-1} Z_n.

If the quadratic approximation is used with known coefficient c_2 for the second-power term, the least squares estimate c'^n for the remaining coefficients is given by

c'^n = B_n^{-1} Σ_{i=1}^n z_i (y_i - c_2 x_i²)

or

c'^n = c^n - B_n^{-1} Σ_{i=1}^n z_i c_2 x_i².   (26)

Then a sequential maximum likelihood ratio test consists of comparison of the log_e of the maximum likelihood ratio (25) with a pair of stopping boundaries. The log_e of (25) is (27) multiplied by a constant:

(1/2)[-Σ_{i=1}^n (y_i - z_i^T c^n)² + Σ_{i=1}^n (y_i - z_i^T c'^n - c_2 x_i²)²].   (27)

(z_i^T c^n and z_i^T c'^n + c_2 x_i² are merely the approximations to y_i provided by the linear and quadratic functions which are least squares fits to the n data points.)

Expression (27) can be rewritten as

Σ_{i=1}^n y_i [z_i^T c^n - (z_i^T c'^n + c_2 x_i²)] - (1/2) Σ_{i=1}^n [(z_i^T c^n)² - (z_i^T c'^n + c_2 x_i²)²].   (28)

This is the usual operation of cross-correlation of the data sequence with a reference function and then the addition of a number derived from the reference function. The reference function is a function of the data in this test of composite hypotheses. An interesting simplification of (28) can be made. Through substitution in (28) of (9) and (26), (28) reduces to

Σ_{i=1}^n y_i [z_i^T B_n^{-1} (Σ_{j=1}^n z_j c_2 x_j²) - c_2 x_i²]
   - (1/2)[(Σ_{i=1}^n z_i c_2 x_i²)^T B_n^{-1} (Σ_{j=1}^n z_j c_2 x_j²) - Σ_{i=1}^n (c_2 x_i²)²].   (29)

The interpretation of (29) is as follows: B_n^{-1} Σ_{j=1}^n z_j c_2 x_j² is the coefficient vector for a least squares linear fit to the sequence c_2 x_1², c_2 x_2², ···, c_2 x_n². Thus, the line having ith x,y values x_i and z_i^T B_n^{-1} Σ_{j=1}^n z_j c_2 x_j² is the line which differs from the quadratic function c_2 x², at the points x_1, ···, x_n, least in a sum of error squares sense (see Fig. 3). Consequently, the decision function reduces to that which would be used for testing the hypothesis that the data are a Gaussian perturbation of c_2 x² against the alternative that the data are a Gaussian perturbation of the line which is most like c_2 x² at the points x_1, x_2, ···, x_n. The linear hypothesis used here is the most conservative possible! The resulting error rate will of course be higher than if the parameters in the two hypotheses were known. The question that then remains is whether the data are such that the test performs as would a test for hypotheses which were truly simple. The answer is yes, and this is justified in the Appendix.

In summary, then, the stopping test reduces to the test of a simple hypothesis against a simple alternative, and the stopping thresholds are set in accordance with the classical theory [13, pp. 365-382] for such hypotheses. (Note that a fixed-n test can also be easily designed to have specified probabilities of misclassification because the distributions under the equivalent simple hypotheses have identical constant-element diagonal covariance matrices.)
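The reduction just described can be exercised directly. The sketch below (Python with NumPy, assumed only for illustration; the function name and the test data are hypothetical) evaluates, at each n, the log likelihood ratio of the two equivalent simple Gaussian hypotheses, the quadratic mean c_2x_i² versus its least squares linear fit, and stops at the first crossing of the usual Wald threshold approximations A = (1 - β)/α and B = β/(1 - α).

```python
import numpy as np

def sprt_line_vs_quadratic(xs, ys, c2, sigma, alpha=0.05, beta=0.05):
    """Wald SPRT between the quadratic mean c2*x^2 and its LS linear fit.

    Returns ('H1', n), ('H0', n), or ('undecided', n), where n is the number
    of points processed; H1 = quadratic, H0 = straight line.
    """
    upper = np.log((1 - beta) / alpha)     # accept H1 above this
    lower = np.log(beta / (1 - alpha))     # accept H0 below this
    for n in range(1, len(xs) + 1):
        x, y = xs[:n], ys[:n]
        s = c2 * x**2                                  # quadratic mean values
        A = np.column_stack([np.ones(n), x])
        r = A @ np.linalg.lstsq(A, s, rcond=None)[0]   # LS linear fit to s
        # log f(y | H1) - log f(y | H0) for Gaussian noise of variance sigma^2
        llr = (np.sum((y - r)**2) - np.sum((y - s)**2)) / (2 * sigma**2)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(xs)

xs = np.linspace(0.0, 4.0, 40)
ys = 1.0 + 0.5 * xs + 0.3 * xs**2 + 0.2 * np.random.default_rng(3).standard_normal(40)
print(sprt_line_vs_quadratic(xs, ys, c2=0.3, sigma=0.2))
```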

Case 2: The Wald test cannot be used here because the likelihood function is incompletely known. A partially heuristic alternative is possible. The mean value function is assumed to be linear under hypothesis H_0 and nonlinear under hypothesis H_1. No additional information is assumed about the mean value function under the alternative hypothesis. A standard approach then would use a chi-square test on the sum of the squares of the errors in the best linear fit to the data. This would involve fixing n and α > 0 and deciding H_0 or H_1 depending on whether the sum of the squared errors in the best linear fit to n data points lay within or outside a 1 - α confidence interval. The drawback of such a test is that if the confidence interval is large and if H_1 is true, the probability of the decision being H_0 may be large, i.e., the power of the test may be small. If we wish to design the test to operate with some specified minimum power, then we are back to Case 1). A compromise is to choose n such that the length of the 1 - α


Fig. 3. (a) Mean value functions for equivalent pair of linear and quadratic simple hypotheses (the straight line is a least squares fit to the quadratic arc). (b) Optimum reference function for the test of hypothesis.

confidence interval can be fixed beforehand at a value we consider to be reasonable for H_0. Then, if the confidence interval is not violated, we can be quite certain that H_0 is true. A reasonable choice for the length d of the confidence interval might be the following. Let e(n) denote (ε_n/n)^{1/2}, i.e., the square root of the average of the squared errors in the best linear fit to n data points. Then we will consider a smooth curve of length l to be a line if it lies within a rectangle of length l and width d, which we specify, with d/l << 1. Since the data for our curves may exhibit considerable fluctuation, we shall consider n data points to have been generated under H_0 if

e(n) < d.   (30)

Since the variation in lengths of curves of interest may be considerable, we choose an appropriate number p and then determine d from

dp = l.   (31)

Of course, a meaningful d must be greater than the noise standard deviation σ. In summary, then, a confidence interval length d is chosen in terms of specified p and l by (31), and n is then determined in order that a 1 - α confidence interval under H_0 have this length. We might choose l to be somewhat smaller than the a priori expected lengths of the lines which could be present, or simply choose l to be of minimal size yet sufficiently large that n is not excessive. Finally, we mention that these results can be used as a

guide for devising a sequential test in an obvious way.
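As a small worked instance of (30)-(31): suppose curves of interest have length l = 50 cells and we pick p = 25, so d = l/p = 2 cells (which must exceed the noise standard deviation σ). The fragment below (Python, assumed for illustration; l, p, and the sample numbers are hypothetical) then applies the rule e(n) < d to the running fit error ε_n from the recursion of (10).

```python
import math

def decide_line(eps_n, n, l=50.0, p=25.0):
    """Decide H0 (line) per (30)-(31): e(n) = sqrt(eps_n / n) < d = l / p."""
    d = l / p
    e_n = math.sqrt(eps_n / n)
    return e_n < d, e_n, d

# Example: running error eps_n = 30 after n = 12 points.
print(decide_line(30.0, 12))   # e(n) ~ 1.58 < d = 2.0, so decide H0
```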

The preceding can be extended to include a priori pdf's for the unknown parameters; (24) takes interesting forms for two extreme cases. If p(H_0,γ) and p(H_1,β) are broad functions of γ and β, respectively, then upon using the form (12) for the numerator and denominator of (24), we see the maximum likelihood ratio is just (25) multiplied by the constant p(H_1,β^n)/p(H_0,γ^n), and a Wald type test of hypothesis can be designed much as before, but now to have joint probabilities of H_0 and misrecognition or of H_1 and misrecognition. On the other hand, if p(H_0,γ) and p(H_1,β) are highly concentrated about points γ' and β', respectively, then (24) is p(H_1,β')f(y_1,y_2,···,y_n | H_1,β') / p(H_0,γ')f(y_1,y_2,···,y_n | H_0,γ'), or the form (12) may be used in the numerator and denominator of (24). In either representation, (24) is a test of a simple Gaussian hypothesis against a simple Gaussian alternative, and again a Wald type test can be constructed.

If p(H_0,γ) and p(H_1,β) are Gaussian, then the numerator and denominator of (24) are Gaussian distributions (they become constant × ∫ p(H_1,β)f(y_1,···,y_n | H_1,β) dβ and constant × ∫ p(H_0,γ)f(y_1,···,y_n | H_0,γ) dγ, respectively), and the decision problem is a test of a simple hypothesis versus a simple alternative. An optimum decision function for fixed n can be easily realized, and suboptimum methods for sequential decision making are treated in the literature.

If H_0 and H_1 are both linear hypotheses, then upon using (12) the second exponentials in the numerator and denominator of (24) are identical and cancel, and (24) becomes a ratio of two exponentials, each a first line of (12).

Finally, these methods can be applied to model (4) for curves in the plane by using one of the Gaussian models described following (13) for the pdf of the data given the quadratic curve parameters.

EXPERIMENTAL RESULTS ON FITTING ELLIPSES TO NOISY DATA

Experiments were run using the proposed method for fitting quadratic arcs to noisy data which appeared to be appropriate for approximation by an ellipse.

Two phenomena of interest occurred. First, the optimum quadratic curve was on occasion a hyperbola with each branch fitting a portion of the data. Since no constraint is imposed to restrict the parameters to those for an ellipse, the fitted curve can be an ellipse, a parabola, or a hyperbola. The parabola essentially never occurs, being a degenerate ellipse or hyperbola. A hyperbola will sometimes result in smaller fitting error than an ellipse, especially if the sampling interval is not uniform along the curve and the data to be fit are somewhat clustered as in Fig. 4. Since a single branch of a hyperbola is a satisfactory representation for our purposes, but an approximation involving two branches is not, the problem is to force the best fitting quadratic to be an ellipse or single branch of a hyperbola without giving up the computational advantages of the linear regression methodology. Two approaches to this problem have been tried. See [6, pp. 28-29] for a description of the first of these.


Fig. 4. Two-branch hyperbolic fit to noisy perturbations of an elliptic arc.

A second and generally successful approach to the problem is the addition of a penalty function to (4) if the best fitting curve consists of portions of two branches of a hyperbola or of an exceedingly small ellipse. For c_1, c_2, c_3 as in (4), the penalty function used was

N(c_2² - 4c_1c_3)   (32)

for (c_2² - 4c_1c_3) > 0 and where N is a positive large constant, large compared with the discriminant function (c_2² - 4c_1c_3). With (4) augmented by (32), the equations for the optimal c are still linear. Note that for N large, if c_2² - 4c_1c_3 > 0, (32) can impose a large penalty on (4). Hence, the use of this penalty function tends to drive c_2² - 4c_1c_3 to 0. Unfortunately, this can happen by forcing all of c_1, c_2,

c_3 to be small, and, in fact, this is what happens in practice. That is, an appropriately shaped and sized ellipse seems to be prevented by use of (32), and instead the best fitting curve is forced to be a single branch of a hyperbola with the other branch pushed out toward infinity for large N. This is a satisfactory solution for many purposes. In practice, if (32) is required for a set of n data points, the inversion of B_n at that stage cannot be carried out recursively in terms of B_{n-1}^{-1}. However, if N is then held fixed, B_{n+1}^{-1}, B_{n+2}^{-1}, etc. can be computed recursively from B_n^{-1}. A reasonable procedure then is to compute two sequences of c^n's, one with a penalty function and fixed N and one without penalty function. When the one without settles down to that for an elliptic curve fit, the sequence based on the penalty function can be discontinued. (A sketch of this penalty-augmented fit is given at the end of this section.)

The curves in Fig. 5 illustrate the use of (32) in the sequential fit of quadratic curves to data. Data points are

found in sequence starting at the upper y-axis and proceeding through the second quadrant down through the third quadrant. The points were generated as Gaussian perpendicular perturbations of an ellipse. The basic sequential algorithm (9) fits hyperbolas to the first sets of


Fig. 5. Sequential curve fitting using a penalty function to force single-branch hyperbolic fits in the initial stages.

6 and 7 data points. Both branches are involved in a fit. By use of (32), hyperbolic fits involving only a single branch result. These single branch curves are shown in the figure. The same N was used in both. A smaller N would have permitted the branch used to have greater curvature. Beginning with the ninth data point, the basic algorithm (9) fit an ellipse. Note that beginning with the fit to seven data points, the resulting curves are useful for locating the next data point.

The second phenomenon requiring comment is the following. Suppose data in the vicinity of the origin are best fit by an elliptical arc. Then if the data are translated in the x and y directions by x' and y', respectively, the elliptical arc which best fits the new data is not the translation by (x',y') of the elliptical arc fitting the original data. More specifically, the dependence of (6) on ρ, which is a function of v_0 and C, leads us to understand that the best fitting ellipse to data removed from the origin will have major and minor axes rotated and scaled differently than would result were the data in the vicinity of the origin. Fig. 6 contains overlays of the elliptical arc fits for data at the origin and for the same data translated by (25,25). The solution to the removal of this distortion is to produce a rough elliptical curve fit to the data removed from the vicinity of the origin, then subtract the coordinates of the center of the rough-fit ellipse from the data points to be used in the fitting process, and then proceed as usual to curve fitting on the translated data. Analogous statements apply if single branch hyperbolic fits are involved.
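The sketch referred to above is given here (Python with NumPy, assumed purely for illustration). It augments the normal equations of (4) with the quadratic penalty (32) when the unpenalized discriminant c_2² - 4c_1c_3 is positive, after first subtracting a rough center estimate from the data; the recentering by the data mean and the value of N are my own choices for the example, not the paper's procedure.

```python
import numpy as np

def conic_z(x, y):
    return np.array([x * x, x * y, y * y, x, y])

def fit_conic_penalized(points, N=1e3):
    """Fit (3) by (4) augmented, when needed, with the penalty (32).

    The penalty N*(c2^2 - 4*c1*c3) is quadratic in c, so the normal
    equations stay linear: (B + N*D) c = sum_i z_i.
    """
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)                 # rough recentering (cf. Fig. 6 discussion)
    Z = np.array([conic_z(x, y) for x, y in pts])
    B = Z.T @ Z
    rhs = Z.sum(axis=0)
    c = np.linalg.solve(B, rhs)                  # unpenalized fit, minimizes (4)
    if c[1] ** 2 - 4 * c[0] * c[2] > 0:          # two-branch hyperbola: apply (32)
        D = np.zeros((5, 5))
        D[1, 1] = 1.0
        D[0, 2] = D[2, 0] = -2.0                 # c^T D c = c2^2 - 4 c1 c3
        c = np.linalg.solve(B + N * D, rhs)
    return c

rng = np.random.default_rng(4)
t = np.linspace(0.5, 2.5, 15)                    # a short, clustered elliptic arc
pts = np.column_stack([3 * np.cos(t), 2 * np.sin(t)]) + 0.1 * rng.standard_normal((15, 2))
print(fit_conic_penalized(pts))
```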

PATTERN RECOGNITION

A fundamental approach to pattern recognition is tobegin with a probabilistic model capable of generating thedata. For computational and other reasons, a hierarchical


Fig. 6. Overlay of hyperbolic fit to data near the origin and elliptic fit to the same data translated from the origin.

model is useful whereby patterns are decomposed into a hierarchy of subpatterns such that 1) the statistical dependence of data points within a subpattern at a level is greater than is their dependence on data points of another subpattern at the same level, and 2) the conditional distributions for the subpictures are functions parameterized by only a few parameters. (It is useful to view data compression in this way [15].) As illustration, a reasonable decomposition of English capital letters is into strokes. For example, the letter "A" could be modeled roughly as independent identically distributed Gaussian perturbations of the points of three underlying straight lines. Each data point would be a Gaussian perturbation of a point on one of the underlying straight lines. Since the data associated with a line then depend on the parameters of the line, a data point associated with a line is statistically more dependent on another data point of that line than on most data points of any other line.

The joint likelihood of w and the letter A might be

specified as follows. Four parameters determine each of the three underlying lines, e.g., either two end points or the slope, y intercept, and x components of the line endpoints constitute satisfactory parameter sets for a line. Let a_A be the 12-parameter vector for the three lines and a_{1A}, a_{2A}, a_{3A} be the parameter vectors for the left line sloping up to the right, the right line sloping down to the right, and the crossbar, respectively. Denote by P(A) the a priori probability of an A, by p(a_A | A) the a priori pdf for a_A, and by f(w | a_A) the likelihood of the data vector w given a_A. Then define the joint likelihood of A and w to be max_{a_A} P(A)p(a_A | A)f(w | a_A, A), which can be further decomposed as

max_{a_A} P(A)p(a_A | A)f(w | a_{1A})f(w | a_{2A})f(w | a_{3A}),   (33)

where for each i the components (x_i,y_i) of w appear in only one of the three pdf's f(w | a_{1A}), f(w | a_{2A}), f(w | a_{3A}), specifically, the pdf which results in (33) being a maximum. (Determination of the appropriate pdf with which to

associate (x_i,y_i) might be referred to as the data segmentation problem.) For letter recognition 27 likelihood functions of the type (33), one for each letter hypothesis, would be computed, and the letter having the maximum likelihood would be the recognizer's decision as the letter represented by the data. Thus, this decision making can be viewed as involving hill climbing or as flexible template matching and includes as special cases ad hoc procedures which use only partial descriptions such as that something "is to the right" of something else. As illustration of one of various results and interpretations which carry over from the body of this paper to more complicated likelihoods such as (33), use form (12) for the f(w | a_{iA}), i = 1,2,3, in (33). Then, e.g., if A were the true hypothesis and the data segmentation were good, f(w | a_{1A})f(w | a_{2A})f(w | a_{3A}) could be decomposed into the product of two exponentials, one with an exponent which would be chi square with six degrees of freedom and one which would be chi square with n - 6 degrees of freedom.

There are a variety of meaningful and practical forms

for p(a_A | A), for example as

p(a_{1A} | A) p(a_{2A} | a_{1A}, A) p(a_{3A} | a_{2A}, a_{1A}, A).

Here, we might choose p(a_{2A} | a_{1A}, A) to be Gaussian in a_{2A}. The mean vector and covariance matrix would be functions of a_{1A}, very likely nonlinear functions. This is essentially a clustering model, and the pdf parameters would be easy to choose by a designer on the basis of observing some A's and/or through careful measurement on examples of many A's. Analogous comments apply to p(a_{3A} | a_{2A}, a_{1A}, A).

The fundamental questions of interest in approximately computing the 27 likelihoods such as (33) are how the data subset w should be chosen and in what order and with how many iterations the parameters should be adjusted. If the perturbation noise is small, sequential decision making is attractive and has been used in the past in related, but more heuristic, approaches to line drawing recognition. Here straight lines and quadratic arcs are


sequentially fitted, and along the way letter hypotheses for which the fit to the data processed to that time is unduly poor are eliminated. When computing likelihoods for the purpose of ruling out letter hypotheses, in our formulations, all likelihoods should be determined for the same data subset. We do not treat the appropriate decision criteria for eliminating letter hypotheses here. However, a partial example serves to illustrate a number of important considerations. Assume the noise perturbations are small. The left edge of all capital letters is a straight line or a curve opening to the right or up. Suppose a data subset has been found on the left which provides a good left straight line fit for hypothesis A and also for a number of other letter hypotheses such as E, N, D, etc. Then: 1) where should the recognizer begin to search for new data: at the top of the recognized line, at the middle, elsewhere? 2) If a search region has been decided upon and data found in a local region, what type of curve should the recognizer try to fit and seek data for: a quadratic arc, a horizontal line, a line sloping down to the right, etc.? If it is assumed that each straight line and quadratic arc can be recognized with near certainty, then an approximately minimum CPU time recognition procedure can be devised. Assume for the moment the procedure calls for a search for data at the top of the left almost vertical line fitted, and if data are found, then calls for a data search and fitting of the second line in the A hypothesis. Denote by a_{1A}^{n_1} the parameter vector determined in the first line fit under the A hypothesis to n_1 data points and by w_2 the data subset found in fitting the second line under the A hypothesis. Then â_{2A} is the a_{2A} of (34):

P(A) p(a_{1A}^{n_1} | A) max_{a_{2A}} p(a_{2A} | a_{1A}^{n_1}, A) f(w_2 | a_{2A}).   (34)

In practice, the decision to conduct the data search through maximization of one or more likelihoods is determined by a number of distributions and fitting costs, e.g., whether other letters such as N, etc., have conditional pdf's p(a_{2N} | a_{1N}^{n_1}, N) which are very close to that for A above. Note that in a case such as this, the same least squares straight line fit can be used in the likelihoods under a number of hypotheses, thus saving considerable computation, and the estimates a_{2(·)} are then fine tuned under the influence of their respective pdf's p(a_{2(·)} | a_{1(·)}, (·)). The decisions concerning regions in which to search for data and the curve type to then search for or test for can be directed through a state space approach to the problem. Here, straight lines and quadratic arcs would be quantized into classes, e.g., straight lines which are almost vertical, straight lines which slope up to the right, etc. A node of the tree then represents the sequence of straight line and quadratic arc classes fitted or detected up to that stage and also the information that data may not have been found in areas searched in. Equivalently, a node represents the likely possible letters at that stage of the search. Associated

with each node is a command for the recognizer to search for data in a certain region and, if successful, to perform a specific operation which will consist of curve fitting or curve type testing. As a result, the system moves to a new state in the tree. Based on the node transition probabilities and the CPU times for the various operations, dynamic programming can be used for devising a tree resulting in approximately minimum CPU time recognition. Koppelaar [1] has taken this approach, and the resulting tree can be used as a rough guide for directing sequential recognition when the curve perturbations are large. Backtracking and other options (i.e., readjustment of previously determined parameter values) must be available in order to recover from stroke misrecognition and in some situations to improve stroke segmentation. The latter problem may require simultaneous adjustment of parameters for two contiguous strokes.

How many data points should be used in fitting each curve? Some feeling for this can be had from the section on likelihood variability with n and the section on linear-quadratic function hypothesis testing. A good choice for each curve depends on the contribution of the associated data to the various letter and data likelihood functions, and a more definitive answer would involve consideration of the letter rejection decision rule used.
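To illustrate the segmentation implicit in (33), the sketch below (Python with NumPy, assumed for illustration) assigns each data point to whichever of three stroke pdf's gives it the highest likelihood and accumulates the resulting log likelihood for a fixed parameter vector a_A. The endpoint parameterization of the strokes and the perpendicular-distance Gaussian likelihood are simplifications of the model described above, and all names and numbers are hypothetical.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Perpendicular distance from point p to segment (a, b)."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)
    return np.linalg.norm(ap - t * ab)

def letter_log_likelihood(points, strokes, sigma):
    """Approximate log of the product in (33) for fixed stroke parameters.

    Each point is assigned to the stroke giving it the largest Gaussian
    (perpendicular-distance) likelihood, i.e., the smallest distance.
    """
    log_l = 0.0
    for p in points:
        d = min(point_segment_dist(p, a, b) for a, b in strokes)
        log_l += -0.5 * (d / sigma) ** 2 - np.log(np.sqrt(2 * np.pi) * sigma)
    return log_l

# Three strokes of a rough letter "A": two slanted sides and a crossbar.
strokes_A = [(np.array([0.0, 0.0]), np.array([2.0, 6.0])),
             (np.array([4.0, 0.0]), np.array([2.0, 6.0])),
             (np.array([1.0, 3.0]), np.array([3.0, 3.0]))]
pts = np.array([[0.5, 1.4], [1.1, 3.1], [2.0, 5.8], [3.4, 1.9], [2.9, 3.0]])
print(letter_log_likelihood(pts, strokes_A, sigma=0.2))
```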

REMARKS

The parameter computation costs derived for a single recursion in least squares curve fitting were shown to be about six times higher for quadratic arcs than for straight line segments. If noisy data search costs are included, the ratio probably will not change much, since the straight line fitting cost may increase by up to 100 percent, but the quadratic arc fitting cost may almost double due to the peculiarities previously noted. Thus, quadratic arc fitting is relatively expensive. However, for the purpose of recognition of highly variable pictures, curve fitting is often necessary, and quadratic arc fitting may be much more useful than piecewise-linear fitting followed by polygon recognition and manipulation.

Our fitting costs could be reduced by exploiting the fact that the intervals x_i - x_{i-1} used would often be constant and by using truncated variable values, thus perhaps replacing multiplications and divisions by operations requiring less computation time.

APPENDIX

We show here that the test based on (28) behaves as if the two hypotheses are simple. The Gaussian distributions under these simple hypotheses have mean vectors (c_2x_1², c_2x_2², ···, c_2x_n²)^T and (z_1^T c'^n, z_2^T c'^n, ···, z_n^T c'^n)^T, respectively, where the latter is a least squares linear fit to the former and the vector c'^n = (c_0'^n, c_1'^n)^T


is given by

B_n^{-1} Σ_{i=1}^n z_i c_2 x_i².

Hence, all the components of the mean value vector for the latter hypothesis are functions of n. Justification for the conclusion that the test performs as though the hypotheses are simple is as follows. We first show that the linear portion of the mean value function of y_i in the test statistic (28) does not contribute to the value of (28). Denote by r_n the n vector having ith component z_i^T B_n^{-1} Σ_{j=1}^n z_j c_2 x_j², and by s_n the n vector having ith component c_2 x_i². Because r_n is a linear least squares fit to s_n, the sum Σ_{i=1}^n (r_i - s_i) is 0 and the inner product of r_n with r_n - s_n is 0. Since the linear portion of the mean value of y_n can be written as a constant vector plus a scalar multiple of r_n, it follows that the inner product of the linear part of the mean value of y_n with r_n - s_n is 0. Hence, (28) would have exactly the same value if y_n had zero mean value vector under H_0 and mean value vector (c_2x_1², ···, c_2x_n²)^T under H_1. Indeed, (28) would have exactly the same value if the mean value of y_n were r_n under H_0 and (c_2x_1², ···, c_2x_n²)^T under H_1. But (28) is the SPRT for these hypotheses, thus concluding the proof.

ACKNOWLEDGMENT

D. Cooper would like to thank Professor C. D. J. M. Verhagen and his pattern recognition group and H. Koppelaar of the Technische Hogeschool Delft, Delft, The Netherlands, for many stimulating discussions on the recognition of line drawings; this led to his interest in investigating the relative computation costs of line and curve fitting.

REFERENCES

[1] H. Koppelaar, "Ontwerp Van Optimalisering Van Letterherken-nende Komputerprogramma's," Laboratorium Voor TechnischeNatuurkunde, Technische Hogeschool Delft, Delft, The Nether-lands, Tech. Rep., Mar. 1974.

[2] T. Pavlidis and S. L. Horowitz, "Segmentation of plane curves,"IEEE Trans. Comput., vol. C-23, pp. 860-870, Aug. 1974.

[3] A. Rosenfeld, personal communication concerning his and his stu-dents' research results.

[4] E. 4iseman and R. W. Ehrich, "Contextual word recognitionus*ig binary digrams," IEEE Trans. Comput., vol. C-20, pp.397-13, Apr. 1971.

[5] C. H. Chen, 'On a class of computationally efficient feature selectioncriteria," Pattern Recognition Soc., vol. 7, no. 1, 1975.

[6] D. B. Cooper and N. Yalabik, "On the cost of approximating andrecognizing a noise perturbed straight line or a quadratic curvesegment in the plane," Div. Eng., Brown Univ., Providence, RI,Tech. Rep., NASA NSG5036/1, Mar. 1975.

[7] R. Singer, R. Sea, and K. Housewright, "Derivation and evaluationof improved tracking filters for use in dense multitarget environ-ments," IEEE Trans. Inform. Theory, vol. IT-20, pp. 423-432, July1974.

[8] M. Yoshida and M. Eden, "Handwritten Chinese character recog-nition by an analysis-by-synthesis method," in Proc. First Int. JointConf. Pattern Recognition, Washington, DC, Oct. 30-Nov. 1, 1973,pp. 197-204.

[9] R. Duda and P. Hart, Pattern Classification and Scene Analysis.New York: Wiley, 1973, pp. 328-334.

[10] W. J. Hemmerle, Statistical Computations On A Digital Computer.Waltham, MA: Blaisdel, 1967.

[11] A. E. Albert and L. G. Gardner, Stochastic Approximation andNonlinear Regression, Res. Mono. 42. Cambridge, MA: M.I.T.Press, 1967, pp. 109-112.

[12] IBM System/360 Model 67 Functional Characteristics Manual,2nd ed., File No. S360-01, Form GA27-2719-1, Jan. 1970.

[13] A. F. Mood, Introduction To The Theory Of Statistics. New York:McGraw-Hill, 1950.

[14] K. S. Fu, Sequential Methods In Pattern Recognition And MachineLearning. New York: Academic, 1968, p. 176.

[15] D. B. Cooper, "Feature selection and super data compression forpictures occurring in remote conference and classroom communi-cations," in Conf. Rec. Second Int. Conf. Pattern Recognition,Lyngby, Denmark, Aug. 13-15, 1974, pp. 111-115.

David B. Cooper (S'53-M'64) was born in New York, NY, on January 12, 1933. He received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, under the industrial co-op program in January 1957, and the Ph.D. degree in applied mathematics from Columbia University, New York, NY, in June 1966.

From 1957 to 1966 he worked first for Sylvania Electric Products, Inc., Mountain View, CA, and then for the Raytheon Company, Waltham, MA. He is presently an Associate Professor of Engineering at Brown University, Providence, RI, where he has been since 1966. He held an M.I.T. Summer Overseas Fellowship at Marconi Wireless, England, in 1955, and was a Visiting Professor at the Technische Hogeschool Delft, Delft, The Netherlands, in the 1972-1973 academic year. His research interests are in pattern recognition and system theory.

Dr. Cooper is a member of the Association for Computing Machin-ery.

Nese Yalabik (S'74-M'75) was born in Istanbul, Turkey, on March 10, 1950. She received the B.S. and M.S. degrees in electrical engineering from Middle East Technical University, Ankara, Turkey, in 1971 and 1972, respectively. She is presently working towards the Ph.D. degree at Brown University, Providence, RI. Her research interests are in the areas of pattern recognition and data compression.