
Shape in machine vision

David C Hogg

The representation of shape in machine vision is reviewed with emphasis on the most common types of representation and recent developments. Both planar shape and solid shape are examined with connections and generalizations drawn wherever possible. Particular emphasis is placed on the importance of invariant descriptions and on the representation of shape classes.

Keywords: shape, model, representation

Representations of shape are crucial to the operation of many machine vision systems. This review covers the most widely used types of representation and several promising recent innovations. It is not in any way exhaustive, but hopefully gives a balanced insight into previous achievements and current directions for this fascinating and important area.

The derivation and processing of shape representations are discussed only where they serve to motivate some aspect of a particular representational scheme. The origins and some applications of particular schemes are referenced in the text.

The review is principally concerned with shape in machine vision involving standard light-based imaging devices. Other forms of imaging are considered where these are relevant to the aims of this special issue. In particular, there are references to types of representation devised for use with active range finders (e.g. based on lasers) and medical imaging devices (e.g. magnetic resonance imaging).

A more detailed review on the representation of shape in the plane has been undertaken by Marshall1. Besl2 presents an elegant review of geometric modelling and matching in machine vision. Koenderink’s3 recent book on solid shape contains a very readable and expansive introduction to the role of differential geometry in the representation of shape. The collection edited by Mundy and Zisserman4 contains papers on a broad cross-section of contemporary research on geometric invariance in vision, together with a clear and concise introduction to invariance and projective geometry.

School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK

Paper received: 11 September 1992; revised paper received: 1 February 1993

Two kinds of shape arise naturally in machine vision: the solid shape of 3D objects in space, and the planar shape of 2D objects embedded in the plane. Both kinds are examined in this review.

There is clearly a close relationship between the two kinds of shape not least because planar shapes arise through the projection of solid shapes onto an imaging surface. This relationship extends to the types of representation used for either kind of shape since there exist underlying representational schemes that may be specialized either for planar shape or for solid shape. Of course, planar shapes arise in ways other than through projection. For example, flat configurations in space such as the faces of a polyhedron may be treated as planar shapes within the context of the plane in which they are embedded.

Entities for which shape representations are needed in machine vision are not confined to entire planar or solid objects, but also include fragments of surface (in 3D) or boundary (in 2D), surface markings (e.g. a tee-shirt logo), and identifiable surface contours (e.g. someone’s jaw line5).

The selection or design of a representational scheme depends crucially on the purpose for which it is intended; a certain type of representation may be ideal for one application yet next to useless for another. Possible purposes include:

• modelling familiar shapes for recognizing objects;
• modelling surrounding terrain and obstacles for planning the path of a mobile vehicle;
• modelling graspable objects for controlling the fingers of a robot gripper;
• modelling terrain for the generation of graphical reconstructions in a visualization system;
• modelling moving jointed objects (e.g. a person);
• modelling evolving systems of clouds;
• modelling machined parts for industrial inspection, using only the information contained in a CAD model;
• modelling a set of shapes to provide a quantitative measure of similarity.

To illustrate the different needs, compare the modelling of terrain for path planning as opposed to graphical reconstruction. For planning the path of a mobile vehicle it may be desirable to model the shape of each obstacle relatively coarsely - sufficient to ensure that the vehicle path avoids collision whilst minimizing computational effort. In contrast, for computer graphics a detailed representation of the shape will be necessary to ensure pictorial reconstructions appear realistic to the viewer as the simulated viewpoint changes. In this case the computational effort will be great by necessity, possibly requiring the use of special purpose graphics engines.

0262-8856/93/060309-08 © 1993 Butterworth-Heinemann Ltd

vol 11 no 6 july/august 1993 309

The representational schemes devised to meet these purposes are varied and numerous. Many schemes are designed to ‘approximate’ the geometry of objects using modelling techniques developed mainly for computer graphics and CAD applications where the aim is to produce models with sufficient detail to support rendering and manufacturing. Other schemes are not aimed at approximating objects but rather make explicit salient features relevant to a particular task. Whilst such features may be derivable from approximations, the point is to make them explicit and therefore easily accessible, and the resulting representations concise and compact. A typical example is the use of point sets representing significant and easily identifiable landmarks on shapes. This and other non-approximation schemes are examined below.

GEOMETRIC ENTITIES

Essential tools for the representation of shape are the various techniques for approximating plane curves, space curves, regions, surfaces, and volumes using geometric primitives. Besl2 identifies five different forms of geometric primitive together with a natural extension to spatio-temporal entities in four dimensions. We review briefly the three major forms as they apply to primitives in 2D and 3D (omitting the solution and graph forms).

In the parametric form the points of entities are expressed in terms of functions of one or more ‘material’ co-ordinates over a given domain as follows:

Plane curve: $P(u) = (x(u), y(u))$; $u \in [0, 1]$

Space curve: $P(u) = (x(u), y(u), z(u))$; $u \in [0, 1]$

Region (in plane): $P(u, v) = (x(u, v), y(u, v))$; $u, v \in [0, 1]$

Surface (in space): $P(u, v) = (x(u, v), y(u, v), z(u, v))$; $u, v \in [0, 1]$

Volume: $P(u, v, w) = (x(u, v, w), y(u, v, w), z(u, v, w))$; $u, v, w \in [0, 1]$

The choice of the unit interval as the domain is usual but not essential.

To illustrate, the parametric form $P_{cyl}(u, v) = (u, \sin(2\pi v), \cos(2\pi v))$; $u, v \in [0, 1]$ defines a cylinder of unit length and unit radius with its centre lying along the x-axis. Pentland proposes constructing models from compositions of superquadrics - simple geometric entities with the parametric form:

$P(u, v) = ((\cos u)^{\epsilon_1} (\cos v)^{\epsilon_2}, (\cos u)^{\epsilon_1} (\sin v)^{\epsilon_2}, (\sin u)^{\epsilon_1})$

The basic shape is determined by the choice of values for $\epsilon_1$, $\epsilon_2$ and can range from a simple sphere through to an arbitrarily good approximation to a cube. Pentland introduces additional flexibility by incorporating simple spatial deformations of primitives prior to composition (see below).
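The superquadric form can be explored numerically with a minimal sketch such as the one below. Several details are assumptions rather than part of the text: the angle ranges (u in [-π/2, π/2], v in [-π, π)), the use of a signed power so that fractional exponents of negative cosines and sines stay real, and the helper names `signed_pow`, `superquadric_point` and `inside_outside`.

```python
import math

def signed_pow(base, exponent):
    """Return sign(base) * |base|**exponent (assumed convention, keeps results real)."""
    return math.copysign(abs(base) ** exponent, base)

def superquadric_point(u, v, eps1, eps2):
    """Point of the unit superquadric at angles u and v (radians)."""
    cu, su = math.cos(u), math.sin(u)
    cv, sv = math.cos(v), math.sin(v)
    x = signed_pow(cu, eps1) * signed_pow(cv, eps2)
    y = signed_pow(cu, eps1) * signed_pow(sv, eps2)
    z = signed_pow(su, eps1)
    return (x, y, z)

def inside_outside(point, eps1, eps2):
    """Equals 1 on the surface, < 1 inside and > 1 outside the superquadric."""
    x, y, z = point
    planar = abs(x) ** (2.0 / eps2) + abs(y) ** (2.0 / eps2)
    return planar ** (eps2 / eps1) + abs(z) ** (2.0 / eps1)

# eps1 = eps2 = 1 gives a sphere; small exponents approach a cube.
sphere_point = superquadric_point(0.3, 0.4, 1.0, 1.0)
boxy_point = superquadric_point(0.3, 0.4, 0.1, 0.1)
```

With both exponents equal to 1 the sampled points lie on the unit sphere, and with both set to 0.1 the same angles give a point close to a face of the unit cube.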

The beauty of parametric forms is in dealing with curves, surfaces and volumes accurately and with a concise formulation. For simple shapes a correspondingly simple smooth parametric function may be sufficient. More complicated primitive shapes can be approximated using high-order polynomials, but at the cost of introducing possibly unwanted local variations in shape. Cleaner approximations are obtained using splines - piecewise smooth functions normally composed from low-order polynomial pieces connected together at almost smooth joins. Two of the most common types of spline are formed from linear and cubic polynomials (e.g. bilinear and bicubic B-spline surfaces). The use of splines in geometric modelling is widespread in computer aided design and computer graphics, and to a lesser extent they have also been used in machine vision (e.g. York et al.7; Terzopoulos and Metaxas8).

In the implicit form, points of an entity are just the solutions of equality or inequality constraints:

Plane curve: $f(x, y) = 0$

Region: $f(x, y) \le 0$

Surface: $f(x, y, z) = 0$

Volume: $f(x, y, z) \le 0$

For example, the implicit entity $3x^2 + 4y^2 - 1 \le 0$ represents an elliptical region in the plane and $3x^2 + 4y^2 + 2z^2 - 1 = 0$ an ellipsoidal surface in space.

In the digital form, a shape is represented as a collection of discrete points sampled at a given resolution - the finer the resolution the better the approximation. Such models are common as intermediate representations for segmented planar and solid objects. Normally, sample points are chosen corresponding to the cells in a regular tessellation of the plane or space through which the entity passes. This type of model has been likened to modelling with sugar cubes3 or Lego bricks. A compact way of representing a digital model is as a 3D Boolean spatial occupancy array (2D array for planar shapes) in which cells containing matter (a sugar cube or Lego brick) contain 1 and empty cells contain 0 - such arrays are normally held in coded form (e.g. as octrees (3D) or quadtrees (2D))33.
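The link between the implicit and digital forms can be sketched by sampling an implicit function into a 2D Boolean occupancy array. This is an illustrative simplification: a cell is marked occupied when its centre satisfies f ≤ 0, rather than whenever the entity passes through the cell, and the function names and grid extent are assumptions.

```python
def ellipse(x, y):
    """Implicit function of the elliptical region: 3x^2 + 4y^2 - 1."""
    return 3.0 * x * x + 4.0 * y * y - 1.0

def occupancy_grid(f, lo, hi, n):
    """Sample f at the cell centres of an n x n grid over [lo, hi]^2.

    A cell holds 1 where f <= 0 at its centre and 0 elsewhere.
    """
    step = (hi - lo) / n
    grid = []
    for row in range(n):
        y = lo + (row + 0.5) * step
        grid.append([1 if f(lo + (col + 0.5) * step, y) <= 0.0 else 0
                     for col in range(n)])
    return grid

# A coarse 8 x 8 model, printed with '#' for occupied cells.
coarse = occupancy_grid(ellipse, -1.0, 1.0, 8)
for row in coarse:
    print("".join("#" if cell else "." for cell in row))

# A finer grid approximates the region better; the occupied area
# approaches the true area of the ellipse as n grows.
fine = occupancy_grid(ellipse, -1.0, 1.0, 200)
area_estimate = sum(map(sum, fine)) * (2.0 / 200) ** 2
```

Refining the grid trades storage for accuracy, which is the motivation for coded forms such as quadtrees.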

Composition of primitives

In practice, the range of entities that can be approximated with individual primitives is limited. This is particularly true where for practical reasons the complexity of primitives is deliberately restricted. Compositions of primitives are used to enlarge the range of entities that can be approximated. Besl2 identifies three kinds of composition prevalent in machine vision:

• Boolean composition defines new entities by combining pairs of entities (starting with individual geometric primitives) in one of three ways. The most common way is to simply take the union of two entities. This leads to the construction of shape models through ‘gluing’ together primitive shape components, and is a familiar process in activities as diverse as child’s play, the building industry, and molecular biology. Typically, spatial models are composed from a small set of primitives such as spheres, cubes and cylinders. For example, O’Rourke et al.10 use a model for a person constructed from a union of spheres to locate the hands and feet of a moving person from images.

310 image and vision computing

The two other types of Boolean composition are the intersection of two entities and the subtraction of one entity from another.

Boolean composition of volume primitives is known as Constructive Solid Geometry (CSG), and is widely used in CAD systems. Its full use in machine vision is associated with manufacturing applications in which it is important to integrate with the design-to-product pipeline.

• Boundary composition defines new entities from the union of entities that jointly define their boundaries. One of the most widely used kinds of representation for volume - the polyhedron - may be defined in terms of the union of polygonal planar surface elements (called facets) that define its boundary in 3D. Similarly, each polygonal facet is defined by its bounding line segments embedded within the plane, and finally each line segment by the endpoints (the original vertices of the polyhedron) embedded in the line. In general, polyhedra are conveniently represented by the co-ordinate positions of the vertices and the ordered list of vertices from which each polygonal facet is composed. A convex polyhedron can also be defined in terms of the intersection of the half-spaces delimited by planes containing each of the faces.

Patchworks of polygonal surface facets are commonly used for representing surfaces in their own right and not simply as the closed boundaries of polyhedra. They are popular not only for their simplicity but also for their ability to approximate surfaces to arbitrary precision. The ‘triangulated surface’ involving only triangular facets is a special case of the polygonal surface, and has the practical advantage that all vertices on each face are guaranteed to be coplanar.

• Sweep composition defines a new entity from a 1D space curve along which is swept a 2D closed curve. The geometric entity created is known as the generalized cylinder (so called because an ordinary cylinder can be constructed in this manner by sweeping a circle along a line segment).
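Boolean composition can be sketched with point-membership predicates: each entity is a function that reports whether a point lies inside it, and union, intersection and subtraction combine two such functions. This is only one simple way to realize the idea and is not how any particular CAD system stores CSG models; all names below are hypothetical. The last lines also illustrate a convex polyhedron as an intersection of half-spaces.

```python
from functools import reduce

def sphere(cx, cy, cz, r):
    """Solid sphere of radius r centred at (cx, cy, cz)."""
    return lambda x, y, z: (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r * r

def box(xmin, xmax, ymin, ymax, zmin, zmax):
    """Axis-aligned solid box."""
    return lambda x, y, z: (xmin <= x <= xmax and ymin <= y <= ymax
                            and zmin <= z <= zmax)

def half_space(nx, ny, nz, d):
    """Points p with n . p <= d, i.e. one side of a plane."""
    return lambda x, y, z: nx * x + ny * y + nz * z <= d

def union(a, b):
    return lambda x, y, z: a(x, y, z) or b(x, y, z)

def intersection(a, b):
    return lambda x, y, z: a(x, y, z) and b(x, y, z)

def subtraction(a, b):
    return lambda x, y, z: a(x, y, z) and not b(x, y, z)

# A unit cube with a spherical bite taken out of one corner.
cube = box(0, 1, 0, 1, 0, 1)
bite = sphere(1, 1, 1, 0.5)
part = subtraction(cube, bite)

# The same cube as the intersection of six half-spaces.
faces = [half_space(1, 0, 0, 1), half_space(-1, 0, 0, 0),
         half_space(0, 1, 0, 1), half_space(0, -1, 0, 0),
         half_space(0, 0, 1, 1), half_space(0, 0, -1, 0)]
convex_cube = reduce(intersection, faces)
```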

OBJECT CENTRED COORDINATES

A desirable and sometimes essential requirement in shape modelling is invariance to changes in the conditions under which images are acquired - in particular to changes in viewpoint, illumination level, occlusion, and imaging noise. In other words, representations constructed for the same shape should be identical no matter how raw data on the shape are acquired. Closely related to this is the need for broadly similar shapes to have similar representations. Marr11 called these two criteria uniqueness and stability. Both are important when shape descriptions must be compared with one another in, for example, recognizing objects or analysing the differences and similarities between objects.

Invariance is by no means always necessary. For example, invariance across different viewpoints may not matter in the construction of shape models for the purpose of planning the path of an autonomous vehicle12.

Figure 1. Construction of a coordinate system invariant to similarity transformations

Of fundamental importance in the construction of many kinds of shape description is the choice of a coordinate system within which to specify the structural elements. Clearly, to be independent of viewpoint, any coordinate system must be intrinsic to the shape.

It is worth pointing out that a coordinate system is not always necessary. For example, ‘four spheres of equal size and in mutual contact’ is a geometrically precise shape description without reference to a coordinate system (this description could, of course, be couched in formal terms instead of natural language).

Several methods exist for the construction of coordinate systems invariant to the most common kinds of transformation. All of the constructions that follow depend on being able to identify landmark points $\{x_i\}$ on a shape, a task that is often difficult in itself.

For planar shapes, a coordinate system that is invariant to similarity transformations of the plane (i.e. rotation, translation and uniform scaling) may be obtained as follows13,14. Choose two points $P_1$, $P_2$. Construct a coordinate frame with origin at $P_1$, and basis vectors $\overrightarrow{P_1 P_2}$ and the equal length vector orthogonal to this (see Figure 1). No matter where the shape is placed, oriented or scaled, all points will have the same coordinates in terms of this basis.
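The two-point construction can be sketched in a few lines. The function names are hypothetical, and the orthogonal basis vector is taken as the first one rotated 90° anticlockwise, which is an assumed handedness; a reflection of the shape would flip the sign of the second coordinate.

```python
import math

def similarity_coords(p, p1, p2):
    """Coordinates of p in the frame with origin p1 and first basis vector p1->p2."""
    ex = (p2[0] - p1[0], p2[1] - p1[1])
    ey = (-ex[1], ex[0])                 # ex rotated by 90 degrees, same length
    d = (p[0] - p1[0], p[1] - p1[1])
    n2 = ex[0] ** 2 + ex[1] ** 2         # squared length shared by ex and ey
    return ((d[0] * ex[0] + d[1] * ex[1]) / n2,
            (d[0] * ey[0] + d[1] * ey[1]) / n2)

def similarity_transform(p, scale, angle, tx, ty):
    """Rotate p by angle, scale uniformly, then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * p[0] - s * p[1]) + tx,
            scale * (s * p[0] + c * p[1]) + ty)

# The third landmark keeps its coordinates after the shape is moved.
shape = [(1.0, 2.0), (4.0, 3.0), (2.5, 5.0)]
moved = [similarity_transform(p, 1.7, 0.9, -3.0, 8.0) for p in shape]
before = similarity_coords(shape[2], shape[0], shape[1])
after = similarity_coords(moved[2], moved[0], moved[1])
```

In this frame $P_1$ always maps to (0, 0) and $P_2$ to (1, 0), so only the remaining landmarks carry shape information.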

Being based on just two landmark points, this method is sensitive to variations in the relative positions of these points with respect to others. Where errors in position are expected or when attempting to assign local coordinates to compare similar shapes, alternative constructions taking all points into account may be preferable. One such method” starts by constructing a coordinate system invariant to translation and scaling by choosing:

• the origin at the mean landmark position: $\sum_i x_i = 0$
• the scale so that: $\sum_i |x_i|^2 = 1$

Finally, in the so-called Procrustean method” a Cartesian coordinate system is obtained that minimizes the sum of the squares of the Euclidean distance from a standard template of landmark positions $\{x'_i\}$ (i.e. $\sum_i |x'_i - x_i|^2$ a minimum).
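The three steps (centre, scale, rotate to fit a template) can be sketched for planar landmarks as follows. This is a minimal illustration under stated assumptions: landmarks are in corresponding order, the template is normalized in the same way as the shape, reflections are not considered, and the closed-form rotation angle is the standard least-squares solution for 2D rather than something taken from the text.

```python
import math

def centre_and_scale(points):
    """Move the mean to the origin and scale so the squared norms sum to 1."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centred))
    return [(x / size, y / size) for x, y in centred]

def procrustes_align(points, template):
    """Rotate the normalized points to minimize squared distance to the template.

    Returns the aligned landmarks and the residual sum of squares.
    """
    x = centre_and_scale(points)
    t = centre_and_scale(template)
    dot = sum(px * tx + py * ty for (px, py), (tx, ty) in zip(x, t))
    cross = sum(px * ty - py * tx for (px, py), (tx, ty) in zip(x, t))
    theta = math.atan2(cross, dot)       # least-squares rotation angle
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(c * px - s * py, s * px + c * py) for px, py in x]
    residual = sum((ax - tx) ** 2 + (ay - ty) ** 2
                   for (ax, ay), (tx, ty) in zip(aligned, t))
    return aligned, residual

template = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
c0, s0 = math.cos(0.7), math.sin(0.7)
moved = [(2.5 * (c0 * x - s0 * y) + 4.0, 2.5 * (s0 * x + c0 * y) - 1.0)
         for x, y in template]
aligned, residual = procrustes_align(moved, template)
```

For a shape that is an exact similarity transform of the template the residual is zero up to rounding; for merely similar shapes the residual gives a quantitative measure of their difference.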

For planar shapes that are perspective projections of (distant) flat solid shapes, a coordinate frame invariant to rigid motion (including 3D rotation) can be obtained through modelling both projection and rigid motion as an affine transformation of the plane (planar rotation, translation, scaling and shearing) - the so-called affine approximation to perspective projection. A coordinate frame invariant to affine transformation is constructed as follows”. Choose three non-collinear points $P_1$, $P_2$, $P_3$. Construct a coordinate frame with origin at $P_1$ and basis vectors $\overrightarrow{P_1 P_2}$, $\overrightarrow{P_1 P_3}$ (see Figure 2). Rigid motion in space leaves the coordinates in this frame of reference for all points unchanged. This construction is the basis of the geometric hashing method for object recognition17.

Figure 2. Construction of a coordinate system invariant to affine transformations
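The three-point construction amounts to solving a 2 × 2 linear system for each point. The sketch below uses exact rational arithmetic so that the invariance can be checked without rounding; the function names and the particular affine map are illustrative assumptions.

```python
from fractions import Fraction

def affine_coords(p, p1, p2, p3):
    """Solve p - p1 = a * (p2 - p1) + b * (p3 - p1) for (a, b) by Cramer's rule."""
    e1 = (p2[0] - p1[0], p2[1] - p1[1])
    e2 = (p3[0] - p1[0], p3[1] - p1[1])
    d = (p[0] - p1[0], p[1] - p1[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("basis points are collinear")
    a = (d[0] * e2[1] - d[1] * e2[0]) / det
    b = (e1[0] * d[1] - e1[1] * d[0]) / det
    return (a, b)

def affine_map(p):
    """An arbitrary invertible affine transformation (rotation, scale and shear mixed)."""
    return (2 * p[0] + 1 * p[1] + 5, -1 * p[0] + 3 * p[1] - 7)

points = [(Fraction(0), Fraction(0)), (Fraction(4), Fraction(1)),
          (Fraction(1), Fraction(3)), (Fraction(5), Fraction(2))]
mapped = [affine_map(p) for p in points]

# Coordinates of the fourth point relative to the first three.
before = affine_coords(points[3], *points[:3])
after = affine_coords(mapped[3], *mapped[:3])
```

Because the coordinates are unchanged by the map, they can serve as keys into a table of model points, which is the idea exploited by geometric hashing.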

An obvious way to deal with non-flat objects that are free to rotate in space is to work with solid shape models directly. A coordinate system invariant to similarity transformations of space is easily constructed from three points. A coordinate system invariant to arbitrary affine transformations of space can be constructed from four non-coplanar points. Unfortunately, to apply these constructions directly requires the 3D positions of points with respect to some initial coordinate system - this can be difficult to achieve reliably using passive vision.

Faugeras18 has recently proposed an alternative way of tackling this problem which avoids recovering the 3D positions of points directly, but in return delivers an intrinsic (projective) coordinate system that is unique but related to any world coordinate system by an unknown projective transformation. All that is necessary is the positions of at least five pairs of corresponding points in two views from uncalibrated perspective cameras. A basis for the intrinsic (projective) coordinate system is chosen so that the five unknown points have predetermined homogeneous coordinate positions:

$P_1 = (1, 0, 0, 0)$, $P_2 = (0, 1, 0, 0)$, $P_3 = (0, 0, 1, 0)$, $P_4 = (0, 0, 0, 1)$, $P_5 = (1, 1, 1, 1)$

Now, given at least three other pairs of corresponding points, their position within the intrinsic coordinate system can be derived. This is an interesting idea since the unknown transformation linking the constructed coordinate system to a world coordinate system could, for example, map points on an ellipsoid (perhaps representing a human head) onto points on a two sheet hyperboloid - the supposition is that this may not matter for many machine vision tasks since the important thing is to obtain an invariant representation in some coordinate system whether or not the link to world coordinates is known.

SHAPE MEASURES

One of the most succinct kinds of representation for the shape of an object is a small collection of selected measurements. Such measurements may, for example, be sufficient to recognize a given object, although caution is needed since many different shapes may share the same measurements. Possible measures include:

• compactness of a solid shape: volume/surface:
• compactness of a planar shape: area/perimeter²
• elongation of a planar or solid shape: length/width (length and width suitably defined)
• genus (i.e. number of holes)

Figure 3. Cross-ratio is computed from four collinear points
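As a small illustration of a planar shape measure, the sketch below computes area/perimeter² for a simple polygon given as an ordered list of vertices. The polygon representation and function names are assumptions made for the example; the measure is unchanged by translation, rotation and uniform scaling of the polygon.

```python
import math

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def polygon_perimeter(vertices):
    """Sum of the edge lengths, closing the polygon back to its first vertex."""
    n = len(vertices)
    return sum(math.hypot(vertices[(i + 1) % n][0] - vertices[i][0],
                          vertices[(i + 1) % n][1] - vertices[i][1])
               for i in range(n))

def compactness(vertices):
    """Planar compactness: area divided by squared perimeter."""
    return polygon_area(vertices) / polygon_perimeter(vertices) ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(0, 0), (3, 0), (3, 3), (0, 3)]
strip = [(0, 0), (4, 0), (4, 1), (0, 1)]
```

Here the two squares share the value 1/16 regardless of size, while the 4 × 1 strip scores lower (0.04), which also shows why a single measure cannot separate all shapes.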

To be most useful, measures should be invariant to rigid motions and uniform scaling in the plane (for planar shapes) and in space (for solid shapes). In practice, such measures as those above are rarely used for solid shapes since their values are difficult to obtain without multiple views of an object and active range finding devices. In contrast, planar shape measures have been widely used to recognize objects.

Unfortunately, where a planar shape is the silhouette of a solid shape (i.e. resulting from projection), most of the known measures (including all of those above) depend upon the direction from which an object is viewed - this is hardly surprising since the planar silhouette of an object changes shape as the object rotates in space. The usual way around this problem, without abandoning planar shape measures, is to treat essentially different views of an object as if they were distinct objects with associated characteristic values for the planar shape measures. Ideally, what is needed are planar shape measures that are invariant to viewing direction. Fortunately, such invariant measures do exist and have recently come to the fore in machine vision.

The basic planar measure invariant to the viewing transformation is the cross-ratio of four collinear points (Figure 3). This is defined as:

The cross-ratio has the property that it has the same value for any perspective projection of the points, and indeed for the points themselves in space. The dual relationship of points and lines in projective geometry means there is also an equivalent definition of the cross-ratio for a pencil of four coplanar lines (i.e. four lines intersecting at a point) - see Mundy and Zisserman4 for more on this. For the purpose of machine vision, useful invariants to perspective projection can be constructed from the cross-ratio. For example, five coplanar lines (see Figure 4) have an invariant constructed by intersecting one of the lines ($L_0$) with the other four and taking the cross-ratio of the four points of intersection. This invariant can be used to characterize and then recognize approximately planar objects from their projected planar shape provided five coplanar lines can be reliably associated with the shape. This is straightforward if, for example, the shape has a stable polyhedral representation with some faces delimited by five or more edges.

Figure 4. Constructing an invariant of five coplanar lines
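The invariance of the cross-ratio can be checked numerically. The sketch below uses one common convention for four collinear points with signed positions a, b, c, d along their line, namely (c − a)(d − b) / ((c − b)(d − a)); the labelling and ordering are assumptions and may differ from those of Figure 3 (other orderings give related values). A perspective projection restricted to a line acts as a 1D projective map x ↦ (αx + β)/(γx + δ), which is what the second function models.

```python
from fractions import Fraction

def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by positions along the line."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective_map(x, alpha, beta, gamma, delta):
    """1D projective transformation of a position x along a line."""
    x = Fraction(x)
    return (alpha * x + beta) / (gamma * x + delta)

positions = [0, 1, 2, 3]
projected = [projective_map(x, 2, 1, 1, 3) for x in positions]

original_value = cross_ratio(*positions)
projected_value = cross_ratio(*projected)
```

Exact rational arithmetic is used so that the two values can be compared for equality; with measured image positions a tolerance would be needed instead.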

Thus, where the necessary coplanar features are available, the cross-ratio provides an invariant description of shape. Amongst other applications, such descriptions have been shown to be especially useful in recognizing objects from arbitrary viewpoints’“.

LANDMARKS

Point landmarks are widely used as representations for planar and solid shapes. They arise naturally from several sources, including, for example, structure-from-motion methods where they derive from the joint displacement of point features in images (e.g. corners). In some applications they serve as intermediate representations to be fleshed out by curve or surface approximations”, but are also used directly for comparing and recognizing shapes’“. In the latter case, it is crucial that points be identified with the same landmarks on similar shapes. For example, the human face is usefully characterized by landmark points such as the locations of the left and right extremities of the eyes and mouth - these landmarks may be points in images (2D) or in range images (3D). Such characterizations have been used as the basis for discriminating between different faces using standard feature space classification methods. A recent novel use for landmark models has been in deforming pictures of faces to a standard landmark shape using a planar transformation obtained by interpolation between landmarks”.

DIFFERENTIAL PROPERTIES

Differential properties of surfaces and curves play an important role in making visual shape characteristics explicit. Koenderink” proposes a succinct characterization for the shape of all possible local surface patches (except for planar points) in a single shape index $S \in [-1, +1]$ composed from the principal curvatures $\kappa_1$, $\kappa_2$:

$S = -\frac{2}{\pi} \arctan \frac{\kappa_1 + \kappa_2}{\kappa_1 - \kappa_2}$

In the $(\kappa_1, \kappa_2)$ plane, this is the angle the line through the origin and the curvature tuple makes with the line $\kappa_1 = -\kappa_2$, measured so that angle increases in the direction of the positive quadrant. As $S$ varies from $-1$ to $+1$, the patch shapes are at first ($S < -0.5$) concave elliptic ($\kappa_1, \kappa_2 < 0$), then ($-0.5 < S < +0.5$) hyperbolic ($\kappa_1 \kappa_2 < 0$), and finally ($S > +0.5$) convex elliptic ($\kappa_1, \kappa_2 > 0$), passing through parabolic ($\kappa_1 = 0$ or $\kappa_2 = 0$) at each transition ($S = \pm 0.5$). At $S = 0$ the principal curvatures have opposite sign and equal magnitude.
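A minimal sketch of the shape index and the associated patch categories follows. Two details are assumptions made here: the curvatures are ordered so that κ₁ ≤ κ₂ before the formula is applied (which reproduces the sign convention of the categories described above), and `atan2` is used so that umbilic points (κ₁ = κ₂ ≠ 0) are handled without division by zero. The function names and the tolerance are hypothetical.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, +1] from two principal curvatures (order-independent)."""
    lo, hi = min(k1, k2), max(k1, k2)
    if lo == 0.0 and hi == 0.0:
        raise ValueError("shape index is undefined at planar points")
    # Same as -(2/pi) * arctan((lo + hi) / (lo - hi)) when lo < hi.
    return (2.0 / math.pi) * math.atan2(lo + hi, hi - lo)

def patch_type(s, tol=1e-9):
    """Classify a shape index value into the categories used in the text."""
    if abs(abs(s) - 0.5) <= tol:
        return "parabolic"
    if s < -0.5:
        return "concave elliptic"
    if s > 0.5:
        return "convex elliptic"
    return "hyperbolic"

examples = {
    "cap": shape_index(1.0, 1.0),        # both curvatures positive and equal
    "cup": shape_index(-2.0, -2.0),      # both negative and equal
    "saddle": shape_index(-1.0, 1.0),    # equal magnitude, opposite sign
    "ridge": shape_index(0.0, 3.0),      # one curvature zero
}
```

Because the index discards the magnitude of curvature, patches of the same local form but different size receive the same value, which is what makes it attractive for grouping surface points.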

Grouping together points with the same local shape is an intuitively plausible way to segment surfaces into parts*’. Koenderink’s shape index could be used in doing this although standard shape categories used directly give a plausible segmentation. Figure 5 illustrates the classic example of a light-bulb divided into distinct elliptic, hyperbolic and parabolic regions (omitting the parabolic curve between elliptic and hyperbolic regions).

Figure 5. Elliptic, hyperbolic and parabolic regions of the surface of a light bulb

Certain types of identifiable surface point (e.g. parabolic points) are in general* organised as curves on the surface. Such loci treated as space curves are in themselves useful as representations of surface shape. Of course, space curves also arise in other ways (e.g. as the intersection of two surfaces) and carry their own differential structure. Kishon et al.2” and Gueziec and Ayache5 use curvature and torsion values computed along space curves as a means for matching space curves derived from 3D medical images against a database of stored curves. To obtain reliable estimates of these differential quantities they first approximate discrete pointwise representations of curves with B-Splines.

SHAPE CLASSES

One of the most challenging problems in machine vision is the representation of shape classes (e.g. the generic shape of the human body). There are at least two important reasons for wanting to do this. The first is to be able to recognize that an individual belongs to a given class. The second reason is to provide concise descriptions of individuals in terms of a configuration space particular to a given class of objects. Such descriptions can be used to recognize more specific classes of object or configuration, for example that a person is in fact Mary Brown or is standing on one leg. They may also provide a quantifiable means for labelling and then classifying objects (e.g. fossils).

Where variations in shape within a class are small, a simple way to avoid modelling these explicitly is to incorporate tolerance to minor perturbations into the algorithms operating on shape descriptions. For example, small variations in the positions of vertices in a polyhedral model may be accommodated by ignoring minor apparent errors in the relative positions of edge junctions within an image. This approach is adopted in many machine vision systems.

One of the earliest and most widely used ways of representing shape variation within a class is simply in terms of identified regions within a feature space, defined in various ways (e.g. using nearest neighbour methods24). The crucial element in such methods is the choice of features, and although many of the techniques below can be interpreted in terms of feature spaces, it is the shape dimensions of these spaces that are of special interest for this review.

*Figure 5 depicts an exception to this generic situation for parabolic points

Space warping

In 1917 D’Arcy Thompson observed that cross-sections through different living creatures could be mapped onto one another using warping transformations of the plane, developing an earlier idea of Dürer. This same idea has been exploited in machine vision to produce generic shape models for object classes. Thus, a class of shapes is defined in terms of a prototype shape together with a set of spatial transformations that deform the prototype into any desired member of the shape class.

Roberts25 pioneered this approach with scene models that were compositions of polyhedral shape primitives individually transformed by arbitrary projective transformations - combining shape deformations with similarity transformations and perspective projection. The task of the vision system was to instantiate the correct primitives and to recover the parameters of the unknown projective transformations. Thus, his system could build representations for scenes that were compositions of (projectively) deformed shape prototypes.

Recently, Sparr 26 has shown that the depths of coplanar points can be recovered if their collective affine shape is known - this is equivalent to knowing that they are affine transformations of a known point set (i.e. a prototype). The method works from a single uncalibrated view and returns depths up to a global scale factor. For example, in a scene containing polyhedral objects with polygonal faces some of which are affine deformations of a unit square, the relative depth of the vertices on these deformed square faces can be estimated without knowledge of the intrinsic camera parameters.

Parameterized models

The most commonly used technique for representing classes of shape is to introduce degrees of freedom directly into the structure of a ‘rigid’ model. With each degree of freedom is associated a variable, and the set of all joint substitutions for such variables defines a configuration space in which an instance of the object class is represented by a single point. Typically, variables have numerical values, but Boolean and other types have also been used.

Perhaps the best known system of this type is ACRONYM, developed by Brooks27. ACRONYM used 3D models composed of agglomerations of generalized cylinders with variables substituted in place of dimensions, displacement distances, rotational angles and scale factors. It is also possible to vary the number of components, for example to allow four or six flanges on a machine part. In addition, variables are constrained by sets of inequalities involving simple functions of the variables themselves (including trigonometric functions). Joint substitutions satisfying these constraints together form a sub-region of configuration space - the satisfying set - within which objects of a particular class must lie. Such satisfying sets may themselves have a non-trivial ‘shape’ in the configuration space. Thus, the ratio of the length and height of a machine part might be controlled by the following constraint:

$1 < p_{\mathit{length}} / p_{\mathit{height}} < 2$

One of the most notable features of ACRONYM is its additional mechanism for dealing with class hierarchies through partitioning inequality constraints across the nodes of a directed graph - the restriction graph. Those constraints at the root of the restriction graph define the most general object class. Any other node of the graph defines a (possibly) narrower class of objects through the combination of all constraints on the unique path from that node to the root. Additional shape information acquired during image interpretation is represented as additional bundles of constraints held at new nodes tacked onto the restriction graph. ACRONYM unifies the notions of class and instance by representing objects as they are finally recognized as new nodes with (hopefully) tightly bound satisfying sets - but typically not a single point in configuration space.
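The way constraints accumulate along the path to the root can be sketched as below. This is only an illustration and not ACRONYM's implementation: ACRONYM reasons symbolically about satisfying sets, whereas the sketch merely tests whether one point of configuration space satisfies every constraint. The class `RestrictionNode`, the variable names and the example classes are hypothetical.

```python
class RestrictionNode:
    """A node of a restriction graph holding a bundle of constraints."""

    def __init__(self, name, constraints, parent=None):
        self.name = name
        self.constraints = list(constraints)  # predicates over a configuration
        self.parent = parent                  # None for the root

    def all_constraints(self):
        """Combine the constraints on the path from this node to the root."""
        node, collected = self, []
        while node is not None:
            collected.extend(node.constraints)
            node = node.parent
        return collected

    def admits(self, configuration):
        """True if the configuration lies in this node's satisfying set."""
        return all(check(configuration) for check in self.all_constraints())

# Root: the most general class, with a ratio constraint and four or six flanges.
machine_part = RestrictionNode("machine part", [
    lambda v: 1 < v["p_length"] / v["p_height"] < 2,
    lambda v: v["flanges"] in (4, 6),
])

# Child: a narrower class that adds one further constraint.
six_flange_part = RestrictionNode(
    "six-flange part", [lambda v: v["flanges"] == 6], parent=machine_part)

candidate = {"p_length": 3.0, "p_height": 2.0, "flanges": 4}
in_general_class = machine_part.admits(candidate)
in_narrow_class = six_flange_part.admits(candidate)
```

Here the candidate belongs to the general class but not to the narrower one, because the child node inherits the root's constraints and adds its own.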

Whilst ACRONYM deals with many shape primitives connected to form complicated shapes, useful practical systems have been constructed with just a single parameterized shape primitive. For example, Lipson et al.28 use a single parameterized ellipse with major and minor dimensions free to vary across a fixed domain (in addition to rigid motions) as a generic model of the appearance of the vertebral trabecular bone in CT images.

In addition to representing the variations in shape within broad object categories, parameterized shape models can also be used to capture the possible shapes of non-rigid objects. Hogg29 used a parameterized model composed of jointed cylinders to represent explicitly the set of shapes into which the human body could be articulated during the action of walking.

Shape spaces from landmarks

We have already noted the role of landmark models in encoding the shape of objects. However, they also provide a simple and powerful way to characterize variations in shape and shape classes2’. The essential idea is to establish an intrinsic coordinate system and to then compare shapes or characterize a class of shapes in terms of the joint position of all landmark points within this coordinate system. Thus the dimension of the shape space for a landmark model with n points is 2n (in 2D) and 3n (in 3D) (note: in fact, dimension is one less than this for Kendall’s method since his shape space is on the unit hypersphere).
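A minimal sketch of comparing two planar landmark configurations in an intrinsic coordinate system is given below. It assumes that the landmarks are in known correspondence, removes translation and scale by centring and normalizing (which places each configuration on a unit hypersphere), and removes rotation through the modulus of a complex inner product. The function names are hypothetical, and the distance used is the standard full Procrustes distance rather than a construction taken from any one of the methods cited above.

```python
import math

def preshape(landmarks):
    """Centre 2D landmarks on their centroid and scale them to unit size."""
    z = [complex(x, y) for x, y in landmarks]
    centroid = sum(z) / len(z)
    centred = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred]

def procrustes_distance(a, b):
    """Distance between two landmark sets that ignores translation, scale and rotation."""
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    # |inner| is 1 when the two sets differ only by a similarity transformation.
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))

triangle = [(0, 0), (1, 0), (0, 1)]
# The same triangle rotated by 90 degrees, doubled in size and translated.
moved = [(5, -3), (5, -1), (3, -3)]
stretched = [(0, 0), (3, 0), (0, 1)]

same_shape = procrustes_distance(triangle, moved)
different_shape = procrustes_distance(triangle, stretched)
```

The first distance is zero up to rounding error, while the stretched triangle lies at a clearly non-zero distance, so a class of shapes can be characterized as a region around a reference point in this space.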

In practice, only subsets of shape space will correspond to the class of shapes under study since each of the shape spaces gives unlimited flexibility in the shapes of objects. This subset could, for example, be defined formally using any of the standard feature space based classification techniques.

314 image and vision computing

An important issue is to capture any reduction in the underlying dimensions of the defining subset against that of the overall shape space. Recently, Cootes et al.30 have tackled this problem by identifying principal components in the shape space from training sets of shapes. The first few principal components serve as a basis for a sub-space of shape space that is tractable and that in many cases accurately characterizes the most significant variations in shape. Initial alignment of the training set is achieved by applying the Procrustean method iteratively, using the first shape in the training set as an initial reference template and updating the template after each iteration to a normalized mean template computed from the training set transformed according to the result of the previous Procrustean iteration. The result of this approach is a generic shape model with a small number of shape parameters controlling the major modes of shape variation.
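The alignment and principal component steps can be sketched as follows for planar landmarks held as complex numbers. This is an illustrative simplification under stated assumptions, not the published procedure: only the leading component is extracted, by power iteration, and a fixed number of alignment passes is used. All function names and the synthetic training set are hypothetical.

```python
import math

def preshape(z):
    """Centre complex landmarks on their centroid and scale to unit size."""
    centroid = sum(z) / len(z)
    centred = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred]

def rotate_onto(z, template):
    """Rotate z by the unit complex number that best matches it to the template."""
    s = sum(p.conjugate() * t for p, t in zip(z, template))
    return [(s / abs(s)) * p for p in z]

def align_training_set(shapes, iterations=5):
    """Iterative Procrustes alignment; the first shape is the initial template."""
    shapes = [preshape(z) for z in shapes]
    template = shapes[0]
    for _ in range(iterations):
        shapes = [rotate_onto(z, template) for z in shapes]
        mean = [sum(column) / len(shapes) for column in zip(*shapes)]
        template = preshape(mean)  # the normalized mean becomes the new template
    return shapes

def first_mode(aligned, steps=100):
    """Mean shape, leading principal direction and its share of the variance."""
    flat = [[c for p in z for c in (p.real, p.imag)] for z in aligned]
    dim = len(flat[0])
    mean = [sum(column) / len(flat) for column in zip(*flat)]
    devs = [[a - m for a, m in zip(row, mean)] for row in flat]
    total = sum(a * a for d in devs for a in d)
    v = max(devs, key=lambda d: sum(a * a for a in d))  # start inside the data span
    for _ in range(steps):  # power iteration on the scatter matrix
        w = [0.0] * dim
        for d in devs:
            projection = sum(a * b for a, b in zip(d, v))
            for i in range(dim):
                w[i] += projection * d[i]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    along = sum(sum(a * b for a, b in zip(d, v)) ** 2 for d in devs)
    return mean, v, along / total

# Synthetic training set: a square with one corner displaced along its diagonal.
square = [complex(0, 0), complex(1, 0), complex(1, 1), complex(0, 1)]
training = [square[:2] + [square[2] + t * (1 + 1j)] + square[3:]
            for t in (-0.1, -0.05, 0.0, 0.05, 0.1)]
mean_shape, mode, share = first_mode(align_training_set(training))
```

Because the synthetic shapes vary with a single underlying parameter, almost all of the variance falls along the first mode, and a new shape can be generated as the mean plus a scalar multiple of `mode`.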

Dynamical models

In constructing physical models of objects the appropriate choice of materials is clearly important since each can be fashioned into a limited range of shapes. A rubber sheet, for example, can be stretched and easily folded, whereas a metal sheet resists stretching and bending unless subjected to severe forces (e.g. bashing). In view of this, a given piece of material implicitly defines a class of shapes - those shapes into which it can be moulded under the action of constrained forces.

Such shape classes can be actualized in machine vision by simulating the action of a given range of forces on parameterized models with appropriate material properties and boundary conditions. Members of the shape class are just the possible configurations of the parameterized model in equilibrium. In practice, the applied forces serve to mould the model according to fragmentary shape information obtained directly from images (e.g. intensity edges).

The ‘snake’ model of Kass et al.31 is a one-dimensional flexible curve segment with internal elasticity properties to ensure that it bends smoothly. This model has been widely used for the extraction of contours from images.
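The behaviour of such a curve can be illustrated with a discrete closed contour relaxed by explicit gradient descent on the sum of an elastic energy, a bending energy and an external potential. This is a simplified sketch and not the numerical scheme of the original snake formulation. The analytic ring-shaped potential stands in for an edge map computed from an image, and the parameter values `alpha`, `beta` and `step` are assumptions chosen for the example.

```python
import math

def snake_step(points, external_force, alpha=0.1, beta=0.01, step=0.1):
    """One descent step for a closed snake with elasticity alpha and stiffness beta."""
    n = len(points)

    def second_difference(values, i):
        return tuple(values[(i - 1) % n][k] - 2 * values[i][k] + values[(i + 1) % n][k]
                     for k in (0, 1))

    d2 = [second_difference(points, i) for i in range(n)]  # stretching term
    d4 = [second_difference(d2, i) for i in range(n)]      # bending term
    moved = []
    for (x, y), a, b in zip(points, d2, d4):
        fx, fy = external_force(x, y)
        moved.append((x + step * (alpha * a[0] - beta * b[0] + fx),
                      y + step * (alpha * a[1] - beta * b[1] + fy)))
    return moved

def ring_force(x, y, radius=1.0):
    """Force derived from the potential (r - radius)^2, a stand-in for image edges."""
    r = math.hypot(x, y)
    pull = -2.0 * (r - radius) / r
    return pull * x, pull * y

# Start from a circle of radius 2 and let the snake settle onto the ring.
snake = [(2.0 * math.cos(2 * math.pi * i / 20), 2.0 * math.sin(2 * math.pi * i / 20))
         for i in range(20)]
for _ in range(100):
    snake = snake_step(snake, ring_force)
```

The contour settles very slightly inside the ring, since the internal elasticity continues to pull the curve inwards against the external force.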

Of course, there is no need to simulate only everyday properties for deformable models. In the interpretation of images depicting radially symmetric but bent objects (e.g. a squash) Terzopoulos et al.32 use a generalized cylinder with an ‘elastic’ spine and tube linked by forces tending to maintain radial symmetry of the tube about the spine.

Dynamical simulations are not limited to flexible ‘free-form’ shape primitives such as snakes or elastic sheets. They can also be applied with parameterized models having only a few global parameters. For example, Lipson et al.28 conduct simulations with the parameterized ellipse primitive mentioned earlier. The generic model is adjusted to fit a particular bone image by hill climbing on an energy surface obtained by filtering the image to locate intensity edges.
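Hill climbing with a few global parameters can be sketched as below for an axis-aligned ellipse with parameters (cx, cy, a, b). The edge-strength function is synthetic and stands in for a filtered image; the target ellipse, the starting parameters and the step schedule are all assumptions made for illustration. The sketch maximizes mean edge strength along the ellipse, which is equivalent to descending an energy surface of opposite sign, and it omits the restriction of the parameters to a fixed domain.

```python
import math

TARGET = (5.0, 4.0, 3.0, 2.0)  # hypothetical outline in the image: cx, cy, a, b

def edge_strength(x, y):
    """Synthetic edge map that peaks on the outline of the target ellipse."""
    cx, cy, a, b = TARGET
    f = math.hypot((x - cx) / a, (y - cy) / b) - 1.0
    return math.exp(-(f / 0.3) ** 2)

def fit_score(params, samples=32):
    """Mean edge strength sampled around the ellipse described by params."""
    cx, cy, a, b = params
    total = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        total += edge_strength(cx + a * math.cos(t), cy + b * math.sin(t))
    return total / samples

def hill_climb(params, step=0.5, min_step=1e-3):
    """Coordinate-wise hill climbing; the step is halved when no move helps."""
    params = list(params)
    best = fit_score(params)
    while step > min_step:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                score = fit_score(trial)
                if score > best:
                    params, best, improved = trial, score, True
        if not improved:
            step /= 2
    return params, best

start = (4.5, 4.3, 2.5, 2.4)
fitted, fitted_score = hill_climb(start)
```

Only moves that improve the score are accepted, so the fitted ellipse is never worse than the starting one, although as with any hill climbing the result depends on the starting point.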

For the purpose of constructing shape descriptions of objects with moderately stable and known global shapes, deformable free-form primitives suffer from being overflexible - they do not sufficiently constrain the range of possible shapes that could be present in a scene. On the other hand, it is sometimes important to adapt to local deviations from an idealized shape - something that primitives with only global shape parameters fail to do. In an attempt to obtain the best of both worlds, Terzopoulos and Metaxas8 propose using primitives with global shape parameters (e.g. superquadrics) incorporating local surface deformations represented by flexible spline displacement functions. The simulation is over both global and local parameters, which ensures that the overall shape is constrained without sacrificing local accuracy.

CONCLUSION

Perhaps more than any other aspect of machine vision, the need to represent shape is shared with many other disciplines, and whilst machine vision has special needs of a representational scheme, it is clearly important to keep abreast of developments in these parallel areas. Studies of shape in the fields of neurophysiology, psychology, geometry, computer aided design, animation, simulation and molecular modelling are all relevant to machine vision, and many of the ideas covered in this review have been borrowed and adapted from these or other areas.

The review has examined some well established techniques and a number of new developments. Certain trends are evident, notably in the exploitation of geometric invariants and in an increasing emphasis on the representation of shape classes. In addition to these topics, important issues for further work include: the broad problem of learning about shape automatically, the representation of natural forms such as trees and clouds, and the representation of shape at different scales.

REFERENCES

1 Marshall, S ‘Review of shape coding techniques’, Image & Vision Comput., Vol 7 No 4 (1989) pp 281-294

2 Besl, P J ‘Geometric modelling and computer vision’, Proc. IEEE, Vol 76 No 8 (1988) pp 936-958

3 Koenderink, J J Solid Shape, MIT Press, Cambridge, MA (1990)

4 Mundy, J L and Zisserman, A ‘Projective geometry for machine vision’, in Geometric Invariance in Computer Vision, MIT Press, Cambridge, MA (1992)

5 Gueziec, A and Ayache, N ‘Smoothing and Matching of 3-D space curves’, Comput. Vision - ECCV’92, Springer-Verlag, London (1992) pp 620-629

6 Pentland, A P ‘Perceptual organisation and the representation of natural form’, Artif. Intell., Vol 28 No 3 (1981) pp 293-331

7 York, B W, Hanson, A R and Riseman, E M ‘3D object representation and matching with B-splines and surface patches’, Proc. 7th Int. Joint Conf. on Artif. Intell. (1981) pp 648-651

8 Terzopoulos, D and Metaxas, D ‘Dynamic 3D models with local and global deformations: deformable superquadrics’, IEEE Trans. PAMI, Vol 13 No 7 (July 1991)

9 Badler, N I, O’Rourke, J and Toltzis, H ‘A spherical representation of a human body for visualising movement’, Proc. IEEE, Vol 67 No 10 (1979) pp 1397-1403

10 O’Rourke, J and Badler, N I ‘Model-based image analysis of human motion using constraint propagation’, IEEE Trans. PAMI, Vol 2 No 6 (1980) pp 522-536

11 Marr, D Vision, WH Freeman, San Francisco (1982)

12 Charnley, D and Blissett, R ‘Surface reconstruction from outdoor image sequences’, Image & Vision Comput., Vol 7 No 1 (1989)

13 Bookstein, F L ‘Size and shape spaces for landmark data in two dimensions’, Statist. Sciences, Vol 1 (1986) pp 181-242

14 Bookstein, F L Morphometric Tools for Landmark Data, Cambridge University Press, UK (1991)

15 Kendall, D G ‘Shape manifolds, procrustean metrics and complex projective spaces’, Bull. London Math. Soc., Vol 16 (1984) pp 81-121

16 Goodall, C R ‘Procrustes methods in the statistical analysis of shape’, J. Roy. Statist. Soc., Vol B53 (1991) pp 285-339

17 Wolfson, H J ‘Model-based object recognition by geometric hashing’, Comput. Vision - ECCV’90, Springer-Verlag (1990) pp 526-536

18 Faugeras, O D ‘What can be seen in three dimensions with an uncalibrated stereo rig’, Comput. Vision - ECCV’92 (ed. G Sandini), Springer-Verlag, London (1992)

19 Rothwell, C A, Zisserman, A, Forsyth, D A and Mundy, J L ‘Using projective invariants for constant time library indexing in model-based vision’, Br. Machine Vision Conf. (ed. P Mowforth), Springer-Verlag, London (1991)

20 Kent, J T and Mardia, K V ‘Statistical shape methodology in image analysis’, Shape and Pictures - NATO workshop, Driebergen, Germany (1992)

21 Craw, I and Cameron, P ‘Parameterised images for recognition and reconstruction’, Br. Machine Vision Conf., Glasgow, UK (1991) pp 367-370

22 Hoffman, D and Richards, W ‘Parts of Recognition’, Cognition, Vol 18 (1985) pp 65-96

23 Kishon, E, Hastie, T and Wolfson, H ‘3D curve matching using splines’, Technical report, AT&T (1989)

24 Dasarathy, B V (ed.) Nearest Neighbor Pattern Classification Techniques, IEEE Computer Society Press, WA (1991)

25 Roberts, L G ‘Machine perception of three-dimensional solids’, in Tippett, J T et al. (eds.), Optical and Electro-Optical Information Processing, MIT Press, Cambridge, MA (1965) pp 159-197

26 Sparr, G ‘Depth computations from polyhedral images’, Comput. Vision - ECCV’92 (ed. G Sandini), Springer-Verlag, Berlin (1992) pp 378-386

27 Brooks, R A ‘Symbolic reasoning among 3-D models and 2-D images’, Artif. Intell., Vol 17 (1981) pp 285-348

28 Lipson, P, Yuille, A L, O’Keefe, D, Cavanaugh, J, Taeffe, J and Rosenthal, D ‘Deformable Templates for Feature Extraction from Medical Images’, Comput. Vision - ECCV’90 (ed. O Faugeras), Springer-Verlag, Berlin (1990) pp 413-417

29 Hogg, D C ‘Model-based Vision: a program to see a walking person’, Image & Vision Comput., Vol 1 No 1 (1983) pp 5-20

30 Cootes, T G, Taylor, C J, Cooper, D H and Graham, J ‘Training Models of Shape from Sets of Examples’, Proc. Br. Machine Vision Conf., London, UK (1992)

31 Kass, M, Witkin, A and Terzopoulos, D ‘Snakes: Active contour models’, Int. J. Comput. Vision, Vol 1 (1987) pp 321-331

32 Terzopoulos, D, Witkin, A and Kass, M ‘Constraints on deformable models: recovering 3D shape and non rigid motion’, Artif. Intell., Vol 36 (1988) pp 91-123

33 Jackins, C L and Tanimoto, S L ‘Oct-trees and their use in representing three-dimensional objects’, Comput. Graph. & Image Process., Vol 14 No 3 (1980) pp 249-270
