Chapter 4
Edge and Texture
Edges describe the spatial differences across an image. These differences form boundaries that allow
the human visual system to distinguish between homogeneous colour regions in an image. Simi-
larly, content-based image retrieval systems use low-level edges in higher level feature extraction
techniques such as contour extraction and texture analysis to differentiate between regions within
an image.
Edges have been used extensively for content-based image retrieval and much research has been
conducted [13, 60, 19, 4, 104]. However, many of the techniques proposed in the literature only use
very simple edge detectors [27, 19, 4]. On their own, simple edge detectors compare well with complex edge detectors; however, they perform poorly when used for higher level feature extraction
such as contour extraction and texture analysis. Specific edge detectors have been designed to
extract features required by texture analysis [60, 105, 39] but few edge detectors have been designed
with the intent of accurate contour following. Contours are important in content-based retrieval
systems as they are one of the high-level structural representations within an image. Since the
performance of contour following is intrinsically dependent on edge detection the primary purpose
of this chapter is to investigate edge detection techniques for contour following and to build upon
these techniques to produce an edge detector tuned for contour following that can also be used for
texture analysis. The resulting edge detector, called the Asymmetry edge detector, is able to provide
the best single pixel responses across multiple orientations compared with existing techniques.
Section 4.1 discusses the limitations of existing edge detection techniques. Section 4.2 identifies
the requirements of an edge detector within the context of this research. Section 4.3 analyses
and compares existing multi-orientation operators. Section 4.4 presents the new Asymmetry edge
detector. Section 4.5 presents a new technique for thinning multi-orientation edge responses. Section
4.6 discusses the Asymmetry detector as a computational model of the visual cortex. Section 4.7
presents a new technique for inhibiting textures in edge detection. Section 4.8 presents conclusions
drawn from the findings of this chapter.
Figure 4.1: Some common edge detectors applied to image (a): (a) original image; (b) simple difference operator; (c) Laplacian; (d) Roberts; (e) Prewitt; (f) Sobel; (g) Frei-Chen; (h) Kirsch; (i) Robinson. Each result image represents the absolute maximum magnitude at each pixel after the individual masks have been applied.
4.1 Edge Detection
Edges form where the pixel intensity changes rapidly over a small area. Edges are detected by centring a window over a pixel and measuring the strength of any edge within the window. The result is stored at the same pixel location. Edge responses produced by a number of common edge detectors
are shown in Figure 4.1.
Edge detection techniques often use a mask that is convolved with the pixels in the window.
A simple difference mask is shown in Figure 4.2 (a). The difference mask is directional. An edge
detector can also be non-directional such as the Laplacian or difference of Gaussians as shown in
Figure 4.2 (b). Since edges are directional and contours consist of oriented edges, we are primarily
interested in directional edge detectors.
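As an illustration, applying a directional difference mask amounts to a convolution. The sketch below (our own NumPy/SciPy example, not code from this chapter) applies the [−1 1] difference mask of Figure 4.2 (a) to a vertical step edge:

```python
import numpy as np
from scipy.ndimage import convolve

def simple_difference(image):
    """Convolve with the directional [-1 1] difference mask."""
    mask = np.array([[-1.0, 1.0]])
    return convolve(image.astype(float), mask, mode="nearest")

# A vertical step edge between two homogeneous regions produces a
# response only at the boundary column; homogeneous regions give zero.
img = np.zeros((5, 6))
img[:, 3:] = 255.0
edges = simple_difference(img)
```

Because the mask is directional, a horizontal edge would produce no response from this filter; a second, rotated mask would be needed to detect it.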
Other simple, but extensively used edge detectors, include the Sobel, Roberts, and Prewitt
Figure 4.2: Some common edge detector masks: (a) simple difference [−1 1]; (b) Laplacian [0 −1 0; −1 4 −1; 0 −1 0]; (c) Roberts [1 0; 0 −1] and [0 1; −1 0]; (d) Prewitt [−1 −1 −1; 0 0 0; 1 1 1] and [1 0 −1; 1 0 −1; 1 0 −1]; (e) Sobel [−1 −2 −1; 0 0 0; 1 2 1] and [1 0 −1; 2 0 −2; 1 0 −1].
operators (Figure 4.2) [69]. Such operators are directional and can be used to detect orientations at
90 ◦ intervals. Other operators such as the Frei-Chen [58], Kirsch [57], and Robinson [59] operators
can also be oriented at 45 ◦ intervals allowing up to 4 orientations to be detected (Figure 4.3 (a)).
In contrast, the human vision system detects 18 different orientations at 10 ◦ intervals [10].
Operators that are specified by a continuous function rather than a fixed mask can be rotated
to any arbitrary orientation. The Gabor filter [60] and the Canny operator [13] (Figure 4.4) can
both be described mathematically and are two of the most advanced edge detectors as they have
a similar receptive field to the edge detectors of the human vision system [12].
Figure 4.1 shows the output of the various edge detectors discussed in this section applied to a
test image. However, it is not possible to determine a good edge detector simply by looking at its
output. Instead we must look at the design and features of an edge detector with respect to the
requirements of contour following.
4.2 Edge Detector Requirements
For each pixel, contour following requires the orientation and strength of each edge. Contour
following also requires highly tuned edge responses. Tuning can occur across orientations and also
across spatial locations. Figure A.8 shows the orientation tuning response curve for a simple cell
in human vision. Likewise, oriented edge detectors produce different edge responses depending on
the orientation of the edge input. The output will peak when the orientation of the edge and the
Figure 4.3: (a) The Kirsch masks [57], built from the coefficients 3, 0, and −5 (for example [3 3 3; 3 0 3; −5 −5 −5]); (b) the masks applied to the image in Figure 4.1(a). The masks detect edges at 0°, −45°, −90°, and −135°.
detector are aligned and will fall off as their orientations change. Since contour following will follow
the orientation with the largest strength it is important that the edge detectors are tuned tightly so
that the contour following algorithm does not inadvertently follow the wrong orientation. However, the tuning cannot be too tight, because the responses of two edge detectors with adjacent orientations are used to determine the exact orientation of an edge that lies between the two orientations.
Position tuning is also important as a contour following algorithm will also consider a neigh-
bourhood of pixels to determine the next pixel to include in the contour. If two adjacent pixels
produce a strong response then the contour following algorithm may unnecessarily create two
contours at that point rather than following the pixel that the edge is truly aligned to.
Adjacent orientation responses are used to determine the exact orientation of an edge. In the
same manner it is possible to use adjacent position responses to determine the exact position of
an edge. This process is called subpixel edge detection [106], however subpixel edge detection is
beyond the scope of this research, primarily because each stage of edge and contour processing
assumes that each edge is aligned with the centre of a pixel.
In summary, the edge detector must satisfy the following requirements:
• Produce multi-orientation output
• Orientation-tuned with only two adjacent responses generated
• Position-tuned with only one adjacent response generated
• Efficient, small window, convolution-style operator
4.3 Multi-orientation Operators
The Gabor and Canny operators are the most suitable operators for multi-orientation edge de-
tection as they are described by a continuous function (and therefore can be used to construct
multi-orientation detectors), resemble edge detectors in the human vision system, and have been
extensively investigated [60, 13, 107]. Other fixed mask operators such as the Laplacian, difference
of Gaussians, Roberts, Prewitt, and Sobel operators are not suitable because they only support 1
to 4 orientations. An additional benefit of the Gabor and Canny operators is that they are scalable
and can be used to identify edges of different resolutions.
In this research we have decided to use the S-Gabor filter proposed by Heitger et al. [12] over the
standard Gabor filter. The standard Gabor filter modulates a sine or cosine wave with a Gaussian
envelope:
G_odd(x) = e^(−x²/2σ²) sin[2πv_0 x]    (4.1)
G_even(x) = e^(−x²/2σ²) cos[2πv_0 x]    (4.2)
where σ is the bandwidth of the Gaussian envelope and v0 is the wavelength of the sine wave.
The odd Gabor filter is used for edge detection whilst the even Gabor filter can be used for line
detection. The Gaussian envelope of the Gabor filter is not able to curtail the periodic nature of the
sine or cosine wave and therefore additional fluctuations of the wave may appear at the extremities
of the filter. Since edges are a local phenomenon there is no need for a periodic wave and the
S-Gabor filter reduces the frequency of the sine wave as x increases so that only one wavelength is
present:
S_odd(x) = e^(−x²/2σ²) sin[2πv_0 x ξ(x)]    (4.3)
S_even(x) = e^(−x²/2σ²) cos[2πv_0 x ξ(x)]    (4.4)
ξ(x) = k e^(−λx²/σ²) + (1 − k)    (4.5)
where k determines the change of wavelength. The Canny operator is simpler as it does not use
periodic functions:
C(x) = (x/σ²) e^(−x²/2σ²)    (4.6)
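The one-dimensional profiles above can be sketched in code as follows (our own NumPy illustration, not code from the thesis; the default parameter values are those chosen later in Section 4.3.1):

```python
import numpy as np

def s_gabor_odd(x, sigma=0.646, v0=0.5, lam=0.3, k=0.5):
    """Odd S-Gabor profile (Eqs. 4.3 and 4.5): a Gaussian-windowed sine
    whose frequency decays away from the centre, so that only one
    wavelength appears under the envelope."""
    xi = k * np.exp(-lam * x ** 2 / sigma ** 2) + (1.0 - k)
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) * np.sin(2.0 * np.pi * v0 * x * xi)

def canny_1d(x, sigma=0.35):
    """Canny profile (Eq. 4.6): the first derivative of a Gaussian."""
    return x * np.exp(-x ** 2 / (2.0 * sigma ** 2)) / sigma ** 2

# Both profiles are odd functions: equal and opposite lobes either
# side of the edge position, and zero response at the centre.
```

Being odd functions, both profiles give a zero response over a perfectly homogeneous region and a signed response across a step edge.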
When a multi-orientation operator is applied to an image, multiple edge images are generated. Therefore, the greater the number of orientations per edge detector, the greater the amount of memory required to store the result images and the longer it takes to generate them. For the purposes of optimisation it is beneficial for the number of orientations to be as small as possible. We have decided to use 12 orientations at 15° intervals as a compromise between the 18 orientations of human vision and the 1 to 4 orientations offered by the fixed mask operators.
4.3.1 Multi-orientation Experiments
The S-Gabor and Canny operators were chosen because they can be used at any orientation and
resemble the receptive fields of visual cortex simple cells. The odd S-Gabor filter was constructed
in two dimensions using the following formulae:
S_odd(x′, y′) = e^(−(x′² + y′²)/2σ²) sin[2πv_0 y′ ξ(x′, y′)]    (4.7)
ξ(x′, y′) = k e^(−λ(x′² + y′²)/σ²) + (1 − k)    (4.8)
where x′ and y′ are the rotated and scaled pixel co-ordinates defined below in Equations 4.10 and
4.11. The remaining parameters were adjusted to provide a filter that produces only one period of
the sine wave under the Gaussian envelope with a wavelength of 2 pixels, resulting in σ = 0.646,
v0 = 0.5, λ = 0.3, and k = 0.5.
The Canny filter was constructed in two dimensions using the following formula:
C(x′, y′) = −(y′/σ²) e^(−(x′² + y′²)/2σ²)    (4.9)
where σ = 0.35 to also provide a separation of one pixel between lobe peaks.
The filters were rotated and scaled by pre-rotating and scaling the x and y pixel co-ordinates:
x′ = [x cos(−θ) − y sin(−θ)] / s_x    (4.10)
y′ = [x sin(−θ) + y cos(−θ)] / s_y    (4.11)
where θ = nπ/12 for n = 0, …, 11, s_y = 1, and s_x determines the elongated aspect ratio of the filter.
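Putting Equations 4.9 to 4.11 together, a bank of oriented masks can be sampled on a pixel grid. The following NumPy sketch is our own illustration (the 9×9 window size is an assumption made for the example):

```python
import numpy as np

def canny_kernel(size, theta, sigma=0.35, sx=3.0, sy=1.0):
    """Sample the oriented 2-D Canny operator (Eq. 4.9) on a size x size
    grid, pre-rotating and scaling the co-ordinates (Eqs. 4.10-4.11)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = (x * np.cos(-theta) - y * np.sin(-theta)) / sx
    yp = (x * np.sin(-theta) + y * np.cos(-theta)) / sy
    return -yp * np.exp(-(xp ** 2 + yp ** 2) / (2.0 * sigma ** 2)) / sigma ** 2

# Twelve orientations at 15-degree intervals (theta = n*pi/12).
bank = [canny_kernel(9, n * np.pi / 12.0) for n in range(12)]
```

Convolving an image with each mask in the bank yields the twelve orientation response images discussed below.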
The S-Gabor and Canny operators are very similar in shape and the similarity is shown in
their respective tuning response curves. The tuning response curves display the magnitude of the
response of the operator at different lateral positions and orientations to the edge stimulus. A
vertical black and white edge was used as the stimulus and 12 orientations of the operator were
convolved with the stimulus. Position response values were taken from the few pixels either side of
the edge whilst orientation responses were taken from each of the 12 resulting images.
In our analysis we are primarily interested in the highest frequency edges representable by the
image. These edges are formed between two adjacent pixels. Therefore the filters have a width of 2
pixels with each lobe centred on a pixel. The length of the filter must be greater than one pixel and
should be less than 10 pixels so that curves are detectable. A longer filter is desirable to filter out
noisy edges. Because there is no exact restriction on filter length we will first analyse the tuning
response curves at different lengths to determine the best length.
4.3.2 Multi-orientation Results
Figure 4.4 shows the aspect ratios of the S-Gabor and Canny operators tested. Figures 4.5 and 4.6
show the tuning response curves for the S-Gabor and Canny operators respectively at the different
aspect ratios. By comparing the graphs it can be seen that the Canny and S-Gabor operators
show very similar results (although at different aspect ratios). This can be explained by the Canny
operator being shorter than the S-Gabor operator. Since there is no difference in orientation and
position tuning between the two operators either one may be used. We have selected the Canny
operator because it requires fewer parameters.
The tuning response curves show that shorter filters provide very good position tuning but
poor orientation tuning, whilst the longer filters provide good orientation tuning but poor position
tuning at orientations slightly different to that of the edge. These tuning response curves can be
explained by visualising the overlapping of the operator lobes over a test edge (see Figure 4.7 (a)
to (d)). These scenarios indicate that whenever the edge stimulus is asymmetrical over the length of the filter a response should not be generated. What is required is an asymmetry detector whose response is subtracted from that of the edge detector.
Figure 4.4: Filters tested. S-Gabor at aspect ratios 1:1, 1.5:1, 2:1, 3:1, and 4:1; Canny at 1:1, 1.5:1, 2:1, 3:1, 4:1, and 6:1; Canny asymmetry at 1.33:1, 2:1, 2.67:1, and 4:1.
4.4 New Asymmetry Detector
A simple approach to identifying asymmetry of edge response along the length of an edge detector would be to use the same edge detector rotated to a 90° orientation. However, such a filter would give the same tuning responses as those in Figure 4.6, merely shifted by 90°, and would not be sufficient to nullify erroneous responses. What is required is a filter with the same shape as the edge detector but oriented at 90° to it (see Figure 4.7).
The same formula for constructing the Canny edge detector in Equation 4.9 is used for the
asymmetry filter (σ = 0.5) however the rotation and scaling equations are modified to allow for an
orthogonal orientation and aspect ratio:
x′ = 3[x cos(π/2 − θ) − y sin(π/2 − θ)] / 2s_x    (4.12)
y′ = [x sin(π/2 − θ) + y cos(π/2 − θ)] / s_y    (4.13)
The direction of asymmetry is not relevant so the absolute asymmetry response is subtracted
from the Canny edge detector modulated by a tuning factor t:
E_A = |C| − t|A|    (4.14)
where C is the response of the Canny edge detector, A is the response of the asymmetry filter, and E_A is the final edge response.
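Given the two response images, the combination in Equation 4.14 is a per-pixel subtraction. A minimal NumPy sketch follows (our own illustration; clamping negative values to zero is our assumption, on the reading that a negative value simply means the response is fully inhibited):

```python
import numpy as np

def asymmetry_edge_response(canny_resp, asym_resp, t=2.0):
    """Eq. 4.14: E_A = |C| - t|A|, the Canny edge magnitude inhibited
    by the asymmetry magnitude scaled by the tuning factor t."""
    ea = np.abs(canny_resp) - t * np.abs(asym_resp)
    return np.maximum(ea, 0.0)  # assumption: negatives mean no response

# A symmetric edge stimulus survives; a strongly asymmetric one is
# nullified by the inhibition term.
kept = asymmetry_edge_response(np.array([10.0]), np.array([1.0]))
cleared = asymmetry_edge_response(np.array([10.0]), np.array([8.0]))
```

The tuning factor t controls how aggressively asymmetric stimuli are suppressed; t = 2 is the tighter setting examined in Section 4.4.1.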
Figure 4.5: S-Gabor tuning response curves at aspect ratios 1:1, 1.5:1, 2:1, 3:1, and 4:1. Each graph plots response magnitude against orientation (0° to 165°) and lateral position (S1 to S7).
Figure 4.6: Canny tuning response curves at aspect ratios 1:1, 1.5:1, 2:1, 3:1, 4:1, and 6:1. Each graph plots response magnitude against orientation (0° to 165°) and lateral position (S1 to S7).
Figure 4.7: (a) to (d) Edge operator scenarios: (a) alignment of operator with edge; (b) orientation misalignment; (c) orientation and position misalignment; (d) position misalignment. (e) Asymmetry detector overlaid on edge detector.
4.4.1 Asymmetry Detector Results
The tuning curves of asymmetry filters for the 3:1, 4:1, and 6:1 Canny edge detectors are shown in
Figure 4.8. The tuning curves are sufficient to nullify erroneous responses (however, shorter aspect
ratios below 3:1 were not sufficient). The result of the asymmetry edge detector with tuning t = 1
is shown in Figure 4.9 (a). With a tighter tuning parameter of t = 2 the result is a perfectly tuned
edge detector in both orientation and position (Figure 4.9 (b)).
The edge stimulus used is a perfect vertical edge aligned to one of the edge detector orientations.
To test whether the Asymmetry detector performs as well when the edge orientation is not aligned with one of the edge detector orientations, the same vertical edge was tested at a 7.5° offset, halfway between the usual 15° interval between detector orientations. Figure 4.10 shows the results. The Asymmetry edge detector provides two identical responses in the two adjacent orientations, indicating that the orientation of the edge lies exactly halfway between them.
The tuned operator appears to work well for any aspect ratio greater than or equal to 3:1.
However, because the tuned operator is inhibited by asymmetrical stimulus it may have problems
at corners (Figure 4.11). The tuning curves at corners for the three aspect ratios are shown in Figure 4.12, which shows that there is no response for the edge as the edge detector approaches the
corner. Larger aspect ratio operators fall off early whilst the 3:1 aspect ratio operator falls off only
one pixel before the end of the contour. Therefore, the best operator for all scenarios is the 3:1
aspect ratio operator.
Figure 4.8: Asymmetry tuning curves for the Canny asymmetry filters at 1.33:1 (matches 2:1), 2:1 (matches 3:1), 2.67:1 (matches 4:1), and 4:1 (matches 6:1).
Figure 4.9: Combined edge detector and asymmetry inhibitor at 3:1 aspect ratio. (a) t = 1; (b) t = 2.
Figure 4.10: Tuned edge detector at 7.5° orientation offset.
The one-pixel fall-off in edge response before the end of the contour may affect contour extraction, as the contours extracted will not include the last pixel of the corner. However, losing one pixel at the end of a contour appears to be a fair trade-off for the improved orientation and position tuning gained. In addition, contour-end detection and vertex extraction, which are investigated in
the following chapter would be able to identify the corner and higher level processing stages would
be able to link the vertex to the edges.
Figure 4.13 shows how the Asymmetry detector compares with the standard Canny detector
for sample test images. The Asymmetry edge detector results of Figure 4.13 (c) show tighter
positional tuning than the Canny edge detector results of Figure 4.13 (b). The orientation tuning performance is not as easily seen in a single aggregate image; however, the impact of improved orientation tuning can be seen in later stages of processing, as indicated by the thinned Canny and Asymmetry responses of Figure 4.13 (d) and (e) respectively. The thinned Asymmetry edges, produced using the thinning technique discussed in the next section, contain fewer spurious responses than the thinned Canny edges.
4.5 Thinning
Using the Asymmetry edge detector developed in the previous sections, the edge responses should be tightly tuned in both orientation and position. However, an edge may still generate responses over a number of positions if its wavelength is greater than that of the edge detector. Therefore it is still necessary to perform some thinning on the edge responses to reduce contours to the one-pixel thickness required by the contour following algorithm.
We are only interested in thinning along the direction of a contour. Current thinning techniques
such as morphological thinning and skeletonisation ignore the direction of a contour. As a result
thinning will occur in all directions. Figure 4.15 (a)-(d) shows the results of thinning the cube
Figure 4.11: Possible problem when tuned edge detector is placed over a corner.
Figure 4.12: Corner tuning curves for the tuned Canny operator at 2:1, 3:1, 4:1, and 6:1 aspect ratios.
Figure 4.13: Canny and Asymmetry edge responses for the Chapel, Plane, and Claire images. (a) Sample images; (b) results after applying the Canny edge detector; (c) results after applying the Asymmetry edge detector; (d) thinned Canny edges; (e) thinned Asymmetry edges.
Figure 4.14: (a) Cube test image; (b) Asymmetry edge detector responses.
responses of Figure 4.14 using the skeletonisation and morphological thinning algorithms.
Thinning can be applied to either the individual orientation responses or to the aggregate
edge responses. However, as can be seen in Figure 4.15 (a)-(d), skeletonisation and morphological
thinning techniques ignore the direction of the contour responses and any interaction between
different orientations. Therefore a new thinning process has been developed which only thins along
the direction of a contour whilst taking into account adjacent orientations.
Morphological thinning approaches process a small neighbourhood of an image, for example a
3×3 neighbourhood of pixels. In non-directional techniques the goal is to remove any pixel adjacent
to another which lies on an edge. In directional techniques the approach is similar but pixels are
only considered adjacent along the perpendicular to the orientation of the edge response (Figure
4.16). Morphological approaches work well for edge responses that are aligned to the horizontal,
vertical, and diagonal layout of pixels. Thinning occurs by removing a pixel if two pixels are found
to be in adjacent locations. Which pixel is removed depends on the depth of the image. For binary
images there is often an iterative process where the pixels lying on the edge of the region are
removed first and the process stops when no more pixels are removed. For greyscale images, the
magnitude of the edge response can be used to determine which pixel will be removed. Usually the
pixel with the lesser magnitude is removed.
Using a neighbourhood aligned to pixel positions becomes less useful when working with more
than four orientations because the positions of adjacent responses no longer align to the centre of
existing pixels (Figure 4.16 (c) and (d)). In fact this is also true even for 45 ◦ orientations because
the distance between pixel centres is greater than the distance between the centres of horizontally
and vertically aligned pixels. Therefore if more than the horizontal and vertical orientation are to
be used for thinning then a more sophisticated technique is required to determine neighbourhood
Figure 4.15: (a) Skeletonisation of the aggregate edge responses; (b) aggregate of the skeletonisation of individual orientation edge responses; (c) morphological thinning of the aggregate edge responses; (d) aggregate of the morphological thinning of individual orientation edge responses; (e) Gaussian thinning; (f) diagonal removal.
Figure 4.16: Positions of perpendicularly adjacent responses used for thinning: (a) vertical, (b) horizontal, (c) 45°, and (d) 15°.
responses.
4.5.1 New Gaussian Thinning Approach
The problem facing morphological techniques is a sampling problem where the sampling no longer
occurs at pixel centres. To solve the sampling problem we have created three elongated Gaussian
filters that sample at three positions orthogonal to the orientation of the edge (see Figure 4.17).
The distance between each filter remains constant regardless of the orientation, thereby solving
the sampling problem. The outputs from the three filters are then used to thin laterally along the
orientation.
The three Gaussian filters are based on the two-dimensional Gaussian envelope:
G = e^(−(x′² + y′²)/2σ²)    (4.15)
where σ is the bandwidth of the envelope and is set to 0.5, and x′ and y′ are the scaled, translated,
and rotated pixel co-ordinates:
x′ = [x cos(−θ) − y sin(−θ)] / s_x    (4.16)
y′ = [x sin(−θ) + y cos(−θ) + t_y] / s_y    (4.17)
where θ is the orientation of the elongated Gaussian filter, ranging in 15° increments from 0° to 165°; s_x and s_y determine the shape of the filter and are set to s_x = 4 and s_y = 1; and t_y determines the lateral translation of the filter, taking the values (−1, 0, 1) so that each of the three lateral filters is centred one pixel from the centre pixel in the direction orthogonal to the orientation of the edge.
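The three sampling filters of Equations 4.15 to 4.17 can be sampled on a pixel grid as follows (our own NumPy sketch; the 9×9 window size is an assumption made for the example):

```python
import numpy as np

def thinning_gaussian(size, theta, ty, sigma=0.5, sx=4.0, sy=1.0):
    """Elongated Gaussian sampling filter (Eqs. 4.15-4.17), translated
    laterally by ty pixels perpendicular to the orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = (x * np.cos(-theta) - y * np.sin(-theta)) / sx
    yp = (x * np.sin(-theta) + y * np.cos(-theta) + ty) / sy
    return np.exp(-(xp ** 2 + yp ** 2) / (2.0 * sigma ** 2))

# For each orientation the three lateral samples come from
# ty in (-1, 0, 1); here theta = 0 (a vertically oriented filter).
filters = [thinning_gaussian(9, 0.0, ty) for ty in (-1.0, 0.0, 1.0)]
```

Because the translation t_y is applied in the rotated co-ordinate frame, the spacing between the three samples stays at one pixel regardless of θ, which is precisely what solves the sampling problem described above.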
An edge response is cleared if either of the two lateral Gaussian samples in the same orientation
or the four lateral samples in adjacent orientations are greater than the Gaussian sample centred
at the current pixel, that is, if either of the following are true:
G_−1(θ) > G_0(θ)    (4.18)
Figure 4.17: Position of Gaussian filters used for thinning.
Figure 4.18: Potential double pixel lines after Gaussian thinning.
G_1(θ) > G_0(θ)    (4.19)
G_−1(θ + π/12) > G_0(θ)    (4.20)
G_1(θ + π/12) > G_0(θ)    (4.21)
G_−1(θ − π/12) > G_0(θ)    (4.22)
G_1(θ − π/12) > G_0(θ)    (4.23)
where G_−1, G_0, and G_1 are the three lateral Gaussian samples and θ is the orientation of the elongated Gaussian filter. This first criterion thins laterally across orientations but does not perform orientation competition at the centre pixel.
Orientation competition is performed by preserving the largest two adjacent edge responses in
a local neighbourhood along the orientation axis. Two adjacent edge responses are preserved so
that the true orientation of the edge can be interpolated. To be preserved, the current orientation
Gaussian response must be greater than or equal to the two adjacent orientation responses:
G_0(θ) ≥ G_0(θ ± π/12)    (4.24)
or, the current orientation may have a greater adjacent orientation but it must be greater than the
responses adjacent to these two, that is:
G_0(θ) < G_0(θ − π/12)  and  G_0(θ) > G_0(θ + π/12)  and  G_0(θ) > G_0(θ + 2π/12)    (4.25)
or:
G_0(θ) < G_0(θ + π/12)  and  G_0(θ) > G_0(θ − π/12)  and  G_0(θ) > G_0(θ − 2π/12)    (4.26)
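A per-pixel decision combining the lateral criteria (Equations 4.18 to 4.23) with the orientation competition rules (Equations 4.24 to 4.26) might be sketched as follows. This is our own illustration: G is assumed to hold, for one pixel, the three lateral Gaussian samples at each of the 12 orientations.

```python
import numpy as np

def keep_response(G, o, n_orient=12):
    """Decide whether the edge response at orientation index o survives
    Gaussian thinning. G is an (n_orient, 3) array of lateral Gaussian
    samples per orientation; columns are lateral positions (-1, 0, +1)."""
    om, op = (o - 1) % n_orient, (o + 1) % n_orient
    centre = G[o, 1]
    # Lateral competition (Eqs. 4.18-4.23): clear the response if any of
    # the six lateral samples, in this or an adjacent orientation,
    # exceeds the centre sample.
    lateral = [G[o, 0], G[o, 2], G[om, 0], G[om, 2], G[op, 0], G[op, 2]]
    if any(s > centre for s in lateral):
        return False
    # Orientation competition (Eqs. 4.24-4.26): keep a local maximum
    # across orientations, or the second member of an adjacent pair so
    # the true edge orientation can later be interpolated.
    c_m, c_p = G[om, 1], G[op, 1]
    if centre >= c_m and centre >= c_p:
        return True
    om2, op2 = (o - 2) % n_orient, (o + 2) % n_orient
    if centre < c_m and centre > c_p and centre > G[op2, 1]:
        return True
    if centre < c_p and centre > c_m and centre > G[om2, 1]:
        return True
    return False
```

Wrapping the orientation index modulo 12 reflects the fact that the orientations are cyclic over 180°.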
The result of applying the Gaussian thinning technique is shown in Figure 4.15 (e) where it can
be seen that the edges are successfully thinned along the orientation of the edges. There is still one
Figure 4.19: Diagonal removal.
problem with this technique in that it is not able to reduce 45° edge responses to a one-pixel-thick line (see Figure 4.18). This is because the resulting two-pixel line contains very little overlap in the perpendiculars, so the two existing pixels are never compared with each other. The technique for thinning the 45° orientations is shown in Figure 4.19. If both positions of one diagonal of a 2×2 block are occupied then the other two positions are removed. If not, the reverse is checked to see if the first diagonal should be removed. The values in adjacent orientations are also
checked. The result after removing diagonals is shown in Figure 4.15 (f). Compared with Figure
4.15 (a)-(d), Gaussian thinning produces thinner lines and conforms to the original orientations of
the edge responses.
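The 2×2 diagonal-removal step can be sketched as follows (our own illustration on a binary edge map; a full implementation would also consult adjacent orientation responses, as described above):

```python
import numpy as np

def remove_diagonals(edges):
    """If both pixels of one diagonal in a 2x2 block are set, clear the
    other two pixels (step 1); otherwise check the opposite diagonal
    (step 2). Thins a two-pixel-thick 45-degree line to one pixel."""
    e = edges.astype(bool).copy()
    h, w = e.shape
    for yy in range(h - 1):
        for xx in range(w - 1):
            if e[yy, xx] and e[yy + 1, xx + 1]:      # main diagonal set
                e[yy, xx + 1] = e[yy + 1, xx] = False
            elif e[yy, xx + 1] and e[yy + 1, xx]:    # anti-diagonal set
                e[yy, xx] = e[yy + 1, xx + 1] = False
    return e

# A two-pixel-thick diagonal line reduces to a single diagonal line.
double = np.eye(4, dtype=bool) | np.eye(4, k=1, dtype=bool)
thinned = remove_diagonals(double)
```

Scanning the 2×2 blocks in raster order, each block is resolved once, so the double line collapses to the diagonal that was detected first.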
4.5.2 Gaussian Thinning Results
Results of Gaussian thinning are shown in Figure 4.13 (d) and (e) applied to Canny and Asymmetry
edge responses respectively. The edges extracted are successfully thinned along the orientation of
the contours. The figure also demonstrates the benefits of using the Asymmetry detector for higher
level edge processing such as thinning. The thinned Asymmetry responses contain fewer spurious
edges than the thinned Canny responses showing that the multi-orientation Gaussian thinning
technique performs better with tightly tuned orientation and position edge detectors.
4.6 Asymmetry Edge Detector as a Computational Model
of the Visual Cortex
In Section 2.6 computational models of the visual cortex were presented. These models are de-
signed to validate vision processing theories rather than to be efficient edge detectors for use in
CBVR applications. The asymmetry edge detector presented in this chapter is also motivated by
the architecture of the visual cortex but is designed to be used in CBVR and other image pro-
cessing applications. Figure 4.20 shows the asymmetry edge detector and thinner in the context
of vision processing in the visual cortex. Both the Canny edge detector and asymmetry detectors
are represented as simple cells with the output of the asymmetry detector inhibiting the Canny
Figure 4.20: Asymmetry edge detector model of the visual cortex: the RGB image (photoreceptors) feeds two simple-cell stages, the Canny 3:1 edge detector and the Asymmetry 2:1 detector, whose output inhibits the edge detector; Gaussian thinning then performs orientation and spatial competition, and diagonal removal performs spatial competition.
edge detector. The Gaussian thinning stage represents both orientation and spatial competition in
the visual cortex whilst the remove diagonals stage represents spatial competition between simple
cells. The asymmetry edge detector differs from other models such as Marr’s [56] and Grossberg’s
[94] as it does not attempt to model the non-directional ganglion and LGN cells. It is also a purely
feed-forward implementation resulting in a simpler architecture and faster execution. Higher-level
stages of the model such as edge linking and end-stopped detection are discussed in the following
chapter.
4.7 New Texture Inhibition Approach
In this chapter edge detection techniques have been presented that can detect boundaries between
regions of homogeneous colour. Detecting boundaries between regions of heterogeneous colour,
such as texture, is more complex because local edges are also formed within the regions. Consider
Figure 4.21 (a), for example: even though the different textures are easily distinguishable by the
human visual system, there are no contours formed by a consistent change in homogeneous colour, as can be
seen from the lack of edge response along the texture borders in Figure 4.21 (b). Therefore the edge
techniques presented in this chapter alone are not enough to identify boundaries between regions
of texture.
Identifying texture boundaries is crucial for higher-level processing of contours. Since textures
consist of contours, a contour processing stage will process all of the contours within the texture,
which is unnecessary as these contours do not represent boundaries. Therefore it is beneficial to
inhibit texture regions before higher-level processing such as contour extraction occurs. Identifying
texture regions can be difficult as any occurrence of contours could be considered texture. There-
fore rather than simply identifying textures we present a technique that identifies the boundaries
between textures, which would also include non-textural contours. Higher level processes will only
process contours that lie within texture boundaries.
Figure 4.21: (a) A composite of Brodatz textures D9, D38, D92, and D24 histogram equalised [108],
(b) Edge responses of composite texture image, (c) Moving average of maximum edge responses.
4.7.1 Psychological and Perceptual Basis
Through intensive psychological studies Tamura et al. [39] found that humans characterise textures
along three dimensions: coarseness, contrast, and directionality. Coarseness refers to the size of
the repeating pattern, contrast refers to the overall ratio between darkness and lightness in the
texture, and directionality refers to the orientation of the texture. A similar study conducted by
Rao and Lohse [68] found that humans grouped patterns by repetitiveness, directionality, and
complexity. Once again repetitiveness refers to the scale of the pattern and directionality refers to
the orientation of the texture. However, the third dimension, complexity, refers to how
ordered the placement of the texture patterns is. Complexity could also be considered as
noise.
The first challenge is whether the edge responses of the Asymmetry edge detector are sufficient
to represent the three dimensions of texture. Since the primary component of the Asymmetry edge
detector is the Canny operator, which is similar to a Gabor filter, the edge detector is able to filter
spatial frequencies in a similar way to a wavelet. Therefore, the edge detector is able to detect
Tamura’s coarseness [39] or Rao and Lohse’s repetitiveness [68] which is essentially the spatial
frequency of the texture. Since the edge detector is also oriented, elongated, and uses an asymmetry
inhibitor to fine tune the orientation response, the edge detector is quite capable of representing
the orientation of a texture. Tamura’s contrast can also be represented by the amplitude of the
edge detector response since the edge detector responds to spatial changes which also affect the
contrast of the texture. The component that the edge detector does not represent directly is Rao
and Lohse’s complexity. However, the complexity of the texture is implicit in the location of the
edge responses. Therefore further processing of the edge responses is required to determine the
complexity of the texture. However, our goal is not so much to simply extract the features of the
texture but more importantly to define the spatial extent of a texture and the boundaries between
textures.
Figure 4.22: (a) Patch-suppressed cell; (b) Abutting grating stimulus.
There is some basis for the inhibition of edge responses through texture detectors in human
vision research. Sillito et al. [109] found that for a majority of cells (33/36) in V1 the response
was suppressed as the diameter of a circular patch of drifting sinusoidal grating increased. These
cells are known as patch-suppressed cells. They found that a small disk grating or a large disk
grating with an empty centre will evoke a response but not when both are combined. Therefore,
larger areas of dense edge responses will be inhibited. Sillito et al. [109] also performed cross-
correlation experiments on pairs of cells that were cross-oriented (had preferred stimulus that were
approximately 90° to each other). They found a high correlation between cross-oriented simple
cells when the stimulus had inner and outer gratings at 90° to each other (see Figure 4.22 (a)),
suggesting functional connectivity. Larkum et al. [110] found pyramidal neurones in layer 5 which
fired if both distal and proximal dendrites received input but not if either alone were activated.
Therefore, larger areas of dense edge responses are inhibited, but only if they do not border another
area of dense edge responses which ideally have a perpendicular orientation. Grosof et al. [111] have
also found cells in V1 which respond to the illusory contour formed at the end of abutting gratings
which are different to the cells found in V2 by Soriano et al. [112] which respond to more general
types of illusory contours. The abutting grating stimulus (see Figure 4.22 (b)), which is essentially
the boundary of a texture, shows that the edge boundary between textures is detected early on in
the visual pathway.
Some textures do not have clearly defined boundaries and segregation is dependent on higher
level processing. One example is that texture elements with differing numbers of line ends are easier
to segregate than those with the same number of terminations [113]. Psychophysical experiments
performed by Beck et al. [114] found that the strength of segregation depended on the contrast
and size difference of texture elements. The size difference can also be represented as a contrast
difference, hence the perception can be explained solely through contrast. They also found that
hue can have the same effect but only if the texture element and background are of the same lu-
minance. Beck et al. [114] were able to simulate the psychophysical results using bandpass filters.
The results of Beck et al. [114] suggest that the oriented bandpass filters of the primary visual
cortex could perform texture segregation; however, neurophysiological recordings have found that
global segregation does not occur at this stage [115]. It
is possible that texture segregation can occur at a number of levels which provides a basis for the
low-level approach for processing texture boundaries taken in this chapter.
4.7.2 Texture Identification
Areas of texture need to be identified so that they do not interfere with the extraction of re-
gion boundaries. However, the boundary between two textures should also be considered a region
boundary. Therefore, a technique is required that identifies areas of texture but does not consider
the boundaries between textures as texture. An area of image consists of texture if it contains a
repeating pattern of contours. Therefore the first characteristic of a texture is that it consists of
a uniform spatial distribution of contours. The smallest unit of a repeating pattern is the texture
element, also known as a texton [116]. The distribution of contours within the texture element
does not need to be uniform, however, there must be some uniformity in the distribution of tex-
ture elements. Uniformity of distribution can be represented by the moving average of the edge
responses. Changes in the moving average reflect a change in spatial density of edge responses
within a window. The window of the moving average must be equal to or greater than the size of the
texture element. For this research we have chosen a window 32 pixels wide and 32 pixels high.
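The moving-average step described above can be sketched as follows (a minimal illustration using NumPy and SciPy; the function name and the use of `uniform_filter` are assumptions, not the thesis implementation):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_density(edge_responses, window=32):
    """Moving average of the maximum edge response per pixel.

    edge_responses: array of shape (n_orientations, H, W) holding the
    oriented edge magnitudes (12 orientations at 15 degree steps in this
    chapter). The window must be at least as large as the texture element.
    """
    max_response = edge_responses.max(axis=0)          # strongest edge at each pixel
    return uniform_filter(max_response, size=window)   # 32x32 box moving average

# Toy example: a dense "texture" half next to an empty half.
edges = np.zeros((1, 64, 64))
edges[0, :, :32] = 1.0        # left half full of edge responses
density = texture_density(edges)
# density is high on the left, low on the right, blending at the border
```

The window size is the only free parameter; it trades spatial resolution of the density map against sensitivity to individual texture elements.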
Using the composite image formed from the four Brodatz textures of Figure 4.21 (a) the first
step is to extract the edge responses. The edge responses consist of 12 images, one for each
15° orientation. Figure 4.21 (b) shows the maximum response from all orientations for each pixel.
Applying a moving average to the maximum edge responses produces the image in Figure 4.21 (c).
Unfortunately, applying a moving average to the maximum edge responses does not reveal much
change between the textures. This is because the textures of Figure 4.21 (a) have a relatively similar
edge density. However, the shapes of the texture elements are different and should be revealed by
processing edge orientations individually.
Figure 4.23 (a) shows the moving average applied to each orientation individually. The results
are multiplied by a factor of 10 to make the differences more visible. The differences between the
four textures begin to be revealed when the orientations are processed individually. This approach is
similar to the bandpass filters used by Beck et al. [114] to simulate visual cortex texture segregation.
A problem with the moving average approach is that the square window produces rectangular
artefacts in the average responses. This is caused by the moving average function giving every
pixel equal weighting, even those on the border of the window. The rectangular artefacts can be
removed by using a window with a Gaussian envelope where pixel weighting decreases as the radius
increases from the centre of the window. A two dimensional Gaussian filter with a bandwidth (σ)
of 10 pixels was used in place of the moving average function.
f(x, y) = e^(−(x² + y²)/(2σ²))    (4.27)
Since the convolution of the Gaussian filter with the edge responses can be applied in the Fourier
domain, the processing time is considerably less than the moving average approach. The results of
applying the Gaussian filter to the edge responses are shown in Figure 4.23 (b). The rectangular
artefacts are now removed, however the texture borders are less defined. Even so, the Gaussian
moving average of the oriented edge responses is able to detect areas of consistent texture.
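The Gaussian-windowed averaging can be sketched as follows (again an illustrative NumPy/SciPy version; `gaussian_filter` stands in for the Fourier-domain convolution described above, and the array layout is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_density(edge_responses, sigma=10):
    """Gaussian-weighted moving average applied to each orientation.

    Weights fall off as exp(-(x^2 + y^2) / (2 sigma^2)), removing the
    rectangular artefacts of the square box window. In practice the same
    convolution can be applied in the Fourier domain for speed.
    """
    return np.stack([gaussian_filter(r, sigma=sigma) for r in edge_responses])

# Toy example: one orientation channel with a regular striped pattern,
# one empty channel.
edges = np.zeros((2, 64, 64))
edges[0, ::4, :] = 1.0   # every fourth row carries an edge response
smooth = gaussian_density(edges, sigma=10)
# channel 0 settles near its mean edge density; channel 1 stays zero
```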
4.7.3 Texture Edges
Even though the Gaussian moving average approach is able to successfully identify texture regions
it does not identify borders between textures. The borders between textures must be identified so
that they are not included in the texture areas that will inhibit higher level contour processing.
With the Gaussian moving average approach textures are represented by areas with similar moving
average values. Since the moving average is applied to each orientation, differences between textures
containing texture elements that vary by shape can also be identified. Textures that exhibit a strong
orientation will distribute most of the edge responses in one orientation, such as the top right
hand texture of Figure 4.21 (a). However, textures with multiple orientations will distribute edge
responses across multiple orientations. Nonetheless, differences in shape between textures can still
be identified in the individual orientation responses, as can be seen in Figure 4.23 (b). Therefore,
a texture border will occur when there is no consistency of oriented texture within a region. The
lack of consistency can be represented by the variance (σ2) of moving average responses within a
window.
σ² = Σ (x − µ)²    (4.28)

where the sum is taken over the moving average values x within the window and µ is their mean.
A window of 32× 32 pixels was used to compute the variance. The individual variance images
for each orientation are then summed to produce the final image which is shown in Figure 4.24 (a).
The final image clearly shows the borders between the top right texture and the other textures but
only partially represents the bottom and left borders. The variance of moving averages of the edge
responses is similar to the patch-suppressed cells of the human visual cortex reported by Sillito et
al. [109] in that large areas of similar edge responses will be inhibited unless there is variance in
the edge responses over the area.
Since the variance computation also uses a square window, like the moving average computation,
we investigated whether a Gaussian mask for the variance computation would improve the results.
The computation of µ remains the same; however, the squared difference (x − µ)² is multiplied by
the corresponding Gaussian mask value before being added to the variance.
The results of the Gaussian mask are shown in Figure 4.24 (b) and do not appear to provide
a significant improvement over the square variance approach. Minor differences between the two
images are mainly due to the Gaussian mask being slightly larger than the square window.
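The border-detection step can be sketched as follows (an illustrative version: it uses the identity var = E[x²] − E[x]² inside a sliding window, which is equivalent up to normalisation to summing (x − µ)² as in Equation 4.28; the clamping of small negative values is an added floating-point safeguard):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_border_strength(avg_responses, window=32):
    """Sum over orientations of the local variance of moving-average values.

    avg_responses: (n_orientations, H, W) moving-average images. High output
    values mark places where the oriented texture statistics change, i.e.
    candidate borders between textures.
    """
    total = np.zeros(avg_responses.shape[1:])
    for a in avg_responses:
        mean = uniform_filter(a, size=window)
        mean_sq = uniform_filter(a * a, size=window)
        total += np.maximum(mean_sq - mean * mean, 0.0)  # clamp FP negatives
    return total

# Toy example: two flat "textures" meeting at a vertical border.
avg = np.zeros((1, 64, 64))
avg[0, :, 32:] = 1.0
border = texture_border_strength(avg)
# variance peaks around column 32 and vanishes deep inside either region
```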
Figure 4.23: (a) Moving average applied to individual orientations, (b) Gaussian filter with band-
width of 15 pixels applied to individual orientations.
Figure 4.24: (a) Variance of moving average, (b) Gaussian variance of moving average.
4.7.4 Texture Noise
The edge responses used to identify texture and texture borders in the last few sections primarily
represent the shape of the texture. The results of the variance computation in Figure 4.24 show
that the shape information alone is not enough to distinguish between textures. The three di-
mensions identified by Rao and Lohse [68] were repetitiveness, directionality, and complexity. The
directionality is represented by the oriented edge responses. However, the edge responses do not
provide a direct indication of the complexity of the texture.
Francos et al. [36] used the Wold decomposition to decompose textures into harmonic and
indeterministic components. The Wold components also relate to the components identified by Rao
and Lohse [68] where the harmonic represents repetitiveness and the indeterministic component
represents complexity. By extending the Wold decomposition into two dimensions Francos et al. [36]
also included a new component called the evanescent component which represents the orientation
of texture. Francos et al. [63] used the auto-regressive moving average (ARMA) model to isolate
the indeterministic component. However, any noise model can be used, such as the moving
average (MA), auto-regressive (AR) [62], simultaneous auto-regressive (SAR) [61], multi-resolution
SAR (MRSAR) [64], Gauss-Markov, and Gibbs [65] models. The SAR model is an instance of
Markov random field (MRF) models [64]. Mao and Jain [64] used SAR and MRSAR models to
perform texture classification and segmentation. In this section we also investigate using the SAR
model for the purpose of identifying boundaries between textures.
SAR Model
The SAR model is as follows [64]:
g(s) = µ + Σ_{r∈D} θ(r) g(s + r) + ε(s)    (4.29)
where g(s) is the grey level value of a pixel at site s = (s1, s2), D is the set of neighbours at
site s which usually consists of the eight adjacent pixels, ε(s) is an independent Gaussian random
variable with zero mean and variance σ², θ(r), r ∈ D are the model parameters characterising the
dependence of a pixel to its neighbours, and µ is the bias which is dependent on the mean grey
value of the image.
Texture representation using the SAR model involves determining the parameters µ, σ, and
θ(r), r ∈ D. For a symmetric model where θ(r) = θ(−r), all model parameters can be estimated us-
ing the least squares error (LSE) technique or the maximum likelihood estimation (MLE) method.
Mao and Jain [64] used the LSE technique because it is less time consuming and yields very similar
results to the MLE method.
SAR Implementation
Since more than one parameter must be determined, multiple regression is required rather than simple
linear regression. The challenge with the SAR model is choosing an appropriate window size. In
this research the window size will be kept consistent at 32× 32 pixels. For each window, multiple
regression is used to determine the relationship between every pixel in the window and its eight
immediate neighbours. Multiple regression is usually solved in matrix form, so Equation 4.29 must be
rewritten as:
Y = Xβ + ε (4.30)
Given that n is the number of pixels within a window and p is the number of neighbours around
each pixel, Y is the n × 1 matrix of grey level values within the window, X is the n × p matrix
of predictors within the window (each row contains the p neighbour values for one pixel, and each
column corresponds to one neighbour offset), β is a p × 1 matrix containing the parameters θ(r),
and ε is an n × 1 matrix of random disturbances
for each pixel. Solving Equation 4.30 for β gives the least squares estimate:

β = (X′X)⁻¹X′Y    (4.31)
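The least squares estimation for a single window can be sketched as follows (illustrative only: `lstsq` replaces an explicit normal-equations solve, the neighbour ordering is an assumption, and border pixels of the window are skipped so that every neighbour lies inside the image):

```python
import numpy as np

def sar_parameters(img, window=32, top=0, left=0):
    """Estimate the SAR parameters theta(r) and bias mu for one window.

    Builds Y (pixel values) and X (the 8 neighbours of each interior pixel
    plus a constant column for the bias mu), then solves the least squares
    problem corresponding to beta = (X'X)^-1 X'Y.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    # Interior pixel coordinates of the window (neighbours stay in bounds).
    ys, xs = np.mgrid[top + 1:top + window - 1, left + 1:left + window - 1]
    Y = img[ys, xs].ravel()
    cols = [img[ys + dy, xs + dx].ravel() for dy, dx in offsets]
    X = np.column_stack(cols + [np.ones_like(Y)])   # last column = bias mu
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[:-1], beta[-1]                      # theta(r), mu

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
theta, mu = sar_parameters(img)
# theta has one weight per neighbour; for white noise all are near zero
```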
SAR Optimisation
The SAR parameter calculations can be computationally expensive. For a window size of 32 pixels
and a neighbourhood of 8 pixels, 32 × 32 × 9 × 9 = 82,944 operations are performed per pixel. For
an image of 256 × 256 pixels, 5,435,817,984 computations are required, resulting in a processing
time of 14 minutes when implemented in Java on a 400 MHz PC. In statistics, Equation 4.31 is
Figure 4.25: The SAR moving window effect on the X ′ matrix.
often optimised using the QR decomposition. However, we investigated an algorithmic approach
for optimisation.
To improve performance, advantage was taken of the fact that a moving window is used to
compute the SAR values. Each subsequent window along the x axis contains all of the values
of the previous window, minus the leftmost column of values plus a new rightmost column. This
effect can be visualised by looking at X′. For this example, assume that the window size is only
16 × 16 pixels. X′ becomes a matrix with 256 columns and 9 rows. The
256 columns can be divided into groups of 16 columns which represent one column in the original
image window (see Figure 4.25). Since each column in the window is represented by a series of
columns in X′, when the window moves one pixel to the right, the columns in X′ which were used
to represent the far left column can be overwritten with the values from the new right column in
the window.
Replacing a section of values in X and X′ allows an optimisation in the computation of X′X
to take place. X′ is a relatively wide matrix and X is a relatively tall matrix; multiplying the two
together results in a small square matrix. Each element (i, j) in the result matrix is calculated by
summing the products of corresponding elements from row i in X′ and column j in X. When the
window is shifted to the right only the summed product of the old column needs to be subtracted
from the result matrix and the summed product of the new column added in. This results in only
two sets of summed products per pixel rather than the window size, which is 32 in this case.
The number of computations per pixel is reduced to 2 × 32 × 9 × 9 = 5184 and the number of
computations for a 256× 256 image is reduced to 339,738,624. The execution time is reduced from
14 minutes to 2.5 minutes.
The same optimisation can be applied to the X′Y matrix multiplication, which results in a 9 × 1
matrix. Before the optimisation, the computation of X′Y requires 32 × 32 × 9 × 1 = 9216 operations,
which is reduced to 2 × 32 × 9 × 1 = 576 operations after the optimisation.
The optimisation can be taken even further by storing the summed products of the previous
columns rather than recomputing them for every new column that is added. This halves the
number of operations to compute X′X and X′Y, resulting in 32 × 9 × 9 = 2592 and 32 × 9 = 288
operations per pixel respectively.
Finally, the same optimisation can be applied as the window moves down the rows of the source
image: the summed products for a new column can be derived from the same column in the previous
row's window by subtracting the contribution of the top pixel and adding that of the new bottom
pixel. This reduces the number of operations per pixel to 9 × 9 = 81 to compute
X′X and 9 to compute X′Y. For pixels where x ≥ 1 and y ≥ 1 the number of computations is
independent of the window size. The only additional overhead is the additional memory required
to store the summed products of previous pixels and rows.
For a 256 × 256 image and a window size of 32 × 32 pixels, 90 computations are required for each
of 255 × 255 pixels, giving 5,852,250 computations. The first pixel requires 32 × 32 × 9 × 9 = 82,944
computations, the remaining 255 pixels of the first row require 32 × 9 × 9 = 5184 computations
each, and the first pixel of each of the remaining 255 rows also requires 5184 computations.
Therefore the total number of computations has been reduced from 5,435,817,984 to 8,579,034, a
reduction factor of 633.
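The core of the sliding-window optimisation is that X′X is a sum of per-pixel outer products, so it can be updated by subtracting the leaving column's contribution and adding the entering column's. A toy demonstration of that identity (row counts are arbitrary; this is not the full implementation):

```python
import numpy as np

def xtx(X):
    """X'X as used in the normal equations."""
    return X.T @ X

rng = np.random.default_rng(1)
old_rows = rng.standard_normal((16, 9))    # rows for the column leaving the window
kept_rows = rng.standard_normal((240, 9))  # rows shared by both window positions
new_rows = rng.standard_normal((16, 9))    # rows for the column entering the window

# Full recomputation for the previous window position...
xtx_old = xtx(np.vstack([old_rows, kept_rows]))
# ...versus the incremental update: subtract the old column, add the new one.
xtx_incremental = xtx_old - xtx(old_rows) + xtx(new_rows)
# Full recomputation for the new window position, for comparison.
xtx_full = xtx(np.vstack([kept_rows, new_rows]))
# xtx_incremental equals xtx_full to floating-point precision
```

The same decomposition applies to X′Y, and caching the per-column sums gives the further savings described above.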
SAR Application
Using the multiple regression technique presented in the previous section, the eight parameters were
determined for each pixel. The mean (µ) and variance (σ²) were not of interest as these have
already been computed in the previous sections. Applying the SAR model with a window size of
32× 32 pixels to the test image of Figure 4.21 (a) produced the eight parameter images of Figure
4.26. The SAR parameters show the distinction between the four textures. However, due to the
square window of the LSE technique rectangular artefacts are also produced.
The variance technique of Section 4.7.3 was applied to the SAR images resulting in Figure
4.27 (a). The results are similar to Figure 4.24; however, some border responses are slightly
complementary. Adding the deterministic component (oriented edge responses) to the indeterministic
component (SAR model parameters) results in the combined texture edges image of Figure 4.27
(b). The combined result is slightly better than either individual result.
Figure 4.26: The SAR parameters of Figure 4.21 (a).
4.7.5 Texture Inhibition
The purpose of identifying texture regions and texture borders is to inhibit contours. The texture
edges image of Figure 4.27 (b) is subtracted from the edge response image of Figure 4.21 (b) to
produce Figure 4.27 (c). The resulting image shows that contours within texture areas are largely
inhibited whilst contours near texture borders are not inhibited. Unfortunately the current tech-
nique of using the variance of SAR parameters and oriented edge responses is not accurate enough
to inhibit texture edge responses before contour processing. Ideally the inhibitory action would
result in the suppressed contours of Figure 4.27 (d). The technique could be improved by simulating
the illusory contours generated by cells in V1 when presented with an abutting grating stimulus,
as discovered by Grosof et al. [111]. The illusory contour responses would feed into the texture
identification stages of moving-average oriented edge responses and the SAR model, producing more
distinct results at the boundaries between textures.
4.7.6 Comparison with Other Techniques
Unlike other systems such as QBIC [16], ARBIRS [4] detects texture before analysing colour
regions. ARBIRS uses a relatively simple non-directional first-order derivative edge detector for
determining the basic texture features. The image is subdivided into 24× 24 pixel blocks and edge
density and coarseness values are calculated from the first-order derivative edge responses. A block
is only considered a textured region if the edge density is greater than 25% of the block. Blocks
are then grouped into regions if they have similar colour histograms. The major difference between
the texture detection used in ARBIRS and the texture inhibition approach presented in this chapter
Figure 4.27: (a) Variance of SAR parameters, (b) Combined variance of SAR parameters and
oriented edge responses, (c) Contour image inhibited by (b), (d) Ideal inhibition.
is that the ARBIRS system uses large 24×24 pixel blocks which do not allow for arbitrary texture
boundaries to be identified. However, for the purposes of image retrieval (rather than contour
extraction) the ARBIRS texture subsystem performs well.
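The ARBIRS block test described above can be sketched as follows (a hypothetical reconstruction from the description only; ARBIRS's actual edge detector and the colour-histogram grouping step are omitted, and the function and parameter names are assumptions):

```python
import numpy as np

def textured_blocks(edge_map, block=24, density_threshold=0.25):
    """Flag each block x block tile as textured when more than the given
    fraction of its pixels carry an edge response (the 25% rule described
    for ARBIRS). Tiles that do not fit the image are truncated.
    """
    h, w = edge_map.shape
    flags = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            tile = edge_map[i * block:(i + 1) * block,
                            j * block:(j + 1) * block]
            flags[i, j] = (tile > 0).mean() > density_threshold
    return flags

# Toy example: only the top-left block contains edge responses.
edges = np.zeros((48, 48))
edges[:24, :24] = 1.0
flags = textured_blocks(edges)
# flags[0, 0] is True; the other three blocks are False
```

The fixed block grid is exactly why arbitrary texture boundaries cannot be recovered: a border falling inside a block is attributed to the whole block.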
4.8 Conclusion
Edge detection must accurately represent the edges present at each pixel. When used for contour
following, the accuracy and tuning of the edge detector become paramount. In this chapter a
number of existing edge detectors were analysed for suitability for contour following. We found
that a majority of edge detectors that are commonly used such as the Roberts, Prewitt, Sobel, and
Laplacian are not suitable for contour following. Contour following requires multiple arbitrarily
orientated edge detectors. Of the currently used operators, only the Gabor and Canny operators
satisfy these criteria. The S-Gabor and Canny operators were analysed at multiple aspect ratios
to determine their orientation and position tuning performance. We found that neither operator
had a significant advantage over the other. We also found that as the aspect ratio increased there
was a trade-off between orientation and position tuning.
An Asymmetry detector was developed that position tunes elongated orientation filters. By
itself, the elongated orientation filter produces good orientation tuning but poor position tuning.
Inhibiting the elongated orientation filter’s responses with the Asymmetry detector provided both
near-perfect orientation and position tuning. The result is a filter that outperforms any other filter
for the purposes of contour following.
To further comply with the requirements of contour following, thinning was investigated to
remove ambiguous edge responses. Morphological thinning and skeletonisation thinning were in-
vestigated but were unable to provide the correct edge responses as they could only be applied
within the discrete horizontal-vertical pixel layout of images. A new technique was developed that
allows thinning to work in the orientation of the edge response using elongated Gaussian filters
perpendicular to the edge orientation. This thinning approach is further refined by also thinning
across adjacent orientations and finally a removal of diagonals. The result is a multi-orientation
edge image that is representative of the true edges in the original image and is ideal for the sub-
sequent phase of contour following. The Asymmetry edge detector is more suitable for contour
following than the Sobel, Roberts, Prewitt, Kirsch, Robinson, and Laplacian operators and pro-
duces better results than just Gabor or Canny filters on their own whilst providing more accurately
thinned results than skeletonisation and morphological thinning.
A new approach for texture analysis was developed using the Asymmetry edge detector. The
purpose of low-level texture analysis is to inhibit edge responses before the contour following
stage to reduce processing overhead. Texture regions were identified using the Asymmetry edge
detector as well as an optimised SAR implementation. However, rather than simply identifying
texture regions, the approach is also able to distinguish between neighbouring textures so that
boundaries between textures can propagate up to higher-level contour processing stages, where they
can be used to form regions. The boundary detection
phase uses the moving variance to detect changes in textural distribution in Asymmetry edge and
SAR features. Even though the approach is able to identify textures and boundaries between
textures more work is required to achieve reliable texture inhibition before contour processing.
Incorporating contour-end detection may improve the technique’s ability to distinguish boundaries
between textures.