
EE 569 - Fall '08
Project 3: Geometric Modification, Texture Analysis & Optical Character Recognition (OCR)

Submitted by: Neha Rathore (ID 5994499980)


GEOMETRICAL MODIFICATION

Problem 1- Geometrical modification

Objective

We were given four images, Boat1-Boat4, all in different orientations and scales. Together they represent a single image, boat.raw. We had to implement an algorithm to properly scale, translate and rotate these images so that they can be joined to reconstruct the final boat.raw.

Motivation

In this problem, the objective is to perform geometrical modification on an image. This requires manipulation of the image coordinates and not of the image intensity values. Geometrical transformations modify the spatial relationship between the pixels in the image; these transformations are often called rubber-sheet transformations.

A geometric transform of an image refers to the family of operations on an image such as spatial translation, spatial rotation, spatial scaling and perspective transformation. These operations are an integral part of computer graphics and animation, which involve combinations of the above basic operations. It should be noted that the above operators, when employed in series, are not commutative, a basic fact that arises from the properties of matrix multiplication. Geometric image modification plays an important role in image registration and image synthesis.1

In terms of Digital Image Processing, geometrical transformation consists of two basic operations:

• A spatial transformation of coordinates

• Intensity interpolation that assigns intensity values to the spatially transformed pixels.

Image registration is an important application in DIP to align two or more images of the same scene.

The main learning in this assignment was converting image coordinates to Cartesian coordinates and processing the image in the Cartesian coordinate system. I also learnt how to scale, rotate and translate images to get the desired output, and I learnt image registration in joining the 4 images together. Along with this I learnt zoom-in, zoom-out and shearing concepts.

PROCEDURES

I modularized this part of the assignment into different challenges to be achieved:

• The first challenge was to find the corners of the rotated boat image, which was located inside a bigger white box.

• The second challenge was to find the center of the image and translate it to the (0,0) location.

• The third challenge was to rotate this image about the point (0,0) by the desired angle so as to align the image with the horizontal and vertical axes.

1 Digital Image Processing By Rafael C. Gonzalez, Richard Eugene Woods


• The fourth challenge was to translate the rotated image back.

• The fifth challenge was to find the scaling factor to bring the image to size 256x256.

• The sixth challenge was to join the four scaled images together.

Algorithm

We have four different images; all of them have different orientations and are scaled by different factors. The main task is to design an algorithm to translate, rotate and scale these images such that they can be combined to form the desired 512x512 image.

Image coordinate to Cartesian coordinate conversion

As mentioned above, the points are first represented in the Cartesian coordinate system, which has its origin at the bottom-left corner; this is transformed into the image coordinate system, whose origin is the top-left corner of the image2. The relationship between the Cartesian coordinate representation and the discrete image array of the output image is given by

x_k = k - 1/2
y_j = J + 1/2 - j

These equations relate the output array indices to their Cartesian coordinates. Similarly, the input array relationship is given by

u_q = q - 1/2
v_p = P + 1/2 - p

where J and P are the row counts of the output and input arrays respectively.

We find that the basic goal of implementing geometrical modification is to find out where a particular pixel in the input image maps to in the output image. These transformations often yield non-integer values, which makes it difficult to determine the correct position in the output. However, the reverse operation of finding out where a particular pixel in the output image comes from in the input image is much better behaved. Thus, in order to implement this reverse mapping, we need a function, called the reverse address mapping function. The entire problem of geometrical modification is resolved once we find this "Reverse Address Mapping Function".

Thus, for each of the different geometrical operations (translation, scaling and rotation) we have a different reverse address mapping function, which is obtained by multiplying the inverse of the corresponding transform matrix with the output coordinates to obtain the input image coordinates.
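As an illustration, here is a minimal sketch (not the exact project code) of reverse address mapping for a rotation about the image center. The 256x256 size, the Input/Output buffers and nearest-neighbor sampling are assumptions; the real implementation substitutes the bilinear interpolation described later.

#include <cmath>

const int SIZE = 256;

// For every output pixel, apply the INVERSE rotation to find the source
// location in the input image (reverse address mapping).
void rotateImage(unsigned char Input[SIZE][SIZE],
                 unsigned char Output[SIZE][SIZE], double theta)
{
    double c = std::cos(theta), s = std::sin(theta);
    double cx = SIZE / 2.0, cy = SIZE / 2.0;
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            // image coords -> Cartesian coords centered at the image center
            double x = j - cx, y = cy - i;
            // inverse rotation R(-theta) applied to the output coordinate
            double u =  c * x + s * y;
            double v = -s * x + c * y;
            // Cartesian coords -> input array indices (nearest neighbor)
            int src_j = (int)(u + cx + 0.5);
            int src_i = (int)(cy - v + 0.5);
            if (src_i >= 0 && src_i < SIZE && src_j >= 0 && src_j < SIZE)
                Output[i][j] = Input[src_i][src_j];
            else
                Output[i][j] = 255;   // outside the source: white background
        }
    }
}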

Corner Detection

There are different algorithms for detecting corners, such as the Harris corner detector, which detect a corner by using the change in intensity levels in neighboring pixels. However, since our images were very simple, with only 4 corners each and located entirely within the white space, it was easy to calculate the corners by merely scanning for the first non-white pixel from different directions.

2 Pratt W. Digital Image Processing, 3rd ed.

We scan the 256x256 image in the following order:

// top-left corner: first non-white pixel scanning rows top-down,
// columns left-to-right
for (a = 0; a < 256; a++) {
    for (b = 0; b < 256; b++) {
        if (Input[a][b] != 255) {
            x1 = a; y1 = b;
            Input[a][b] = 0;   // mark the detected corner black (for checking)
            cout << "top-left corner detected: x1=" << x1 << " y1=" << y1 << endl;
            goto next;
        }
    }
}

next:
// top-right corner: scan columns right-to-left, rows top-down
for (b = 255; b >= 0; b--) {
    for (a = 0; a <= 255; a++) {
        if (Input[a][b] != 255) {
            x2 = a; y2 = b;
            Input[a][b] = 0;
            cout << "top-right corner detected: x2=" << x2 << " y2=" << y2 << endl;
            goto next1;
        }
    }
}

next1:
// bottom-right corner: scan rows bottom-up, columns right-to-left
for (a = 255; a >= 0; a--) {
    for (b = 255; b >= 0; b--) {
        if (Input[a][b] != 255) {
            x3 = a; y3 = b;
            Input[a][b] = 0;
            cout << "bottom-right corner detected: x3=" << x3 << " y3=" << y3 << endl;
            goto next2;
        }
    }
}

next2:
// bottom-left corner: scan columns left-to-right, rows bottom-up
for (b = 0; b < 256; b++) {
    for (a = 255; a >= 0; a--) {
        if (Input[a][b] != 255) {
            x4 = a; y4 = b;
            Input[a][b] = 0;
            cout << "bottom-left corner detected: x4=" << x4 << " y4=" << y4 << endl;
            goto next3;
        }
    }
}
next3: ;


OUTPUT:

Since I replaced every detected corner with a black pixel (for checking purposes), I could verify that the corners were successfully detected for all four images.

To find the angle of rotation, we form a right triangle using two corners of the image and use

angle = sin^-1(opposite / hypotenuse)

We use the following relations3:

3 Digital Image Processing By Rafael C. Gonzalez, Richard Eugene Woods
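For instance, a hypothetical helper (an assumption, not the report's exact code) that computes the rotation angle from the two detected top corners; atan2 is used here, which is equivalent to the sin^-1 form above once the opposite side is divided by the corner-to-corner hypotenuse.

#include <cmath>

// (x1, y1) = top-left corner, (x2, y2) = top-right corner, as (row, column)
// indices produced by the corner scan above.
double rotationAngle(int x1, int y1, int x2, int y2)
{
    double opposite = x2 - x1;   // vertical offset between the two corners
    double adjacent = y2 - y1;   // horizontal offset between the two corners
    return std::atan2(opposite, adjacent);   // angle in radians
}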


GRAY LEVEL (INTENSITY) INTERPOLATION:

Once we have calculated the relation between input and output coordinates, the next most important operation is intensity interpolation. The address mapping done in the previous step may give us non-integer values. Because the distorted image is digital, its pixel values are defined only at integer coordinates; a non-integer address therefore maps into locations of the image for which no gray levels are defined. Deciding what the gray-level values at those locations should be, based only on the pixel values at integer coordinate locations, then becomes necessary. Gray-level interpolation is used for this.

In our implementation we use bilinear interpolation. The bilinear interpolation technique uses the gray levels of the four nearest neighbors to interpolate the value at a new non-integer position. Since the gray level of each of the four integral nearest neighbors of a non-integral pair of coordinates is known, the gray-level value at these coordinates can be interpolated from the values of its neighbors by using the relationship

    (p,q)       (p,q+1)
         *(i,j)
    (p+1,q)     (p+1,q+1)

F(i,j) = (1-a)[(1-b)F(p,q) + b F(p,q+1)] + a[(1-b)F(p+1,q) + b F(p+1,q+1)]

where a and b are the distances of the intermediate point (i,j) from its neighboring pixel coordinates along the vertical and horizontal directions respectively.

Interpolation is basically averaging between neighboring pixels. Say we have a 3x3 image:

10  4  8
 2 12  6
 8  4  2

We might want to make a 6x6 image from it by bilinear interpolation (O = no value assigned yet):

10  O  4  O  8  O
 O  O  O  O  O  O
 2  O 12  O  6  O
 O  O  O  O  O  O
 8  O  4  O  2  O
 O  O  O  O  O  O

First, obtain the value of unassigned pixels by averaging the two horizontally neighboring pixels:

10  7  4  6  8  8
 O  O  O  O  O  O
 2  7 12  9  6  6
 O  O  O  O  O  O
 8  6  4  3  2  2
 O  O  O  O  O  O

Second, obtain the value of unassigned pixels by averaging the two vertically neighboring pixels:

10  7  4  6  8  8
 6  O  8  O  7  O
 2  7 12  9  6  6
 5  O  8  O  4  O
 8  6  4  3  2  2
 8  O  4  O  2  O

Lastly, obtain the value of the remaining unassigned pixels by averaging the 4 neighboring pixels (for edge pixels, by averaging the 3 neighboring pixels):

10  7    4  6    8  8
 6  7    8  7.5  7  7
 2  7   12  9    6  6
 5  6.5  8  6    4  4
 8  6    4  3    2  2
 8  6    4  3    2  2
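The general non-integer case can be coded directly from the formula above. A minimal sketch, assuming a 256x256 source image and leaving out boundary checks for brevity:

// F(i,j) = (1-a)[(1-b)F(p,q) + b F(p,q+1)] + a[(1-b)F(p+1,q) + b F(p+1,q+1)]
double bilinear(unsigned char F[256][256], double x, double y)
{
    int p = (int)x, q = (int)y;     // integer parts (row p, column q)
    double a = x - p, b = y - q;    // fractional offsets
    return (1 - a) * ((1 - b) * F[p][q]     + b * F[p][q + 1])
         +      a  * ((1 - b) * F[p + 1][q] + b * F[p + 1][q + 1]);
}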


RESULT and DISCUSSION

Example of rotation, translation and scaling for one part of boat.raw

Final output


DISCUSSION

We were able to rotate the image and scale it accordingly. But we notice that the image has blurred to some extent. This is because of the approximation used in intensity interpolation: as we distribute the same intensity to a set of surrounding pixels, it has the effect of an averaging filter, which blurs the image. Also, we were not able to stitch the images together perfectly. This is because of rounding errors in the decision of boundaries. It could be avoided by copying pixels from slightly outside the boundaries so as to fill the white space.

1B- SPATIAL WARPING

MOTIVATION AND OBJECTIVE:

Spatial warping is a useful technique for determining the coordinate relationship between the input and output images: given the control points (the degrees of freedom), we obtain a system of equations. Once this system has been obtained, all the points in the input image follow it and get warped to the corresponding coordinates in the output image. This system of equations, i.e., the coefficients, can be used to recover an image that has been distorted or warped in a particular manner.

Geometric (or spatial) transformations on an image are typically used to correct for imaging-system distortion or, conversely, to purposely distort (i.e., warp) an image to achieve some desired visual effect. Geometric correction is an important image processing task in many application areas; distortion may arise from aberrations in the sensor. A geometric transformation is given by a mapping function that relates the points in the input image to corresponding points in the output image. The mapping may be represented by a pair of equations or a transformation matrix. The matrix is either known a priori or, as is true for the vast majority of applications, must be inferred from a set of points of correspondence, typically called control points. Once the transformation matrix is known, it may be used to compute a corrected output image from a known distorted input image. For example, such transformations can be employed to recover an image that has been distorted by a physical imaging system; typical examples include barrel and pincushion distortion. In remote sensing and satellite imagery, the common distortions are due to earth curvature and various attitude and altitude effects.

Non-linear geometrical modification also has a wide range of applications apart from its usage in multimedia and graphical illusions. For example, the pictures taken during aerial surveys or by satellites have considerable distortions that are non-linear in nature.

Procedure

In general, a spatial transformation is defined by a polynomial function of the form

u = sum_{i=0..N} sum_{j=0..N-i} a_ij * x^i * y^j
v = sum_{i=0..N} sum_{j=0..N-i} b_ij * x^i * y^j

where (x, y) and (u, v) are point coordinates in the input and output images respectively, N is the polynomial order, and a_ij, b_ij are the mapping coefficients that characterize the transformation.


We are required to apply a transformation in which the output coordinates are polynomials of degree 2 in the input coordinates. This implies the transformation is not linear. We choose the terms in x and y such that the maximum degree is 2:

u = (a0 a1 a2 a3 a4 a5) (1 x y x^2 xy y^2)^T
v = (b0 b1 b2 b3 b4 b5) (1 x y x^2 xy y^2)^T

Breaking this down into steps:

• Finding the control points

• Calculating the A matrix

• Calculating the inverse of A

• Finding the coefficients a0-a5 and b0-b5 by multiplying Ainv*U and Ainv*V

• Applying these coefficients to the general transformation equation to calculate the new coordinates for the input coordinates.

Algorithm

Finding the control points

We have the sample input and sample output images for this problem.

We are also given the radius of the circle in the output image. This makes it easy to calculate the values of (u, v) in the output image corresponding to (x, y) in the input image. We manually read off these points and build a mapping chart. This chart is useful in finding the coefficients, since each point pair must satisfy the above equations.

We then form the A matrix, each row of which is given as follows:

A = [1 x y x^2 xy y^2]

The number of rows in this matrix equals the number of control points. Ideally, for six unknown coefficients, six control points should be enough, but to make the result better we can choose more than 6 control points.

We then calculate the inverse. I used MATLAB to find the inverse of this 6x6 matrix. Reason: I first tried an online matrix-inverse calculator, but it was giving me drastically different results; the calculations were not done properly and hence the coefficients were coming out incorrect.


If the number of control points is more than 6, we have a rectangular matrix, whose inverse does not exist. So we instead find the pseudo-inverse by the following formula:

A+ = (A^T A)^-1 A^T

We then multiply this matrix first with the U vector to get the a coefficients and then with the V vector to get the b coefficients. Finally we hardcode the coefficients into the program.
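To show how the hardcoded coefficients are used (a hypothetical sketch, not the report's exact code), each coordinate is pushed through the degree-2 polynomial:

// Maps one input coordinate (x, y) to its warped position (u, v) using the
// six a-coefficients and six b-coefficients found above.
void warpPoint(double x, double y, const double a[6], const double b[6],
               double &u, double &v)
{
    u = a[0] + a[1]*x + a[2]*y + a[3]*x*x + a[4]*x*y + a[5]*y*y;
    v = b[0] + b[1]*x + b[2]*y + b[3]*x*x + b[4]*x*y + b[5]*y*y;
}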

Results:

CHOOSING THE CONTROL POINTS

The image was divided into four parts as indicated above. Then the A matrix and the coefficients for each part were calculated. This was done because it is very difficult to find a single set of coefficients that can warp the image in 4 different directions.

Input (x, y) ---> Output (u, v)


A= [1 x y x^2 xy y^2]

Part 1

Control points (input -> output):

(0,0)     -> (256,0)
(128,128) -> (256,128)
(256,256) -> (256,256)
(128,384) -> (128,256)
(0,511)   -> (0,256)
(0,256)   -> (181,181)

A1 =
1   0    0      0      0      0
1 128  128  16384  16384  16384
1 256  256  65536  65536  65536
1 128  384  16384  49152 147456
1   0  511      0      0 261121
1   0  256      0      0  65536

a = A1^-1 * u
b = A1^-1 * v

a0=256.0000;

a1=0.0841;

a2= -0.0841;

a3= 0.0008;

a4= 0.0000;

a5= -0.0008;

b0=0.0;

b1=0.0861;

b2=0.9139;

b3=0.0008;

b4=-0.0000;

b5=-0.0008;

Part 2

Control points (input -> output):

(256,256) -> (256,256)
(256,511) -> (181,331)
(511,511) -> (256,511)
(128,384) -> (128,256)
(384,384) -> (256,384)
(0,511)   -> (0,256)

A2 =
1 256 256  65536  65536  65536
1 256 511  65536 130816 261121
1 511 511 261121 261121 261121
1 128 384  16384  49152 147456
1 384 384 147456 147456 147456
1   0 511      0      0 261121

acoef =

a0=256.0000;

a1=0.9132;

a2=-0.9132;

a3=-0.0008;

a4=0.0000;

a5=0.0008;

bcoef =

b0=0;

b1=0.0868;

b2= 0.9132;

b3= 0.0008;

b4=-0.0000;

b5=-0.0008;

Part 3

Control points (input -> output):

(256,256) -> (256,256)
(384,128) -> (384,256)
(511,511) -> (256,511)
(511,256) -> (331,331)
(511,0)   -> (511,256)
(384,384) -> (256,384)

A3 =
1 256 256  65536  65536  65536
1 384 128 147456  49152  16384
1 511 511 261121 261121 261121
1 511 256 261121 130816  65536
1 511   0 261121      0      0
1 384 384 147456 147456 147456

acoef3 =
256.0000
0.9152
-0.9152
-0.0008
0.0000
0.0008

bcoef3 =
0
0.9132
0.0868
-0.0008
-0.0000
0.0008

Part 4

Control points (input -> output):

(256,256) -> (256,256)
(0,0)     -> (256,0)
(128,128) -> (256,128)
(256,0)   -> (331,181)
(384,128) -> (384,256)
(511,0)   -> (511,256)

A4 =
1 256 256  65536  65536  65536
1   0   0      0      0      0
1 128 128  16384  16384  16384
1 256   0  65536      0      0
1 384 128 147456  49152  16384
1 511   0 261121      0      0

ac4 =
256.0000
0.0861
-0.0861
0.0008
0.0000
-0.0008

bc4 =
0
0.9139
0.0861
-0.0008
-0.0000
0.0008


RESULTS & DISCUSSION

Input image

SCANNING REGIONS


The warped image of PART4


Final output image

DISCUSSION

On close inspection, the image is shifted by 1 pixel from the top and bottom. Also, the curve is not exactly smooth and we can see a scale-like effect in some regions. The possible reason for this is that, since the warping used here is not linear, the values of (u,v) may be fractional for some values of (x,y). Rounding places the pixel at the nearest integer position, which produces the scale-like effect.

Also, since the angle of warping is discrete, it does not give the exact end points, which makes the image shift 1 pixel down from above and below.


This was an effective but tedious way of producing the warped image. The same warped image could also be produced using a polynomial equation of degree 3.

Even the slightest change in a coefficient value is drastically reflected in the output image; the process is sensitive to rounding errors and decimal approximations. As this process produces a smaller image compared to the original image, we did not see a lack of intensity in any region.

We can see from the final image that the input image is spatially warped successfully.

PROBLEM 2- TEXTURE ANALYSIS

Part A- TEXTURE CLASSIFICATION

OBJECTIVE:

We are given 12 samples of textures; there are four groups containing three textures each. We have to design an algorithm to classify these images into clusters belonging to the different groups.

MOTIVATION:

In this problem, either a group of images is to be classified according to their texture types, or a single image is to be segmented into parts, with each part having a distinct texture type. Texture analysis plays an important role in the interpretation of remote sensing images, satellite maps, etc. A texture is related to the visual appearance of a region; it is due to semi-regular patterns which are not strictly periodic. Texture analysis is carried out basically to describe structured patterns. Edge detection cannot be used here because, if the texture is very fine, the edge density will be very high and the segmented output image will not be appealing. Textures can be structured patterns of object surfaces such as wood, grain, sand, grass, cloth, etc. They are very difficult to define precisely. Each texture is characterized by a set of characteristics called "features". A feature is summarized information which captures the essence of a texture type but still has the desired discriminating power. If two images have the same texture type, their features should be identical.4

Texture classification is used to identify features of an image and extract information about that image. For example, in a picture taken from the moon, the earth appears in different colors; through texture classification it is possible to identify the regions of water, land, forest, etc. on earth. Another motivation for this problem is that texture classification can be used in applications like voice-guided automation. Imagine a blind person walking through a park who is about to walk into a tree; if texture classification is done properly in real time, this collision can be avoided.

Procedure

4 Pratt W. Digital image processing 3ed;


Image segmentation based on texture is very challenging, and various methods have been proposed for achieving this goal; however, only a few of them have been successful. One such technique is the "Laws filters" method, which I have used in my implementation. The fifteen input images are read into a cell array such that each image is stored as an element of the array. Each of the fifteen images is passed through a filter bank that consists of nine filters. The three basic 1-D filters that give rise to these nine filters are:

Local Average Filter L3 = 1/6 * [1 2 1]

Edge Detector E3 = 1/2 * [-1 0 1]

Spot Detector S3 = 1/2 * [1 -2 1]

The idea was to form the tensor product of each pair of these 3 filters and get nine 3x3 filters (a small sketch follows below). The basic understanding behind the usage of these 3 filters is as follows: upon doing the Fourier analysis of each of them, it is observed that

L3 acts as a Low Pass Filter (LPF)

E3 acts as a Band Pass Filter (BPF)

S3 acts as a High Pass Filter (HPF)
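As a small sketch of the tensor-product construction (assuming plain C-style arrays rather than the project's actual data structures):

// laws[3*i + j] is the outer product of kernel i (rows) and kernel j (columns),
// giving the nine 3x3 Laws masks from the three 1-D kernels.
double L3k[3] = { 1.0/6, 2.0/6, 1.0/6 };   // local average
double E3k[3] = { -0.5, 0.0, 0.5 };        // edge detector
double S3k[3] = { 0.5, -1.0, 0.5 };        // spot detector

void buildLawsFilters(double laws[9][3][3])
{
    double *f[3] = { L3k, E3k, S3k };
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            for (int r = 0; r < 3; r++)
                for (int c = 0; c < 3; c++)
                    laws[3 * i + j][r][c] = f[i][r] * f[j][c];
}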

CALCULATION OF THE ENERGY VECTORS:

Therefore, when all 3 filters are put together, we cover the low-, middle- and high-frequency regions. Once the nine filters have been obtained, each of the 15 images is passed through the filter bank to produce a set of nine output images per input image.

Gi = input image * lawfilter(i)   [i varies from 1 to 9]

For each of the Gi's produced in the previous step, the energy is computed as follows:

Fk = (1/N^2) * sum_i sum_j |Gk(i,j)|^2,   k = 1 ... 9

where N is the image dimension.

Each input image, when passed through the filter bank, gives rise to a set of nine energy components related to the nine output images produced by the filter bank. These nine components can be treated as a 9-point energy vector in the 9-dimensional feature space.

Since there are 15 images to be classified, there are a total of 15 such energy vectors in the 9-D feature space; each 9-D energy vector is a point in the feature space. Thus the resulting feature space is a 15 x 9 array, since each image has one feature value for each of the 9 Laws-filtered images.


EUCLIDEAN DISTANCE CLASSIFICATION:

The classification can be done based on the proximity of the energy vectors in the feature space. The nearness of one vector to another can be determined by calculating the Euclidean distance as follows:

d(Ei, Ej) = || Ei - Ej || = sqrt( sum_k (Ei_k - Ej_k)^2 )

The Euclidean Distance is calculated from every energy vector to every other energy vector in the

feature space.

NEAREST NEIGHBOR ALGORITHM:

In my classification I have used the nearest-neighbor algorithm. An energy vector in the feature space whose Euclidean distance from another energy vector is below a threshold is said to be a nearest neighbor of it. The essence of the approach is to determine all such vectors that lie in close proximity to each other.

Since it is known that there are 4 classes among the group of 12 images, the Euclidean distances of each vector from every other vector are arranged in a row, the least three distances are determined, and the corresponding energy vectors are said to be in close proximity to that vector in the 9-D feature space (a small sketch follows below).
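A compact sketch of this step (hypothetical names; the 15 x 9 energy array E is assumed to be filled as described above):

#include <cmath>
#include <vector>
#include <algorithm>

// Returns the indices of the 3 images whose energy vectors are closest
// (by Euclidean distance) to that of image `query`, excluding itself.
std::vector<int> threeNearest(const double E[15][9], int query)
{
    std::vector<std::pair<double, int> > d;
    for (int i = 0; i < 15; i++) {
        if (i == query) continue;
        double sum = 0;
        for (int k = 0; k < 9; k++)
            sum += (E[query][k] - E[i][k]) * (E[query][k] - E[i][k]);
        d.push_back(std::make_pair(std::sqrt(sum), i));
    }
    std::sort(d.begin(), d.end());   // ascending by distance
    std::vector<int> nearest;
    for (int n = 0; n < 3; n++)
        nearest.push_back(d[n].second);
    return nearest;
}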

RESULT for classification


Energy Computation

T1 T2 T3 T4 T5 T6 T7 T8 T9

Image1 0.0420 0.0031 0.0050 0.0037 0.0030 0.0078 0.0079 0.0074 0.0203

Image2 0.0616 0.0037 0.0036 0.0040 0.0040 0.0063 0.0042 0.0070 0.0158

Image3 0.0446 0.0040 0.0072 0.0071 0.0074 0.0183 0.0101 0.0204 0.0524

Image4 0.0350 0.0048 0.0083 0.0041 0.0080 0.0196 0.0060 0.0170 0.0529

Image5 0.0616 0.0040 0.0040 0.0041 0.0041 0.0064 0.0045 0.0071 0.0157

Image6 0.0402 0.0023 0.0037 0.0025 0.0022 0.0055 0.0041 0.0057 0.0160

Image7 0.0327 0.0065 0.0111 0.0045 0.0101 0.0254 0.0072 0.0212 0.0627

Image8 0.0489 0.0043 0.0068 0.0049 0.0067 0.0169 0.0077 0.0180 0.0509

Image9 0.0604 0.0038 0.0039 0.0044 0.0047 0.0072 0.0048 0.0081 0.0178

Image10 0.0313 0.0070 0.0111 0.0052 0.0113 0.0263 0.0081 0.0217 0.0723

Image11 0.0450 0.0041 0.0062 0.0059 0.0067 0.0158 0.0082 0.0164 0.0470

Image12 0.0448 0.0023 0.0039 0.0027 0.0022 0.0055 0.0062 0.0055 0.0159

GRAPHICAL CALCULATIONS

DISCUSSION

We see from the graphs that the energy values of all 12 images lie within certain ranges; that is, for a certain kind of texture, the energy values of different images fall in close proximity to each other. As seen from the graph, the 9 feature values obtained from images having similar textures lie close to each other. For example, for images 1, 12 and 6 we see that all 9 values are more or less the same. This decreases the Euclidean distance between the images and hence classifies them as the same texture.

We also notice that our result shows an error in line 4, where it recognized image 4 in the group of images 3 and 11, but it was able to detect the proper group later on. I tried to find the logical error behind this but could not think of any, so I concluded this might occur due to some transitional stage in the algorithm. I decided to take the best of 3 as the result.

Part 2B- TEXTURE SEGMENTATION

Objective:

Now we are given a cluster of different textures within a single image. We have to perform segmentation such that we are able to draw a boundary between the different textures within this image. In our case, we are given 2 images that have four clusters each. We have to use the method used above for classification, but modify it in such a way that it applies to individual pixels rather than whole images.

MOTIVATION:

Instead of considering the entire image, we can perform the analysis of the features associated with every pixel in the image. It is intuitive to note that the feature vector of a single pixel is not stable, whereas that of the entire image is stable. We also know that pixels belonging to the same type of texture have their corresponding feature vectors close to each other in the feature space. So an analysis per pixel, instead of per image, provides more information about the texture distribution.

This kind of image processing is used in applications where we have to detect the features or objects of a picture, like face detection, where eyes, hair, nose, lips, etc. are separated from each other for different analyses. One of the exciting applications that I can think of is extracting the features of a particular city when viewed from a large height.

Procedure:

The input image is scanned pixel by pixel and a 3x3 neighborhood is formed from each pixel together with its 4-connected and 8-connected neighbors. We then apply each Laws filter to this neighborhood, generating 9 filtered images for the 9 Laws filters. We then take a large region of the image surrounding each pixel and calculate the energy. Finally, we use the k-means algorithm to find the areas of the different clusters.

ALGORITHM AND IMPLEMENTATION:


The input image is convolved with the 9 (3 x 3) filters given by Laws to produce nine images, namely T1, T2, T3, ..., T9. We know that pixels belonging to the same type of texture have their corresponding feature vectors close to each other. The main assumption that leads us to performing operations on a pixel-by-pixel basis is that it is more informative than operating on the entire image. The energy corresponding to each pixel in the set of images G is determined; since there are nine such images, each pixel location in the input image will have a 9-dimensional energy vector, as opposed to the part above where each image had a single 9-dimensional energy vector in the feature space.

Fk(x,y) = (1/W^2) * sum over the W x W window around (x,y) of |Gk(i,j)|^2,   k = 1 ... 9

where W is the size of the window (e.g., 15 x 15).

Thus each of the Gk's is subject to energy computation for each of its pixels. This results in each pixel of the input image having nine different features, namely {f1, f2, f3, f4, f5, f6, f7, f8, f9} (a small sketch of the windowed energy computation follows below).
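A sketch of the windowed energy computation, assuming a 512x512 image already border-extended by 25 pixels on each side for a 51x51 window, as described below (sizes are assumptions):

// Fk[i][j] = mean squared response of filtered image Gk over the W x W
// window centered at pixel (i, j) of the original (un-extended) image.
void windowEnergy(const double Gk[562][562], double Fk[512][512], int W)
{
    for (int i = 0; i < 512; i++)
        for (int j = 0; j < 512; j++) {
            double sum = 0;
            for (int di = 0; di < W; di++)
                for (int dj = 0; dj < W; dj++) {
                    double g = Gk[i + di][j + dj];
                    sum += g * g;
                }
            Fk[i][j] = sum / (double)(W * W);
        }
}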

This kind of averaging is done to reduce the statistical fluctuations in the image. It prevents the spreading of the clusters in the feature space, which could make the segmentation process even more complicated. If the window is large, it will cause the clusters to merge in the feature space, which could result in the merging of textures in the output image. On the other hand, a smaller window size could result in the over-spreading of the clusters, which could result in over-segmentation. Hence the size of the window should be chosen experimentally so that the above two cases are avoided to the best possible extent.

• I decided to go with a window size of 51x51. I scan the image after extending it by 25 pixels on each side; I copy the first 25 pixels from each side into the adjacent new areas. This fills the new pixels with values similar to the image.

• This is done to avoid the error due to averaging. For example, (5+5+5)/3 is 5, but (0+0+5)/3 is 5/3, which is smaller than in the previous case. So if we left the extended border zero, we would get faulty energy values.

• I then apply the k-means algorithm.

K-MEANS ALGORITHM:

Initially a set of centroids is chosen. Suppose there are N^2 points in the feature space; if they are to be classified into k classes, the first step is to assume k centroids.

The distance of each and every point from the centroids is calculated using the Euclidean distance discussed above. Each point is associated with the closest centroid, thus forming clusters. The mean of each cluster is then calculated to get a new centroid, and the two steps (assignment and update) are repeated.

Converging the iteration: the centroid of each cluster is recalculated and the above steps are repeated until the difference between the centroids of the previous iteration and the current iteration is less than the threshold passed to the function. The threshold can be as small as desired. A compact sketch is given below.
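A compact k-means sketch over the per-pixel feature vectors (hypothetical container types; the real implementation may differ). Initial centroids are taken from the first k points, and a fixed iteration count stands in for the threshold-based convergence test described above:

#include <vector>

void kmeans(const std::vector<std::vector<double> > &points, int k,
            std::vector<int> &label, int iters)
{
    int n = points.size(), d = points[0].size();
    std::vector<std::vector<double> > cent(points.begin(), points.begin() + k);
    label.assign(n, 0);
    for (int it = 0; it < iters; it++) {
        // assignment step: attach each point to its nearest centroid
        for (int i = 0; i < n; i++) {
            double best = 1e300;
            for (int c = 0; c < k; c++) {
                double s = 0;
                for (int j = 0; j < d; j++) {
                    double diff = points[i][j] - cent[c][j];
                    s += diff * diff;
                }
                if (s < best) { best = s; label[i] = c; }
            }
        }
        // update step: recompute each centroid as the mean of its cluster
        for (int c = 0; c < k; c++) {
            std::vector<double> mean(d, 0.0);
            int count = 0;
            for (int i = 0; i < n; i++)
                if (label[i] == c) {
                    count++;
                    for (int j = 0; j < d; j++) mean[j] += points[i][j];
                }
            if (count > 0)
                for (int j = 0; j < d; j++) cent[c][j] = mean[j] / count;
        }
    }
}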

RESULTS & DISCUSSION

15x15 window in texture1

15x15 window in texture2


We see that the result with a 15x15 window was very poor for texture1 but fairly better for texture2. However, there is a lot of scope left for improvement here.

The reason is that the textures in the first image are very similar to each other, so it is highly possible that, without any information about their spatial distribution, the total energies calculated for each pixel are so close that it becomes hard to distinguish them, giving errors in detecting the proper segments. In the second image the textures vary widely from each other, producing energies that differ by an intelligible amount, so it is easier to detect these patterns correctly.

DISCUSSION1-CHOOSING A DIMENSION FOR WINDOW:

We started with a 15x15 window and calculated energies accordingly. However, as the results show, this window size was not sufficient, as it does not support proper segmentation. The reason is simple: say we are considering a brick pattern. A 15x15 window covers only a small region of a whole brick and hence produces an energy value that is similar to that of a selected region in some other part of the image. This produces errors in segmentation. I therefore decided to work with a 51x51 window.

DISCUSSION2 - ADDING EXTRA DIMENSIONS TO THE FEATURE VECTOR

Extending the size of the window improved the result; however, there was still some scope for improvement. I decided to add the coordinates (x, y) of the pixel as 2 extra dimensions in the feature space. This helps in binding the pixels by some kind of spatial distribution; basically we take into consideration where a pixel is located inside the image.

51X51 window in texture1 with extra 2 dimensions

51X51 window in texture2 with extra 2 dimensions


Increasing the window size gives a better result. However, using a larger window produces errors in the decision of boundaries: with a 51x51 window we are also averaging across texture boundaries, which makes the energies near boundaries less distinct and hence produces errors in the boundary decisions.

PROBLEM 3 - OPTICAL CHARACTER RECOGNITION (OCR)

OBJECTIVE:

We have been given a training set that contains the alphabets A, B, C, D, E, K, L, M, N, O, P, R, S, T, U, all of Arial font and the same font size. We have to use this as our training set to make our program learn the characteristic features of the different alphabets. In the first part we extract the shape features of the alphabets in terms of line numbers and end points. In the second part we read the test image and compare its features with our feature set to determine which character it is. We have to develop a set of features based on the training data, then scan the test images and declare each character in them to be the closest match in the training data.

MOTIVATION


The idea behind Optical Character Recognition is to extract features from characters, numerals and special symbols and use them as parameters to segment the symbols and detect their presence in any document. This is the principle of Document Processing. It plays an important part in Pattern Recognition and in describing objects in an Image Understanding System.5 OCR is also used for shape analysis of images, wherein a particular symbol is declared to be of a predetermined character class.

PROCEDURE & ALGORITHM:

Steps, part 1:

• Binarize the image (to distinguish between the object and background).

• Thin the image (used for part 1 only); we need to find end points and lines by finding the hit-and-miss patterns of a given set of masks.

• Find the minimum bounding box for each character and segment the characters.

• Run the algorithm to find end points and line numbers for each character.

• Store them in an array representing the feature vector.

• Compare the characters on the basis of this feature vector.

Steps, part 2:

• Binarize the image.

• Find the minimum bounding box for each character and segment the characters into different arrays.

• Run the algorithm to find: area, perimeter, Euler number, circularity, spatial moment, symmetry, aspect ratio, Euclidean distances (my approach).

The following were the important concepts and steps:

BINARIZATION:

Binarize the training image into two gray levels (0 and 255): if a pixel's value is less than a particular threshold, set its gray level to 0, else set it to 255. Our given image has values ranging from 0 to 255, so binarizing is an important step. I used a threshold value of 128, giving value 0 to every pixel below 128 and value 255 to every pixel above 128.

OBJECT SEGMENTATION:

In my program I have taken advantage of the fact that the characters are uniformly distributed: I check for all-white rows and all-white columns to segment the image into roughly 15 segments. Although this is not the best approach, it was a suggestion of the TA and I found it convincing not to complicate matters. I first determine 15 boxes, each roughly containing one character; the character might not be centered, but it is contained inside the box.

BOUNDING BOX DETERMINATION:

5 Digital Image Processing By Rafael C. Gonzalez, Richard Eugene Woods

Page 27: geometrical processing- texture segmentation- OCR

EE 569- Fall’08 | ID- 5994499980 27

We detect the corners (as in the previous problem) and store the x and y coordinate values. I find ymin, ymax, xmin, xmax from this set of values and draw a box (virtually). This box contains the character completely, with no extra rows or columns, and I use it to find the different features.

FEATURE EXTRACTION:

I have used the following features in my program in order to characterize the numbers/characters

given in the training image

Part 1:

• Line number

• End point number

Part 2:

• Area

• Perimeter

• Euler Number

• Circularity

• Aspect Ratio

• Symmetry (upper mass, lower mass, right mass and left mass)

• Central spatial moment

• Elongation

• Euclidean distance from the feature vector (my approach).

Part 1-

Finding the line direction

I check the (segmented) image for 4 patterns indicating the occurrence of a line. There are four different line directions that can occur in a character, as shown in the masks.

I have declared hline, vline, ldline and rdline as the integers that store the number of occurrences of these patterns in the given image (h = horizontal, v = vertical, ld = left diagonal, rd = right diagonal). As soon as we get a hit, we record that particular instance and assign the value 1 to the respective integer.

I have declared an array linenumber = {hline, vline, ldline, rdline}; so, for instance, for A: {1,0,1,1}. This is one part of my feature vector.


End point number

An end point is defined as a point that is connected in only one direction (4-connectivity or 8-connectivity); it is located at the end of a stroke. The end point number can be calculated by using the masks below.

I have declared an array:

numberofendpoints1[15][8] = {leftendpoint, rightendpoint, topendpoint, bottomendpoint, topleftdiagonalendpoint, toprightdiagonalendpoint, bottomleftdiagonalendpoint, bottomrightdiagonalendpoint};

This stores the occurrences of these points in a character. This way we have a clear picture of the spatial distribution of the end points along with their exact counts. This is my second feature vector. I concatenate both feature vectors into a single feature vector (this helps me find the Euclidean distance).

PART 2

After segmenting each object in the training data, we compare the object pixels inside the bounding box with the following 2x2 bit-quad patterns.

Q1 consists of four masks:

1 0   0 1   0 0   0 0
0 0   0 0   1 0   0 1

Q2 consists of four masks:

1 1   0 1   0 0   1 0
0 0   0 1   1 1   1 0

Q3 consists of four masks:

1 1   0 1   1 0   1 1
0 1   1 1   1 1   1 0

Q4 consists of one mask:

1 1
1 1

Qd consists of two masks:

1 0   0 1
0 1   1 0

AREA:

The area of an object is the number of object pixels that constitute the entire object. If an object pixel has a value equal to 1,

Area = 0.25 * (n{Q1} + 2*n{Q2} + 3*n{Q3} + 4*n{Q4} + 2*n{Qd})

However, the area of the object can also be calculated by simply counting the number of object pixels, i.e., looking for the {1} pattern:

Area = n{1}   [I have used this approach]

PERIMETER:

The perimeter of an object is defined as the number of sides of object pixels that separate pixels with different values.

Perimeter = n{Q1} + n{Q2} + n{Q3} + 2*n{Qd}

However, the perimeter can also be calculated by looking at the patterns {1 0}, {0 1}, {1 0}^T and {0 1}^T. Again, I have used this approach:

Perimeter = n{1 0} + n{0 1} + n{1 0}^T + n{0 1}^T, where n is the number of occurrences.

AREA/PERIMETER RATIO

The ratio area/perimeter is a better feature than area or perimeter alone, since those are scaling-variant. Sometimes the area is so large that the ratio is dominated by the area alone, bringing errors into the decision making; to avoid this we scale the perimeter so that the ratio is normalized.

EULER NUMBER:

It is defined as the number of connected components that constitute the object minus the number of

holes within the object.

Euler number = 0.25 * (n{Q1} – n{Q3} – 2*n{Qd})

CIRCULARITY:

The Circularity of an object is defined as the ratio that describes how far the shape of the object

approximates a circle.

Circularity = 4 * pi * Area/ (Perimeter)^2
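A sketch of how the bit-quad counts and the derived features can be computed (array types and sizes are assumptions, not the report's exact code):

// img is a binary object image (1 = object pixel). A 2x2 window is slid over
// the image; each position is classified as Qd (diagonal pair) or Qk by its
// number of object pixels k, and the features follow from the formulas above.
const double PI = 3.14159265358979;

void bitQuadFeatures(const int img[256][256], int H, int W,
                     double &area, double &perim, double &euler, double &circ)
{
    int n[5] = {0, 0, 0, 0, 0};   // n[k] = count of bit quads Qk
    int nd = 0;                   // count of diagonal quads Qd
    for (int i = 0; i < H - 1; i++)
        for (int j = 0; j < W - 1; j++) {
            int q = img[i][j] + img[i][j+1] + img[i+1][j] + img[i+1][j+1];
            // exactly two object pixels placed diagonally -> Qd
            bool diag = (q == 2) && (img[i][j] == img[i+1][j+1]);
            if (diag) nd++; else n[q]++;
        }
    area  = 0.25 * (n[1] + 2*n[2] + 3*n[3] + 4*n[4] + 2*nd);
    perim = n[1] + n[2] + n[3] + 2*nd;
    euler = 0.25 * (n[1] - n[3] - 2*nd);
    circ  = 4.0 * PI * area / (perim * perim);
}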

ASPECT RATIO:


The Aspect Ratio is defined as the ratio of the Height of the object to the width of the object.

Height=height of bounding box;

Width=width of bounding box;

Aspect Ratio = Height/Width

Width Ratio = Width/ (Height + Width)

Height Ratio = Height/ (Height + Width)

SYMMETRY:

An object is horizontally symmetric if the mirror image of one half of the object about the horizontal axis gives the other half, and vertically symmetric if the mirror image of one half about the vertical axis gives the other half. We use the upper mass and lower mass to measure horizontal symmetry. An object is entirely symmetric if it exhibits both properties.

I have calculated symmetry in terms of 2 parameters: the left mass/right mass ratio and the upper mass/lower mass ratio. To do this I divided the bounding box into 2 parts, first horizontally and then vertically, counted the number of object pixels on each side of the axis, and took the ratio. For symmetric objects this ratio is one (a small sketch follows below).
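A small sketch for one of the two ratios (the binarized image and bounding-box coordinates are assumed inputs; the left/right ratio is computed the same way with a vertical split):

// Upper-mass / lower-mass ratio inside the bounding box
// rows xmin..xmax, columns ymin..ymax; object pixels are black (0).
double upperLowerRatio(const unsigned char img[256][256],
                       int xmin, int xmax, int ymin, int ymax)
{
    int xmid = (xmin + xmax) / 2;
    int upper = 0, lower = 0;
    for (int i = xmin; i <= xmax; i++)
        for (int j = ymin; j <= ymax; j++)
            if (img[i][j] == 0)
                (i <= xmid ? upper : lower)++;
    return lower > 0 ? (double)upper / lower : 0.0;  // ~1 for symmetric shapes
}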

SPATIAL MOMENT:

The (m, n)-th moment of a joint probability density function can be used to describe the features of an object; here the density is replaced by the binary image function. The shape of the object is characterized by a few of the low-order moments. The use of these features for OCR is justified by the fact that they are invariant, to a certain extent, for a particular symbol. We fill the data structure corresponding to the features of the training-data symbols.

mean_x = sum_{i,j} x * f(i,j)
mean_y = sum_{i,j} y * f(i,j)
Moment(m, n) = sum_{i,j} x^m * y^n * f(i,j)

where f(i,j) is 1 or 0.

Euclidean Distance:

The Euclidean distance (ed) is defined as:

d(Ei, Ej) = || Ei - Ej || = sqrt( sum_k (Ei_k - Ej_k)^2 )

I realized that the feature arrays I have stored for each character can be considered feature vectors, with each feature contributing a dimension. If my decision tree fails, I can use the ed to find the character closest to the given character. The logic behind this is that each character has a certain unique spatial distribution, so the feature vector gives us a way to define a character by a unique set of values. Some of the features are scaling-variant, some are font-variant; such variances can sometimes lead to a faulty decision, so we keep the ed as a final check to find the closest character.


Implementation and results:

As discussed above, I segmented the image based on the characters' spatial distribution6. The set of pixels belonging to each object is stored as an individual entry in a cell array; I create one array for each of the 15 characters.

Step 1 - Thin the image of training.raw to calculate the number of lines and end points. I have designed a feature vector, as mentioned above, which represents endpoints + line numbers. For example, for A we get:

Endpoints = {1,1,0,0,0,0,0,0}, line numbers = {1,0,1,1}

So the program stores {1,1,0,0,0,0,0,0,1,0,1,1} as the value of A.

To recognize a character in the test image, I match the features of the character against all the training characters; the training vector closest to the character's feature vector is picked as the alphabet for that test character.

Results & Discussion

Example: comparison of the features of A:

A training = {1,1,0,0,0,0,0,0,1,0,1,1}
A test     = {0,1,0,0,0,0,2,2,1,0,1,1}

6 Confirmed by TA.


This gives an error in a single element, so there is a possibility that the procedure can detect A if we accept errors within a limit of plus or minus 2:

-2 < error < +2

With hard boundaries (requiring an exact match) this procedure failed for almost all alphabets other than U, T, S, R, P, O. Those characters in both the training and test sets are simple, with no extra extensions due to font style, which makes it possible to detect them.

I then chose the second approach. I calculated the above-mentioned parameters like the area/perimeter ratio, Euler number, etc., and grouped the characters according to their features:

Euler number = 0 -> A, D, O, P, R
Euler number = -1 -> B

and so on.

When I get the input character, I first test it for symmetries. I have saved the uppermass/lowermass and leftmass/rightmass ratios; I check these ratios for the input image and see whether the image is horizontally symmetric, vertically symmetric, or neither.

If symmetric: B, D, C, M, O, S, T, U, A, K, E; else: L, P, R.

This narrowed the comparison set. I then check the Euler number and narrow the set further; the Euler number is (mostly) unique for each group.


Later on I check the aspect ratio; this helps me separate the tall characters from the wide characters. Using all these features I was able to successfully detect all the characters.

DISCUSSION & RESULTS, continued

The reason is that these features are based mostly on the basic structure of the character rather than on the font style, which was the weak point in the first approach. Hence, being scaling-invariant and design-invariant, we were able to detect the characters according to the basic structure they should have.

My approach:

The chart below represents the Euclidean distances between each character in the test set (rows) and each alphabet in the training set (columns). In the original chart the distances for correct detections were marked in green, with red for the second closest.

       A      B      C      D      E      K      L      M      N      O      P      R      S      T      U
a   3.87   3.61   1.73   3.32   4.58  10.77   5.10   7.42   6.00   3.00   2.65   5.00   3.74  40.12   6.56
b   4.47   3.46   3.74   2.83   4.00   9.95   4.80   5.83   4.58   3.16   1.41   3.46   3.32   2.83   5.66
c   4.58   5.20   2.65   4.36   5.92  11.14   5.66   8.43   7.07   4.58   3.32   6.40   5.29   4.58   7.14
d   4.47   3.74   3.46   2.45   3.74   9.11   3.87   6.16   5.00   2.83   1.41   4.00   3.32   2.00   4.69
e   4.12   4.58   4.80   4.12   5.00   9.17   4.47   5.00   4.24   4.36   3.32   3.87   3.74   3.32   6.40
k   4.36   4.36   2.24   3.61   4.58  10.20   4.69   7.81   6.48   3.32   3.00   5.57   4.00   3.87   5.74
l   3.74   4.24   2.83   3.16   4.69   9.64   4.12   6.78   5.57   3.46   2.00   4.90   3.61   2.83   5.48
m   3.74   4.24   2.83   3.16   4.69   9.64   4.12   6.78   5.57   3.46   2.00   4.90   3.61   2.83   5.48
n   5.20   5.00   3.87   4.12   4.12   9.70   4.90   7.68   6.63   3.87   3.32   5.57   4.69   3.61   5.74
o   4.12   4.12   2.24   3.32   4.58  10.10   4.69   7.68   6.48   3.00   2.65   5.39   3.74   3.61   5.57
p   4.00   4.69   2.00   3.74   5.48  10.91   5.20   8.12   6.71   4.00   2.45   6.00   4.80   4.00   6.78
r   4.24   3.74   3.16   2.83   3.74   9.95   4.58   6.32   5.00   3.16   0.00   4.00   3.61   2.45   5.66
s   4.24   3.46   2.83   2.83   3.46   9.95   4.58   6.32   4.80   2.83   1.41   4.00   3.32   2.83   5.66
t   3.00   3.87   3.00   2.65   3.87   9.27   3.74   6.25   4.90   3.61   1.73   4.80   4.00   2.24   5.92
u   4.58   4.80   3.00   3.32   4.80   9.38   4.00   7.68   6.48   3.61   2.65   5.74   4.24   3.00   4.80


The chart below shows the same distances after taking the error window into consideration; each row now lists the 15 distances in ascending order:

a   1.73   2.65   3.00   3.32   3.61   3.74   3.87   4.58   5.00   5.10   6.00   6.56   7.42  10.77  40.12
b   1.41   2.83   2.83   3.16   3.32   3.46   3.46   3.74   4.00   4.47   4.58   4.80   5.66   5.83   9.95
c   2.65   3.32   4.36   4.58   4.58   4.58   5.20   5.29   5.66   5.92   6.40   7.07   7.14   8.43  11.14
d   1.41   2.00   2.45   2.83   3.32   3.46   3.74   3.74   3.87   4.00   4.47   4.69   5.00   6.16   9.11
e   3.32   3.32   3.74   3.87   4.12   4.12   4.24   4.36   4.47   4.58   4.80   5.00   5.00   6.40   9.17
k   2.24   3.00   3.32   3.61   3.87   4.00   4.36   4.36   4.58   4.69   5.57   5.74   6.48   7.81  10.20
l   2.00   2.83   2.83   3.16   3.46   3.61   3.74   4.12   4.24   4.69   4.90   5.48   5.57   6.78   9.64
m   2.00   2.83   2.83   3.16   3.46   3.61   3.74   4.12   4.24   4.69   4.90   5.48   5.57   6.78   9.64
n   3.32   3.61   3.87   3.87   4.12   4.12   4.69   4.90   5.00   5.20   5.57   5.74   6.63   7.68   9.70
o   2.24   2.65   3.00   3.32   3.61   3.74   4.12   4.12   4.58   4.69   5.39   5.57   6.48   7.68  10.10
p   2.00   2.45   3.74   4.00   4.00   4.00   4.69   4.80   5.20   5.48   6.00   6.71   6.78   8.12  10.91
r   0.00   2.45   2.83   3.16   3.16   3.61   3.74   3.74   4.00   4.24   4.58   5.00   5.66   6.32   9.95
s   1.41   2.83   2.83   2.83   2.83   3.32   3.46   3.46   4.00   4.24   4.58   4.80   5.66   6.32   9.95
t   1.73   2.24   2.65   3.00   3.00   3.61   3.74   3.87   3.87   4.00   4.80   4.90   5.92   6.25   9.27
u   2.65   3.00   3.00   3.32   3.61   4.00   4.24   4.58   4.80   4.80   4.80   5.74   6.48   7.68   9.38

The chart shows that, using this approach, we were able to detect B, C, D, O, P, T, S from the test set. We could detect K and L if we increased our error window. However, we could not find a reliable rule for taking a decision on the basis of the Euclidean distances alone; a second check should take each feature into consideration separately. We find that this is not a very efficient way of detecting alphabets when we take into consideration only the endpoints and line numbers.

The reason for the error: we see that thinning has a considerable effect on the detection of the parameters. As shown above, the thinning effects in training.raw are not as drastic as in test.raw.


Unfortunately, our algorithm runs only on thinned images and hence is very dependent on the font style. Since in our test image O, P, R, S, T and U have very straight fonts, it is easier to detect them. Also, I realize there might be some small error somewhere in my programming, which is why it shows errors in the detection of L and M, but I couldn't find any such error in my code.

Part 2 - EULER NUMBERS

(Euler number charts for test.raw and training.raw. Chart indices: A=0, B=1, C=2, D=3, E=4, K=5, L=6, M=7, N=8, O=9, P=10, R=11, S=12, T=13, U=14.)

Wrong detection of the Euler number: for the alphabets K, L, M, as per our matching set in training.raw. We take this as a wrong detection because we consider the results on the training set as the standard.

Detection of B: we just match the Euler number of the input character against -1; this way we can detect B.

Check for vertical symmetry: A & O.

A has 2 end points -> correct decision of A; // this is a font-dependent decision
O has 0 end points -> correct decision of O;


We are not using the circularity property for O because the circularity of O in the training set was not 1, so it was not a good parameter to judge by; the area/perimeter ratios, however, are still close. The reason the circularity is not 1 is that the O given in the training set is elongated and not a perfect circle.

Check for horizontal symmetry: D & O.

D -> correct decision of D.

Check the uppermass/lowermass ratio: P < R. Check the leftmass/rightmass ratio: P < R. P and R are detected on the basis of these ratios. This method is invariant to font size and style, and hence detects these alphabets correctly here.

This method is invariant of fontsize and style. Hence detects the alphabets correctly here.

Checking for vertical symmetry, the remaining options are S, T, U. S is under consideration here because I am calculating symmetry on the basis of the left and right areas; because of its structure, the two areas happen to be the same, making it symmetric.

End point check - T has 3 end points -> detected successfully; remaining cases: S, U.

Uppermass/lowermass ratio: S has a UM/LM ratio close to 1, but U has a smaller UM/LM ratio; thus U and S are detected.

Checking for horizontal symmetry: C, E (the rest are already detected).

End point check: C has 2 diagonal end points and E has 3 left end points. C and E are detected properly.


Here, logically, M, K and L should be detected next. But since in our test set the detection of the Euler number for M is faulty, this reflects in the decision making for M.

Characters left: K, L, M.

We check for vertical symmetry: M. M could be matched as A, M, O, N, S, T or U. M has 2 bottom end points in the training set but 4 in the test set. We then check for horizontal and vertical lines: from the above set, only M has one left-diagonal and one right-diagonal line. Hence M matches and is detected properly.

We check for horizontal symmetry: K; plus K has 4 end points. // An easy decision.

L has one top end point and one left end point, and L has no symmetry, so the possible choices are P, R and L. We check the uppermass/lowermass ratio, which only matches L. Hence L is detected.

Hence, all the alphabets were detected in spite of the first check failing. We conclude that the second approach was much more efficient in this case, for the reasons mentioned in the discussion above.

PART 2

DETECTION OF NUMBERS (0-9)

DECISION TREE FOR THE TRAINING DATA:


When we take a decision, we prefer parameters that do not change under scaling or rotation, and fall back on others only when there is no alternative. I have used area or perimeter rarely, and their use is minimal in my program.

1) Check if the Euler number is 1, 0 or -1. We use the Euler number first. We have 3 groups:

E = -1: {8}
E = 0: {6, 9, 4, 0}
E = 1: {1, 2, 3, 5, 7, +, -, ., /, *}

2) If the Euler number is -1, there is only one possibility: the number '8'. So the number 8 has been identified!

3) If the Euler number is 0, there are only 4 possibilities: the numbers 6, 9, 4, 0. When we run the training image, we see that '0' has the maximum circularity, and its upper mass and lower mass are nearly symmetric. So we can set a small threshold so that it won't deviate too much in symmetry, and the number '0' is identified. Next, the numbers 6 and 9 can be differentiated by their leftmass/rightmass (L/R) ratio: in the training image as well as the test images, the L/R ratio for 6 is maximum while for 9 it is minimum, so we can differentiate them by this ratio, and the numbers '6' and '9' are identified. Then, with Euler number 0 and the minimum aspect ratio, we get the number '4', because its aspect ratio was minimum in the test images as well as the training images. So the numbers 6, 9, 4 and 0 can be identified.

4) With Euler number 1, we have {1, 2, 3, 5, 7, +, -, ., /, *}. Since there are a lot of characters with Euler number 1, utmost care was taken to differentiate them. An approximate way to detect '1' is to note that its aspect ratio is large (in fact the largest) and its circularity index is small; in my program I used a threshold value of 1.6, obtained by trial and error.

We come to the next three possibilities: 2, 5, 7. A crucial feature that characterizes these three uniquely is the upper mass and lower mass of the bounding box. It is found that the upper and lower masses of the number '2' do not differ by a great extent, and their total seems to be the biggest among the three, so with a small threshold '2' can be detected. The upper mass and lower mass of '7' differ by a large value in practice, so a large threshold detects '7'. Another feature that differentiates 7 is its upper/lower mass ratio, which comes out to approximately 2, so we can use a threshold from 1.9 to 2.2 to isolate 7. If the above two conditions do not hold, it can mean only one thing: the number '5'. So we have differentiated 2, 5 and 7!


Sometimes 7 and 1 can be misjudged. Another suggested approach to distinguish 7 and 1 (not implemented): take a row-wise histogram of the symbol over its entire height; if it remains almost constant but with a big peak at the beginning, it is a 7, otherwise it is a 1. But this is a difficult and tedious procedure.

Detecting a '3' was quite difficult because I couldn't find a single parameter that differentiates it from the others, so I had to think of another way. The number '3' can be detected by noting that the difference between the mass of pixels in the right portion of the bounding box and the mass of pixels in the left portion is always a large value; thus we can choose a certain threshold to detect the number '3'.

5) Now comes differentiating the symbols; there are 5 unique symbols. Of these, '.' (dot) can be found uniquely: the central moments, moments, area and perimeter are least for the dot, so it is easily detected.

Next, among the remaining 4 symbols only '-' and '+' exhibit symmetry. (What do I mean by symmetry? For plus, leftmass = rightmass and uppermass = lowermass; for minus, only leftmass = rightmass.) Once they are detected as symmetric, one way of classifying '+' versus '-' is that the normalized area of '+' is always greater than that of '-'; this condition holds universally. So they can be detected.

Then we are left with '*' and '/'. The central moments are approximately the same in the test and training images for '/', so we can detect '/'. For '*', leftmass = rightmass and the upper mass is greater than the lower mass. Though the properties of '-' and '*' are likely to be similar, we can rely on the fact that '*' is not a fully symmetric pattern.

Thus, based on the various parameters, all 15 patterns were uniquely identified and isolated, and this was tested using the test patterns below (a hedged sketch of the decision tree follows).
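A sketch of the digit branch of this decision tree. The Features struct is an assumption, and any threshold not quoted in the text above (such as the circularity cutoff for '0' or the L/R cutoffs for 6 and 9) is a hypothetical placeholder:

#include <cmath>

struct Features {
    int    euler;           // Euler number of the symbol
    double circularity;     // 4*pi*area / perimeter^2
    double aspect;          // height / width of the bounding box
    double upLow;           // upper-mass / lower-mass ratio
    double leftRight;       // left-mass / right-mass ratio
};

char classifyDigit(const Features &f)
{
    if (f.euler == -1) return '8';                 // two holes: only '8'
    if (f.euler == 0) {                            // one hole: 6, 9, 4, 0
        if (f.circularity > 0.7 &&
            std::fabs(f.upLow - 1.0) < 0.1) return '0';    // assumed cutoffs
        if (f.leftRight > 1.2) return '6';         // 6 has the max L/R ratio
        if (f.leftRight < 0.8) return '9';         // 9 has the min L/R ratio
        return '4';                                // minimum aspect ratio case
    }
    // euler == 1: no holes -> 1, 2, 5, 7 ('3' and the symbols need the
    // extra mass-difference and symmetry checks described in the text)
    if (f.aspect > 1.6 && f.circularity < 0.3) return '1'; // 1.6 from the text
    if (f.upLow > 1.9 && f.upLow < 2.2) return '7';        // 1.9-2.2 from text
    if (std::fabs(f.upLow - 1.0) < 0.15) return '2';       // assumed tolerance
    return '5';                                    // fall-through per the text
}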

DISCUSSION OF RESULTS:

Training image used to train my program.


Word1.raw

Output:

The number/character is 9
The number/character is 6
The number/character is 1
The number/character is .
The number/character is 7

My program was able to recognize the characters 7, 1, ., 6, 9. So my threshold settings for the above numbers are correct and we get the desired output.

Word2.raw output:

The number/character is 4
The number/character is 2
The number/character is *
The number/character is /

My program wasn't able to recognize '7' in word2, but it detected the other symbols, so my output was 4, 2, *, /.

METHODS TO OVERCOME THE PROBLEM:

My algorithm to detect the number 7 uses the U/L (uppermass/lowermass) ratio, as mentioned above; the ratio should be approximately 2. But in the above image we can see that the '7' is slightly slanted and has a different structure than in the training image, so my program wasn't able to detect it; in fact, the upper and lower masses came out nearly equal in this test image. This is the reason I was able to detect '7' in word1.raw but couldn't detect it in word2.raw.

One method to overcome this problem is to manually work out all possible writing styles of the character '7'. Once all the styles are collected, all the relevant parameters are analyzed, a suitable threshold is obtained and fed into the program. The program is then more likely to detect the character than with the previous method, where I trained the program with only one training image. In short, many training images should be fed to the program before the test images are given; then the program is less likely to make detection mistakes.


I have attached an illustration of how and why '7' was not detected by my program.