Clausthal University of Technology
Faculty of Mathematics, Computer Science and Engineering

Bachelorarbeit

Al Arraf

An approach using neural networks on curves to determine the font of Arabic calligraphy art works

Hayyan Helal; 450100

supervised by:
Univ.-Prof. Dr. rer. nat. habil. Jürgen Dix
Prof. Dr. Thorsten Grosch

February 21, 2018



Contents

1 Introduction
  1.1 Abstract
  1.2 Problem
  1.3 State of the Art
  1.4 Approach in This Project
2 Neural Networks
  2.1 Definition
  2.2 How It Works
  2.3 Mathematics of a Neural Network
  2.4 How to Use a Neural Network
  2.5 Interface for a Neural Network
    2.5.1 Dimensions, Structure
    2.5.2 Weights
    2.5.3 Activation Function
    2.5.4 Forward Propagation Function
    2.5.5 Backward Propagation Function (Training Function)
    2.5.6 Usage
  2.6 Why Are Neural Networks Helpful for This Problem
3 Evolution of the Approach
  3.1 Original Idea
    3.1.1 Steps from Photo to Curves
    3.1.2 Steps from Curves to Font
    3.1.3 First Results
  3.2 Modified Idea
    3.2.1 Steps from Photo to Font
    3.2.2 Most Suitable Programs for the Steps
    3.2.3 Steps from Photo to Curves
    3.2.4 Steps from Curves to Font
4 Photo to Curves
  4.1 Organization
    4.1.1 Naming
    4.1.2 Saving
    4.1.3 Conventions in the code
  4.2 Implementation
    4.2.1 Organization
    4.2.2 Rename
    4.2.3 Convert
    4.2.4 Cut in Pieces
    4.2.5 Vectorize
    4.2.6 Simplify
    4.2.7 SVG to Lists of Integers
    4.2.8 All Together
5 Curves to Font
  5.1 Implementation
    5.1.1 Import Data
    5.1.2 Neural Network
6 Improvements on the Neural Network
  6.1 Measurement
  6.2 First Try
  6.3 Structure
  6.4 Number of Pieces
  6.5 Dimensions
  6.6 Step
  6.7 Interpretation of the Result
  6.8 More Testing Photos
7 Problems and Further Work
  7.1 Problems and Suggestions
    7.1.1 Dimensions
    7.1.2 Local Optimum: The Evil in Mathematics
    7.1.3 Tests: How many?
  7.2 Further Work
    7.2.1 Return to the Original Idea
    7.2.2 Read off Patterns out of the Neural Network
8 Conclusions
  8.1 Goals Scored in Bachelor
    8.1.1 Aesthetics
    8.1.2 The Aesthetician
  8.2 Lessons
    8.2.1 Project Evaluation
  8.3 Future?


Figure 1: A view inside the great mosque of Cordoba (in present-day Spain), the second capital of the Umayyad caliphate after Damascus

1 Introduction

1.1 Abstract

This project is an approach to writing a program that uses neural networks to determine the font of an Arabic text in a given photo. The project is focused on calligraphy works, not on normal handwriting. And the approach won't consider the pixels of the photo, as usual, but the curves that make up the letters.

1.2 Problem

Before 1.1.1 in the Hijri calendar (19.04.622 in the Julian calendar) the use of Arabic was restricted to some places in the Arabian peninsula. Now we're in 1439 Hijri, and Arabic is the fourth most spoken language in the world. The use of Arabic has spread from Indonesia in the east to Spain in the west, Figure 1. This diversity in place caused a lot of other diversities, in dialects but also in fonts. Today it's very important to be able to recognize which font a given art work uses, because:

• A huge amount of Arabic art works could be sorted, which would give us a lot of information about the historical events: when, why and by whom they were written.

• Most Arabs are not able to recognize all fonts. It will be important for them to be able to read what they can understand but cannot read in the first place.

• Such a program helps beginning calligraphists to know where to start.


Figure 2: Arabic for "And you (Mouhammad) have great manners".

• It helps when developing new fonts, by knowing to which category the new font belongs.

1.3 State of the Art

Text recognition in all languages stopped being a problem after the introduction of neural networks, but there are still no published papers that consider the characteristics of the Arabic language. The font of Arabic calligraphy works cannot always be recognized using the normal methods, because the text is sometimes written in a very complex way, e.g. Figure 2. Most of the papers published until now take approaches developed for Latin letters and use them to determine texts in Arabic. But these approaches ignore some facts:

• Arabic letters are mostly connected within a word, whereas Latin letters are always separated by small white spaces, even within one word. That's why methods taken from papers on Latin letters recognize only whole words (or parts of them) in Arabic and cannot work at the letter level, e.g. Figure 3.

• Because Arabic words are written connected, calligraphists have more freedom to write two words tangled together. But the published models recognize only whole words, as said before, so they fail to recognize the text of some art works, e.g. Figure 4.

• Arabic calligraphy has evolved using golden ratios, but this informationis not used in the published papers.


Figure 3: The English word "word" is divided by white spaces into exactly the 4 letters that make up the word. The Arabic word for "word" can be divided by spaces into two parts, one of which is actually the dots of the last letter of the first part.

Figure 4: Arabic for "In the beginning was the word". We can see that some words are written tangled with one another, so separation by white spaces alone is not enough to differentiate between the words.


1.4 Approach in This Project

As said before, we'll try to use the information we know about Arabic before we start with the recognition of the font. In effect there will be two major stages:

• Photo to Curves: Here we'll start with a photo of the art work (a ".jpg" file) and end with information about its curves.

• Curves to Font: Here we'll start with the information from the first stage and use it as input for the neural network. The output of the neural network will be information about the font, i.e. the probability that the given photo is in a certain font.

There are, as in any project, problems that have no clear solution. There will be a long discussion of these after the description of the idea and the implementation of the project. But we'll link to that discussion each time we face one of these problems, so that one can move freely when reading the project for a second time.

2 Neural Networks

2.1 Definition

The inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen, defines a neural network as "...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs".

Google: "A computer system modelled on the human brain and nervous system."

2.2 How It Works

Neural networks are typically organized in layers. Layers are made up of a number of interconnected 'nodes' which contain an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output, as shown in Figure 5.

Most ANNs contain some form of 'learning rule' which modifies the weights of the connections according to the input patterns that the network is presented with. In a sense, ANNs learn by example, as do their biological counterparts; a child learns to recognize dogs from examples of dogs.

Although there are many different kinds of learning rules used by neural networks, this demonstration is concerned with only one: the delta rule. The delta rule is often utilized by the most common class of ANNs, called 'backpropagational neural networks' (BPNNs). Backpropagation is an abbreviation for the backwards propagation of error.


Figure 5: Schemata of a neural network

With the delta rule, as with other types of backpropagation, 'learning' is a supervised process that occurs with each cycle or 'epoch' (i.e. each time the network is presented with a new input pattern) through a forward activation flow of outputs and the backwards error propagation of weight adjustments. More simply, when a neural network is initially presented with a pattern, it makes a random 'guess' as to what it might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights. More graphically, the process looks something like Figure 7.

Note also that within each hidden layer node there is a sigmoidal activation function which polarizes network activity and helps it to stabilize.

Backpropagation performs a gradient descent within the solution's vector space towards a 'global minimum' along the steepest vector of the error surface. The global minimum is the theoretical solution with the lowest possible error. The error surface itself is a hyperparaboloid but is seldom as 'smooth' as depicted in Figure 6. Indeed, in most problems the solution space is quite irregular, with numerous 'pits' and 'hills' which may cause the network to settle down in a 'local minimum' which is not the best overall solution.

Since the nature of the error space cannot be known a priori, neural network analysis often requires a large number of individual runs to determine the best solution. Most learning rules have built-in mathematical terms to assist in this process, controlling the 'speed' (beta coefficient) and the 'momentum' of the learning. The speed of learning is actually the rate of convergence between the current solution and the global minimum. Momentum helps the network to overcome obstacles (local minima) in the error surface and settle down at or near the global minimum.
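The roles of speed and momentum can be sketched in a few lines (a toy illustration, not from the thesis: the 1-D error surface E(w) = w^4 - 3w^2 + w and all coefficients are invented):

```python
# Toy 1-D error surface E(w) = w**4 - 3*w**2 + w with one local and one
# global minimum; dE is its derivative (the error gradient).
def dE(w):
    return 4 * w**3 - 6 * w + 1

def descend(w, rate, momentum, steps):
    # 'rate' is the speed of learning; 'momentum' carries a fraction of the
    # previous update along, which helps to roll over small obstacles.
    v = 0.0
    for _ in range(steps):
        v = momentum * v - rate * dE(w)
        w += v
    return w

w_final = descend(2.0, rate=0.01, momentum=0.5, steps=5000)
```

With momentum = 0 this reduces to plain gradient descent; whether a run ends in the local or the global minimum still depends on the starting point and both coefficients, which is exactly why several individual runs are usually needed.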

Once a neural network is 'trained' to a satisfactory level, it may be used as an analytical tool on other data. To do this, the user no longer specifies any training runs and instead allows the network to work in forward propagation mode only. New inputs are presented to the input layer, where they filter into and are processed by the middle layers as though training were taking place; however, at this point the output is retained and no backpropagation occurs.


Figure 6: Delta Rule

The output of a forward propagation run is the predicted model for the data, which can then be used for further analysis and interpretation.

It is also possible to over-train a neural network, which means that the network has been trained to respond exactly to only one type of input; this is much like rote memorization. If this should happen, then learning can no longer occur and the network is referred to as having been "grandmothered" in neural network jargon. In real-world applications this situation is not very useful, since one would need a separate grandmothered network for each new kind of input. [1]

2.3 Mathematics of a Neural Network

Let's define:

• n: The number of hidden layers.

• The last 0th layer: the output layer. The last 1st layer: the last hidden layer. The last l-th layer: the layer that comes right after the last (l+1)-th layer. The last n-th layer: the first hidden layer. The last (n+1)-th layer: the input layer.

• L_0: The index set of the output.

• L_l: The index set of the output of the last (-l)-th layer; l = -n, ..., -1.

• L_{-n-1}: The index set of the input.

• t \in R^{|L_0|}: The expected output.

• O^0 \in R^{|L_0|}: The output of the neural network.

• x^0 \in R^{|L_0|}: The input of the nodes in the output layer.

• O^l \in R^{|L_l|}: The output of the last (-l)-th layer.

• x^l \in R^{|L_l|}: The input of the nodes in the last (-l)-th layer; l = -n, ..., -1.

• x^{-n-1} \in R^{|L_{-n-1}|}: The input of the neural network.

• W^l_{ij} \in R: The weight of the edge from the i-th node in the last (-l+1)-th layer to the j-th node in the next layer (i.e. the last (-l)-th layer).

• f: R -> R: The activation function.

• f_o: R -> R: The activation function of the output layer.

[1] http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html

Figure 7: Forward vs. Backward Propagation

Figure 7 makes these definitions clear. We define the forward propagation recursively, so that:

x^{-n-1}_i;\; i \in L_{-n-1} is the input,

O^{-n-1}_i := x^{-n-1}_i;\; i \in L_{-n-1},

O^l_i := f(x^l_i) := f\Big(\sum_{j \in L_{l-1}} W^l_{ji}\, O^{l-1}_j\Big);\; i \in L_l;\; l = -n, ..., -1,

O^0_i := f_o(x^0_i) := f_o\Big(\sum_{j \in L_{-1}} W^0_{ji}\, O^{-1}_j\Big);\; i \in L_0 is the output.

So with the least mean squares method the error will be

E := \frac{1}{2}\,\lVert O^0 - t \rVert^2
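As a small sanity check of these equations, here is a minimal sketch (not part of the thesis; the 2-2-1 network shape and the weights are invented) of one forward pass with a sigmoid f and a linear f_o:

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# W_hidden[j][i]: weight from input node j to hidden node i;
# W_out[i]: weight from hidden node i to the single linear output node.
def forward(x, W_hidden, W_out):
    hidden_in = [sum(W_hidden[j][i] * x[j] for j in range(len(x)))
                 for i in range(len(W_out))]
    hidden_out = [sigmoid(v) for v in hidden_in]   # O^{-1} = f(x^{-1})
    # linear output activation: the weighted sum is returned as it is
    return sum(W_out[i] * hidden_out[i] for i in range(len(W_out)))

y = forward([1.0, 0.0], [[0.5, -0.5], [0.25, 0.75]], [1.0, 1.0])
```

Because sigmoid(0.5) + sigmoid(-0.5) = 1, the output of this particular pass is exactly 1, which makes the example easy to verify by hand.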

Now let's calculate the derivative of this error w.r.t. any weight, in order to determine the effect of this one weight on the error:

\frac{\partial E}{\partial W^l_{ij}}
= \frac{\partial \frac{1}{2}\sum_{k \in L_0}(O_k - t_k)^2}{\partial W^l_{ij}}
= \sum_{k \in L_0}(O_k - t_k)\frac{\partial O_k}{\partial W^l_{ij}}
= \sum_{k \in L_0}(O_k - t_k)\frac{\partial f_o(x^0_k)}{\partial W^l_{ij}}
= \sum_{k \in L_0}(O_k - t_k)\, f_o'(x^0_k)\,\frac{\partial x^0_k}{\partial W^l_{ij}}

Let's define \delta_k := (O_k - t_k)\, f_o'(x^0_k);\; k \in L_0:

\frac{\partial E}{\partial W^l_{ij}}
= \sum_{k \in L_0}\delta_k \frac{\partial x^0_k}{\partial W^l_{ij}}
= \sum_{m \in L_{-1}}\sum_{k \in L_0}\delta_k \frac{\partial x^0_k}{\partial O^{-1}_m}\frac{\partial O^{-1}_m}{\partial W^l_{ij}}
= \sum_{m \in L_{-1}}\frac{\partial O^{-1}_m}{\partial W^l_{ij}}\sum_{k \in L_0}\delta_k \frac{\partial \sum_{m' \in L_{-1}} O^{-1}_{m'} W^0_{m'k}}{\partial O^{-1}_m}
= \sum_{m \in L_{-1}}\frac{\partial f(x^{-1}_m)}{\partial W^l_{ij}}\sum_{k \in L_0}\delta_k W^0_{mk}
= \sum_{m \in L_{-1}} f'(x^{-1}_m)\frac{\partial x^{-1}_m}{\partial W^l_{ij}}\sum_{k \in L_0}\delta_k W^0_{mk}

Let's define \delta_m := f'(x^{-1}_m)\sum_{k \in L_0}\delta_k W^0_{mk};\; m \in L_{-1}:

\frac{\partial E}{\partial W^l_{ij}}
= \sum_{m \in L_{-1}}\delta_m \frac{\partial x^{-1}_m}{\partial W^l_{ij}}
= \sum_{n \in L_{-2}}\sum_{m \in L_{-1}}\delta_m \frac{\partial x^{-1}_m}{\partial O^{-2}_n}\frac{\partial O^{-2}_n}{\partial W^l_{ij}}
= \sum_{n \in L_{-2}}\sum_{m \in L_{-1}}\delta_m \frac{\partial \sum_{n' \in L_{-2}} O^{-2}_{n'} W^{-1}_{n'm}}{\partial O^{-2}_n}\frac{\partial f(x^{-2}_n)}{\partial W^l_{ij}}
= \sum_{n \in L_{-2}}\sum_{m \in L_{-1}}\delta_m W^{-1}_{nm}\, f'(x^{-2}_n)\,\frac{\partial x^{-2}_n}{\partial W^l_{ij}}

Let's define \delta_n := f'(x^{-2}_n)\sum_{m \in L_{-1}}\delta_m W^{-1}_{nm};\; n \in L_{-2}:

\frac{\partial E}{\partial W^l_{ij}} = \sum_{n \in L_{-2}}\delta_n \frac{\partial x^{-2}_n}{\partial W^l_{ij}}

And so on. It's clear that this calculation will stop as soon as the term being differentiated no longer depends on the weight W^l_{ij}. This will be the case after reaching the layer of this weight. At that layer the only relevant term will be O^{l-1}_i (*). So if we define recursively [2]:

• \delta^0_k := f_o'(x^0_k)(O_k - t_k);\; k \in L_0

• \delta^{-1}_i := f'(x^{-1}_i)\sum_{k \in L_0}\delta^0_k W^0_{ik};\; i \in L_{-1}

• \delta^l_i := f'(x^l_i)\sum_{k \in L_{l+1}}\delta^{l+1}_k W^{l+1}_{ik};\; i \in L_l;\; l = -2, ..., -n

Note that recursively here means we start with -2 and go down until -n. We get that:

\frac{\partial E}{\partial W^l_{ij}}
= \sum_{i' \in L_l}\delta^l_{i'}\frac{\partial x^l_{i'}}{\partial W^l_{ij}}
= \delta^l_j \frac{\partial x^l_j}{\partial W^l_{ij}}
= \delta^l_j \frac{\partial \sum_{m \in L_{l-1}} O^{l-1}_m W^l_{mj}}{\partial W^l_{ij}}
= \delta^l_j \sum_{m \in L_{l-1}}\frac{\partial O^{l-1}_m W^l_{mj}}{\partial W^l_{ij}}

We know that \frac{\partial O^{l-1}_m W^l_{mj}}{\partial W^l_{ij}} = O^{l-1}_i if m = i and 0 otherwise, as (*) states. So we get:

\frac{\partial E}{\partial W^l_{ij}} = \delta^l_j\, O^{l-1}_i

[2] This usage of deltas is influenced by a YouTube video.

10

2.4 How to Use a Neural Network 2 NEURAL NETWORKS

Here we can stop the calculation. Now we should change the weights using this derivative:

W^l_{ij} \leftarrow W^l_{ij} - \mu\, \delta^l_j\, O^{l-1}_i \qquad (1)

This \mu will be called a "step", because it is like a step towards the (hopefully global) minimum of the error. The choice of this step won't be very easy: the problem of step.

So the learning process consists of using the forward propagation equations to calculate the output recursively at each layer, and then using the information we have about the expected output to "correct" the weights, again recursively, using equation (1). Correction means here that we use the derivative of the error E w.r.t. each weight to determine whether this weight should be increased or decreased. The minus sign before the correction term means that if the derivative is positive (i.e. increasing the weight increases the error), we should decrease the weight, and vice versa if the derivative is negative.
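To convince oneself that equation (1) uses the right derivative, one can compare the backpropagation formula with a numerical finite-difference approximation on a tiny network. The sketch below (my own illustration, not from the thesis; the 1-1-1 shape and all numbers are invented) does this for one hidden sigmoid node and a linear output node:

```python
from math import exp

def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))

# Tiny network: O = w2 * sigmoid(w1 * x), error E = 0.5 * (O - t)**2
def error(w1, w2, x, t):
    return 0.5 * (w2 * sigmoid(w1 * x) - t) ** 2

# Backpropagation gradients, following the delta definitions above.
def backprop_grads(w1, w2, x, t):
    h_out = sigmoid(w1 * x)
    O = w2 * h_out
    delta_out = (O - t) * 1.0                      # f_o is linear, f_o' = 1
    delta_hidden = h_out * (1 - h_out) * delta_out * w2
    return delta_hidden * x, delta_out * h_out     # dE/dw1, dE/dw2

w1, w2, x, t, eps = 0.7, -1.3, 0.9, 0.4, 1e-6
g1, g2 = backprop_grads(w1, w2, x, t)
n1 = (error(w1 + eps, w2, x, t) - error(w1 - eps, w2, x, t)) / (2 * eps)
n2 = (error(w1, w2 + eps, x, t) - error(w1, w2 - eps, x, t)) / (2 * eps)
```

The two gradients agree up to the accuracy of the finite difference; the "delta times output of the sending node" pattern of equation (1) really is the derivative of E.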

2.4 How to Use a Neural Network

For any neural network to work we need:

• Dimensions: The size of the input (i.e. the number of input nodes) and the size of the output (i.e. the number of output nodes). How to determine these numbers is the problem of dimensions.

• Structure: The number of hidden layers and the number of nodes at each layer (these are actually the index sets from before). How to determine these numbers is the problem of structure.

• Weights: At the start of the training process, the weights should get some initial (random) values.

• Activation Function: It's important to know which activation function is best for the problem that the neural network is dealing with.

• Forward Propagation Function: This function should use the forward propagation equations to calculate the output of the neural network.

• Backward Propagation Function (Training Function): This function should use equation (1) to "correct" the weights of the neural network.

• Training Data: A set of tuples (x, y), where x is the input and y the expected output for the input x. It's very important that all tuples in the set have the same dimensions, because otherwise we run into the problem of dimensions.
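As a tiny illustration of the last point (a made-up toy set, not the project's data), here is a training set where every input has dimension 2 and every expected output dimension 1:

```python
# Training data as a set of (x, y) tuples; here y is the XOR of the two inputs.
training_data = [
    ([0, 0], [0]),
    ([0, 1], [1]),
    ([1, 0], [1]),
    ([1, 1], [0]),
]

# Check that all tuples have the same dimensions before any training starts.
dims_ok = all(len(x) == 2 and len(y) == 1 for x, y in training_data)
```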

11

2.5 Interface for a Neural Network 2 NEURAL NETWORKS

2.5 Interface for a Neural Network [3]

We'll model the neural network as a class:

# The methods below use exp, sin, cos, inf, array and random.
from math import exp, sin, cos, inf
from numpy import array
from random import random

class NN:

2.5.1 Dimensions, Structure

The initialising function should contain information about the structure of the neural network:

• input_size: A positive integer that describes the number of input nodes.

• output_size: A positive integer that describes the number of output nodes.

• layers_sizes: A list of positive integers that describes the number of nodes at each hidden layer.

• act_func_str: A string that describes the activation function. The initial value is the string for the sigmoid function.

• act_func_output_str: A string that describes the activation function at the output nodes (i.e. for the final result of the neural network). The initial value is the string for the linear function, so the result will be given just as it is.

All of these will then be saved as attributes of the class NN.

def __init__(self, input_size, output_size, layers_sizes,
             act_func_str="sigm", act_func_output_str="lin"):
    self.input_size = input_size
    self.output_size = output_size
    self.layers_sizes = layers_sizes
    self.act_func_str = act_func_str
    self.act_func_output_str = act_func_output_str

[3] The programming language used is Python 3.5.


2.5.2 Weights

In addition to the structure attributes there will be one attribute w: a dictionary that contains the weights of the edges between the different nodes. This attribute is assigned random values at first. The keys of the dictionary are 2-tuples of strings that describe the two nodes of an edge.

    self.w = {}
    for x in range(input_size):
        for y in range(layers_sizes[0]):
            node1 = "input" + str(x)
            node2 = "layer0node" + str(y)
            self.w[(node1, node2)] = random()
    for i in range(len(layers_sizes) - 1):
        for x in range(layers_sizes[i]):
            for y in range(layers_sizes[i + 1]):
                node1 = "layer" + str(i) + "node" + str(x)
                node2 = "layer" + str(i + 1) + "node" + str(y)
                self.w[(node1, node2)] = random()
    for x in range(layers_sizes[-1]):
        for y in range(output_size):
            node1 = "layer" + str(len(layers_sizes) - 1) + "node" + str(x)
            node2 = "output" + str(y)
            self.w[(node1, node2)] = random()


2.5.3 Activation Function

Now we define the activation function as a method act_func: x is the value at which the function is evaluated; deriv is a boolean value, so that the same method also gives us the derivative of the function. The choices here are: the sigmoid function, the sine function, the identity (linear) function, the sign function and the ReLu function.

def act_func(self, x, deriv=False):
    if self.act_func_str == "sigm" and not deriv:
        return 1 / (1 + exp(-x))
    elif self.act_func_str == "sigm" and deriv:
        return self.act_func(x) * (1 - self.act_func(x))
    if self.act_func_str == "sin" and not deriv:
        return sin(x)
    elif self.act_func_str == "sin" and deriv:
        return cos(x)
    if self.act_func_str == "lin" and not deriv:
        return x
    elif self.act_func_str == "lin" and deriv:
        return 1
    if self.act_func_str == "sign" and not deriv:
        if x == 0:
            return 0
        elif x > 0:
            return 1
        else:
            return -1
    elif self.act_func_str == "sign" and deriv:
        if x == 0:
            return inf
        else:
            return 0
    if self.act_func_str == "ReLu" and not deriv:
        if x >= 0:
            return x
        else:
            return 0
    elif self.act_func_str == "ReLu" and deriv:
        if x >= 0:
            return 1
        else:
            return 0

The same is done for the activation function of the output layer; the method act_func_output is the corresponding one here.
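The listing of act_func_output is not reproduced in the text. Assuming it simply mirrors act_func for the output activation (this is my reconstruction, shown as a standalone function rather than a method, and only for two of the strings mentioned above), it could look like:

```python
from math import exp

def act_func_output(act_func_output_str, x, deriv=False):
    # "lin": the default; the result is passed on as it is.
    if act_func_output_str == "lin":
        return 1 if deriv else x
    # "sigm": squashes the output into (0, 1).
    if act_func_output_str == "sigm":
        s = 1 / (1 + exp(-x))
        return s * (1 - s) if deriv else s
    raise ValueError("unknown output activation: " + act_func_output_str)

lin_value = act_func_output("lin", 3.5)
sigm_value = act_func_output("sigm", 0.0)
```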


2.5.4 Forward Propagation Function

Now we need a method for forward propagation, called func here, which gets an input input and recursively calculates all the terms we need. We start by making sure that the given input has the same size as the input layer of the neural network.

def func(self, input):
    if len(input) != self.input_size:
        print("Error in input size:", len(input),
              "should be", self.input_size)
        return False

Then we map the code names to the mathematical terms defined before:

• input[j]: Corresponds to x^{-n-1}_{j+1}.

• layers_output[i][j]: Corresponds to O^{i-n}_{j+1}; i = 0, ..., n-1.

• layers_input[i][j]: Corresponds to x^{i-n}_{j+1}; i = 0, ..., n-1.

• output[j]: Corresponds to O^0_{j+1}.

• output_input[j]: Corresponds to x^0_{j+1}.

    layers_output = array([[.0] * self.layers_sizes[i]
                           for i in range(len(self.layers_sizes))])
    layers_input = array([[.0] * self.layers_sizes[i]
                          for i in range(len(self.layers_sizes))])
    output = array([.0] * self.output_size)
    output_input = array([.0] * self.output_size)


Now we recursively calculate these terms using the forward propagation equations: first the input x^l_i of each layer, and then, via the activation function, its output O^l_i. We return all of these at the end.

    # calculate input of the first hidden layer
    for x in range(self.input_size):
        for y in range(self.layers_sizes[0]):
            node1 = "input" + str(x)
            node2 = "layer0node" + str(y)
            layers_input[0][y] += self.w[(node1, node2)] * input[x]

    # calculate output of the first hidden layer
    for y in range(self.layers_sizes[0]):
        layers_output[0][y] = self.act_func(layers_input[0][y])

    # for each hidden layer (i+1)
    for i in range(len(self.layers_sizes) - 1):
        # calculate input of the (i+1)th hidden layer
        for x in range(self.layers_sizes[i]):
            for y in range(self.layers_sizes[i + 1]):
                node1 = "layer" + str(i) + "node" + str(x)
                node2 = "layer" + str(i + 1) + "node" + str(y)
                layers_input[i + 1][y] += \
                    self.w[(node1, node2)] * layers_output[i][x]
        # calculate output of the (i+1)th hidden layer
        for y in range(self.layers_sizes[i + 1]):
            layers_output[i + 1][y] = self.act_func(layers_input[i + 1][y])

    # calculate input of the output layer
    for x in range(self.layers_sizes[-1]):
        for y in range(self.output_size):
            node1 = "layer" + str(len(self.layers_sizes) - 1) + "node" + str(x)
            node2 = "output" + str(y)
            output_input[y] += \
                self.w[(node1, node2)] * layers_output[-1][x]

    # calculate the output (the output of the output layer)
    for y in range(self.output_size):
        output[y] = self.act_func_output(output_input[y])

    # return all these terms
    return layers_input, layers_output, output_input, output


2.5.5 Backward Propagation Function (Training Function)

And at the end, as said, we'll need a training function: train. Here we obtain the terms from before by calling the forward propagation method func:

• input[j]: Corresponds to x^{-n-1}_{j+1}.

• layers_output[i][j]: Corresponds to O^{i-n}_{j+1}; i = 0, ..., n-1.

• layers_input[i][j]: Corresponds to x^{i-n}_{j+1}; i = 0, ..., n-1.

• output[j]: Corresponds to O^0_{j+1}.

• output_input[j]: Corresponds to x^0_{j+1}.

• ex_output[j]: Corresponds to t_{j+1}.

And the deltas:

• delta_layers[i][j]: Corresponds to \delta^{i-n}_{j+1}; i = 0, ..., n-1.

• delta_output[j]: Corresponds to \delta^0_{j+1}.

def train(self, input, ex_output, step=0.1, times=1):
    for _ in range(times):
        layers_input, layers_output, output_input, output = self.func(input)
        delta_layers = array([[.0] * self.layers_sizes[i]
                              for i in range(len(self.layers_sizes))])
        delta_output = array([.0] * self.output_size)


We then recursively calculate the deltas.

        # calculate delta of the output layer (delta^0)
        for y in range(self.output_size):
            delta_output[y] = \
                self.act_func_output(output_input[y], deriv=True) \
                * (output[y] - ex_output[y])

        # calculate delta of the last hidden layer (delta^(-1))
        for x in range(self.layers_sizes[-1]):
            sum = .0
            for y in range(self.output_size):
                node1 = "layer" + str(len(self.layers_sizes) - 1) \
                    + "node" + str(x)
                node2 = "output" + str(y)
                sum += delta_output[y] * self.w[(node1, node2)]
            delta_layers[-1][x] = \
                self.act_func(layers_input[-1][x], deriv=True) * sum

        # for each hidden layer i, going backwards
        for i in range(len(self.layers_sizes) - 2, -1, -1):
            # calculate the delta of the layer
            for x in range(self.layers_sizes[i]):
                sum = .0
                for y in range(self.layers_sizes[i + 1]):
                    node1 = "layer" + str(i) + "node" + str(x)
                    node2 = "layer" + str(i + 1) + "node" + str(y)
                    sum += delta_layers[i + 1][y] * self.w[(node1, node2)]
                delta_layers[i][x] = \
                    self.act_func(layers_input[i][x], deriv=True) * sum


And at the end we correct the weights using the given step, as in equation (1).

        # correct the weights of the edges from the input layer
        # to the first hidden layer
        for x in range(self.input_size):
            for y in range(self.layers_sizes[0]):
                node1 = "input" + str(x)
                node2 = "layer0node" + str(y)
                self.w[(node1, node2)] += \
                    -step * delta_layers[0][y] * input[x]

        # for each hidden layer (i+1): correct the weights of the edges
        # from the (i+1)th layer to the (i+2)th layer
        for i in range(0, len(self.layers_sizes) - 1):
            for x in range(self.layers_sizes[i]):
                for y in range(self.layers_sizes[i + 1]):
                    node1 = "layer" + str(i) + "node" + str(x)
                    node2 = "layer" + str(i + 1) + "node" + str(y)
                    self.w[(node1, node2)] += \
                        -step * delta_layers[i + 1][y] * layers_output[i][x]

        # correct the weights of the edges from the last hidden
        # layer to the output layer
        for x in range(self.layers_sizes[-1]):
            for y in range(self.output_size):
                node1 = "layer" + str(len(self.layers_sizes) - 1) \
                    + "node" + str(x)
                node2 = "output" + str(y)
                self.w[(node1, node2)] += \
                    -step * delta_output[y] * layers_output[-1][x]

And times determines how many times this whole training process is repeated.

2.5.6 Usage

It's important to know now how to use this interface for a neural network. For this we have a small example: func will be the function f: {0,1}^4 -> {0,1}^4, f(x, y, z, t) = ((x ∧ y) ∨ z, (y ∨ z) ∧ t, (z ∧ t) ∨ x, (t ∨ x) ∧ y), which the neural network should imitate. The steps are as follows:

1. Determine what are the training data.

2. Define the structure and the activation function of the neural network andmake one instance of the class.

3. Determine how much time you have (i.e. how many times the neural network should learn) and train the neural network.


def func(X):
    result = [0, 0, 0, 0]
    (x, y, z, t) = tuple(X)
    if (x and y) or z:
        result[0] = 1
    if (y or z) and t:
        result[1] = 1
    if (z and t) or x:
        result[2] = 1
    if (t or x) and y:
        result[3] = 1
    return result

# 1
training_data_input = [[0, 0, 0, 0], [1, 0, 0, 0],
                       [0, 1, 0, 0], [1, 1, 0, 0],
                       [0, 0, 1, 0], [1, 0, 1, 0],
                       [0, 1, 1, 0], [1, 1, 1, 0],
                       [0, 0, 0, 1], [1, 0, 0, 1],
                       [0, 1, 0, 1], [1, 1, 0, 1],
                       [0, 0, 1, 1], [1, 0, 1, 1],
                       [0, 1, 1, 1], [1, 1, 1, 1]]
training_data_output = \
    [func(x) for x in training_data_input]

# 2
input_size = 4
output_size = 4
hiddenLayers = [12, 12]
actFunc = "sigm"
nn = NN(input_size, output_size, hiddenLayers,
        act_func_str=actFunc)

# 3
max_times = 5000000
for _ in range(max_times):
    for x in training_data_input:
        nn.train(x, func(x), step=0.1, times=1)


And the error will change like this:

2.6 Why Are Neural Networks Helpful for This Problem

Neural networks are very good at recognizing patterns. For this problem, each font has properties that make it the font it is. And just as our minds learn to recognize fonts, neural networks can learn to recognize fonts after "seeing" enough suitable training data.


3 Evolution of the Approach

3.1 Original Idea

The original idea was first proposed on 20.04.2017. After about 4 months of hard work, it was clear that developing such an idea goes beyond the scope of a bachelor project and takes much more time than expected.

3.1.1 Steps from Photo to Curves

1. Define a graph for the curves in the photo.

2. Make the photo black and white: black for text and white for the empty space.

3. Unfill the text.

4. Find equations:

(a) Consider a random black point and a small box with dimensions d × d around it:

(b) All the following steps are for points that are connected to this black point.

(c) Take some points from the unfilled line (which is connected to the first black point; e.g. 5 points for an ellipse) and use them to find the equation of the curve in the small box, equ1:

22

3.1 Original Idea 3 EVOLUTION OF THE APPROACH

(d) Make the box a little bit bigger. Then check whether the equation is still valid for another set of points. If not, find a new equation equ2 for these points:

(e) When reaching a point where there is no more suitable equation, stop the process and save the equation as a node in the graph.

(f) Now start again from the point where the process stopped. The curve we get next will be connected with an edge to the first node.

5. Check if any nodes have the same equation. In this case merge them into one node.

6. Repeat the process for any other points that are still not processed.

7. Return the resulting graph.
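To make steps 4(a)–4(f) concrete, here is a toy sketch in Python that uses straight lines instead of general curves. The helpers fit_line, fits and segment are invented for this illustration only, not part of the project code; the real algorithm grows a pixel box and fits general curve equations.

```python
# Toy sketch of step 4: walk along a sequence of points and cut a new
# segment (graph node) whenever the current line equation stops fitting.
# Straight lines stand in for general curves here.
def fit_line(points):
    """Least-squares line y = a*x + b through the points."""
    n = len(points)
    sx = sum(p[0] for p in points); sy = sum(p[1] for p in points)
    sxx = sum(p[0] ** 2 for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return (a, b)

def fits(eq, point, tol=1e-6):
    """Check whether a point still satisfies the current equation."""
    a, b = eq
    return abs(a * point[0] + b - point[1]) <= tol

def segment(points):
    """Split the point sequence into maximal straight pieces."""
    nodes = []
    eq = fit_line(points[:2])
    for i, p in enumerate(points):
        if not fits(eq, p):
            # save the old equation as a node and start a new one
            nodes += [eq]
            eq = fit_line(points[i - 1:i + 1])
    nodes += [eq]
    return nodes

# two straight pieces meeting at (2, 2): y = x, then y = -x + 4
pts = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
print(segment(pts))
```

The printed result is one equation per recovered piece, mirroring how each curve equation becomes one node of the graph.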


Step 5 is very important for photos where the text is tangled, which is very often the case in Arabic calligraphy.
Consider these two circles as letters. Starting from a random point, let's say that the algorithm first finds the equation eq1, which represents the curve eq1. The process will then stop at the red point. There will be three choices: one of them will be taken at random, say the curve eq2. The process will continue until we get four equations. If the process stopped there, this would mean that the circles are divided into four letters, which is not right. But with step 5 of the algorithm, we get information about curves that together make one letter. So here, because eq1 ≡ eq3 and eq2 ≡ eq4, we get only two nodes in the graph, representing only two curves, which is correct.
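The merging of step 5 can be sketched as follows. The function merge_equal_nodes and the equation labels are hypothetical, and curve equations are assumed to be representable as hashable values:

```python
def merge_equal_nodes(nodes, edges):
    """Merge graph nodes that carry the same curve equation.

    nodes: dict node_id -> equation (any hashable representation)
    edges: set of (node_id, node_id) pairs, treated as directed
    Returns the merged (nodes, edges), keeping the first node seen
    for each equation.
    """
    representative = {}   # equation -> kept node id
    mapping = {}          # old node id -> kept node id
    for node_id, eq in nodes.items():
        if eq not in representative:
            representative[eq] = node_id
        mapping[node_id] = representative[eq]
    merged_nodes = {v: nodes[v] for v in representative.values()}
    merged_edges = {(mapping[a], mapping[b])
                    for (a, b) in edges if mapping[a] != mapping[b]}
    return merged_nodes, merged_edges

# the two-circles example: eq1 ≡ eq3 and eq2 ≡ eq4
nodes = {1: "circleA", 2: "circleB", 3: "circleA", 4: "circleB"}
edges = {(1, 2), (2, 3), (3, 4), (4, 1)}
print(merge_equal_nodes(nodes, edges))
```

Four candidate nodes collapse into two, just as the two tangled circles collapse into two letters.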


Figure 8: Arabic for "the universe is made out of leaves and wind; and Damascus, the navel of jasmine, is pregnant; spreads its scent on the ceiling and waits for the fetus"

3.1.2 Steps from Curves to Font

Use the whole graph as input for the neural network. Equations (which represent curves) that come after one another in a certain order are the pattern that determines the font and that the neural network should find.
Use different subgraphs of the graph as input for the neural network, and compare the result to the whole result. Being close to the whole result means that this subgraph (which represents some curves) is an important characteristic of this font. Being far from the whole result means that these curves come in other fonts as well and might just be curves necessary for the language itself. For example, points are mostly the same in all fonts.

Figure 8 is in the circlefull Hayyani font. As we can see, the circles and their ratio to one another are the same in every photo of this font, which is a characteristic of this font.


Figure 9: Arabic for ”you get the moose if you wish”

3.1.3 First Results

Figure 9 is an example in the classical Hayyani font.
After 1 minute we get:

After 30 minutes we get:


After 2 hours we get:

After 5 hours we get:

After 16 hours we get:

This is a great approximation of the photo that includes all relevant information for the font. But it took a lot of time.

In general it was clear that the run-time of the algorithm was O(2^(n/d)), where n is the number of pixels in the photo and d is the dimension of the starting box in pixels.
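To get a feeling for this exponential growth, a toy calculation (with made-up numbers, not measurements) compares exploring the whole photo with exploring it in pieces:

```python
# Toy illustration of the exponential run-time O(2^(n/d)):
# splitting the photo divides the exponent, not just the total.
n = 3000          # pixels (made-up number)
d = 100           # starting box dimension (made-up number)

whole = 2 ** (n / d)                 # whole photo: O(2^(n/d))
k = 3                                # number of pieces
pieces = k**2 * 2 ** (n / (k * d))   # pieces: O(k^2 * 2^(n/(kd)))

print(whole / pieces)                # speed-up factor
```

Even for these small made-up numbers the speed-up factor is enormous, which is why cutting the photo into pieces was the first optimization tried.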


The first solution was to divide the photo into smaller pieces, which can be explored much faster. This led to a running time of O(k² · 2^(n/(kd))), where k is the number of pieces. But for any k greater than 3, the result was not readable anymore, which would make it very hard for the neural network to recognise any patterns. So for k = 3 we get O(2^(n/(3d))).
For the pieces of the last photo (k = 3 pieces vertically):

After 5 hours we get:

This is also great and satisfies most of our needs. But it still takes a lot of time.

This result was presented in the presentation on 10.07.2017. And because it was clear that I would need a lot of time to optimize these running times, the decision was made to ask some experts in the field of photo editing and computer graphics. This led to a new start from the beginning with a modified idea.


3.2 Modified Idea

The modified idea is based on using already existing algorithms to find the equations of the curves in a given photo, a.k.a. vectorization.

3.2.1 Steps from Photo to Font

1. Photo Editing: Edit the photo so that it fits the vectorization program. This also includes cutting the photo into its lines of text and into smaller pieces, so that the resulting curves are not very long; very long curves cause the problem of dimensions. In addition, cutting into pieces makes it possible to restore some of the original idea: we highlight pieces instead of curves, according to their closeness to the whole result, which will be the median of the results of the pieces, as we'll see later.

2. Vectorization: Vectorize the photo. For this we'll search the internet for the most suitable program.

3. Interpretation: Edit the result to get a suitable input for the neuralnetwork.

4. Pattern Recognition Using the Neural Network: Patterns are not as clear here as in the original idea. We assume that the same curves will produce the same vectorization results, which is not entirely right, because vectorizing the same curve might happen in a different way depending on where it is. But for a huge number of curves in one photo, it is likely that some of them will still get the same vectorization results, which won't be very bad. It is clear from now on that the time gained in the modified idea will be compensated by a bigger error in the recognition of the font.

3.2.2 Most Suitable Programs for the Steps

After some research on the internet, it was clear that there are a lot of ways to save vectorization results. The best-known one, and the most suitable for us, is SVG (Scalable Vector Graphics). Drawing commands in SVG work like a moving cursor that leaves a track behind it. The drawing commands for curves are marked with the letter C, which makes it convenient to extract the information about the curves from the '.svg'-file.
The same research showed that there are a lot of programs that convert a photo from any pixel-wise format to '.svg'. The best-known programs for this are:

• Vector Magic: Great free trial, but not free forever.

• Inkscape: Free program, but has no batch mode. Batch mode is important to vectorize a big number of photos without opening them one by one in the program's interface.


• AutoTrace: Free, with batch mode. One problem is that it saves the drawing commands in absolute coordinates. Since the drawing commands of SVG represent a moving cursor, there are two ways to give the coordinates: either by saving where the point is in absolute coordinates, or by saving the position of the new point relative to the old point where the cursor was.

• potrace: This one is free, has batch mode, and saves vectorization results in relative coordinates.

• PILLOW: This one has nothing to do with vectorization. It's just a great tool for dealing with photos in Python. It'll be used in the photo editing.

• ImageMagick: Another great free tool for dealing with photos, this time with batch mode. We'll also use it in the photo editing.
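As a sketch of how the curve information can be pulled out of SVG path data: the following extracts the argument lists of all C/c (cubic Bézier) commands from a path string. The helper curve_commands and the sample path are made up for illustration; this is not the project's actual parser.

```python
import re

def curve_commands(path_d):
    """Return the coordinate lists of all C/c (cubic Bezier) commands
    in an SVG path string."""
    curves = []
    # split the path data into (command letter, argument string) pairs
    for command, args in re.findall(r"([MmLlCcZz])([^MmLlCcZz]*)", path_d):
        if command in "Cc":
            numbers = [float(n) for n in re.findall(r"-?\d+\.?\d*", args)]
            curves += [numbers]
    return curves

# a made-up path: a move, two relative cubic curves, and a close
example_d = "M10 10 c 5 5, 10 5, 15 0 c 2 -3, 4 -3, 6 0 z"
print(curve_commands(example_d))
```

Each returned list holds the six numbers of one cubic curve (two control points and the end point), which is the raw material that later becomes a "list of integers" per curve.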

3.2.3 Steps from Photo to Curves

Comparing with the steps of the modified idea: Photo Editing → 1., 2.; Vectorization → 3.; Interpretation → 4., 5.

1. Convert: Input is a photo with extension ".jpg", and output is the same photo with extension ".bmp", but in black and white.

2. Cut in Pieces: Input is a photo with extension ".bmp", and output is a list of ".bmp"-photos, which together make the original photo. The pieces are made w.r.t. lines of text first, and then horizontally.

3. Vectorize: Input is a (piece of a) photo with extension ”.bmp”, andoutput is the same (piece) photo with extension ”.svg”.

4. Simplify: In case the paths in the photo (piece) are still too long for the neural network, a simplification should be done until we get the right lengths.

5. SVG to Lists of Integers: Input is a photo with extension ".svg", and output is a list of lists of integers saved in a ".txt"-file. Each list of integers represents a curve. This is done by analysing the ".svg"-file.

3.2.4 Steps from Curves to Font

Comparing with the steps of the modified idea: Pattern Recognition Using the Neural Network → 1., 2.

1. Import Data: Import the data from the saved ".txt"-files and save it in a convenient way: for this we'll choose a dictionary with a key for each curve. This key will then be linked to the corresponding font of this curve. Here we'll still face the problem of dimensions, because the length of the curves differs from one to another. But we'll start with a naive solution.

2. Neural Network: Train the neural network with these data.
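Step 1 can be sketched like this. The one-list-per-line ".txt" layout and the helper import_curves are assumptions for illustration, since the exact file format is defined elsewhere:

```python
from ast import literal_eval

def import_curves(txt_lines, font):
    """Build a dictionary linking each curve to its font.

    Each curve (a list of integers) is stored as a tuple so that it
    can be used as a dictionary key.
    """
    data = {}
    for line in txt_lines:
        curve = tuple(literal_eval(line))
        data[curve] = font
    return data

# two made-up curves from a photo known to be in font 0
lines = ["[3, -1, 4, 1]", "[5, 9, -2, 6]"]
print(import_curves(lines, font=0))
```

The resulting dictionary is exactly the "curve key → font" mapping described in step 1, ready to be fed to the neural network as (input, output) pairs.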


4 Photo to Curves

4.1 Organization

Because the steps from photo to curves take a very long time, and we'll do them very often (for each training), it'll be better to do them once for each photo and then save the output. For this we'll need two things:

• Naming: A way to name the photos.

• Saving: A set of folders to save the result at each step of the photoediting.

4.1.1 Naming

There are of course a lot of ways to do this. But because this is not very important, we'll just use the following regular expression as a name for all photos:
"font_iter_ren (conv (line line piece (i, j) (vect (s)*)+)+)+" 4 5

Where:

• font: The font given as a positive integer (resembles the output of theneural network)

• iter: A positive integer as an identifier for the photo. Each photo has its own identifier.

• (conv)+: This constant is there, when the photo is already converted.

• (line line piece (i, j))+: This expression is there when the photo is already cut into pieces. It actually occurs only in the name of a piece.

• line: Resembles the line of the text in the photo, where the piece is takenfrom.

• (i, j): Starting from right to left(as in reading Arabic), from top to bottom,this tuple is the position of the piece.

• (vect)+: This constant is there, when the photo(piece) is already vector-ized.

• (s)n: This means the photo has been simplified n times.

4 "x+" = "(x + ε)"
5 variables in italic


4.1.2 Saving

We'll define the following folders to save the photos after each step; all of them will be in the folder TrainingData:

• Input: Includes a folder for each font. Each folder includes the '.jpg'-photos in the corresponding font. The names of the photos here are not relevant, but the name of each folder should be the number font defined in Naming.

• Renamed: Includes the input photos after renaming them.

• Converted: Includes the renamed photos after turning them into ’.bmp’-files.

• Pieces: Includes a folder for each converted photo. The folder includes the '.bmp'-pieces of the corresponding photo.

• Vectorized: Includes the pieces after turning them into ’.svg’-files.

• Simplified: Includes the pieces, if simplified.
Abstraction: Photos that don't have to be simplified will just be copied to Simplified as photos that are simplified "0 times". In this way, Simplified will from this step on be the only folder that we have to deal with (just like any of the other folders described before). Otherwise it could become very complicated to deal with two folders, one for simplified photos and one for the rest.

• ToIntegers: Includes the lists of integers for each piece as ’.txt’ files.

4.1.3 Conventions in the code

• Functions: do_this_and_that describes what the function does.

• Functions: aB means the a (as an adjective) B. For example medianColor means the median color.

• Variables: b_c means the c of b. For example image_iter means the iterator of image.

• Variables: aB_c means the a (as an adjective) c of B. For example defaultOutput_pathes means the default paths of the output.


• Variables: a_i means an instance of a. Mostly used for the output of functions, because Python is confused by variables that have the same name as a function. For example image_iter_i means the current output of the method image_iter.

• Paths and Names: "a\b\c.jpg" is the path of an image. The name of this image is only "c". So the variables image_path and image_name are strings that refer to the same photo, but one is just a substring of the other, as given above.

4.2 Implementation

4.2.1 Organization

First we define the folders just mentioned in Saving:

defaultOutput_pathes = {"whole":  "TrainingData",
                        "ren":    "TrainingData\\Renamed",
                        "conv":   "TrainingData\\Converted",
                        "pieces": "TrainingData\\Pieces",
                        "vect":   "TrainingData\\Vectorized",
                        "simp":   "TrainingData\\Simplified",
                        "int":    "TrainingData\\ToIntegers"}

The helping function between takes a string string and two characters x, y and returns all substrings of string that lie between an x and a y.

def between(string, x, y):
    result = []
    x_strings = string.split(x)
    for x_string in x_strings:
        if y in x_string:
            result += [x_string.split(y)[0]]
    return result
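A quick, self-contained check of between (its definition is repeated here so the snippet runs on its own). The sample string imitates a piece name following the Naming convention; the exact separators in real names are an assumption here:

```python
def between(string, x, y):
    # return all substrings of `string` that lie between an x and a y
    result = []
    x_strings = string.split(x)
    for x_string in x_strings:
        if y in x_string:
            result += [x_string.split(y)[0]]
    return result

# a made-up piece name in the style of the Naming convention
name = "0_7_ren_conv_line(2)_piece(0, 1)_vect.svg"
print(between(name, "(", ")"))
```

The first extracted substring is the line number and the second is the piece position, which is exactly what image_line and image_piece rely on.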

Now we're able to define functions that make it easier to get the iterator, the piece and the line of a photo just out of its path, using the regular expression defined in Naming:

def image_iter(image_path):
    return image_path.split("_ren")[0].split("_")[-1]

def image_piece(image_path):
    return "(" + between(image_path.split("piece")[-1],
                         "(", ")")[0] + ")"

def image_line(image_path):
    return "(" + between(between(image_path, "line",
                                 "piece")[0], "(", ")")[0] + ")"


4.2.2 Rename

The function rename_all_in_folder takes the path of a folder that includes the photos in just one font. It also optionally takes another output path than the one defined in Saving.
First, it makes sure that there's a folder for the output. Then it makes a copy of each photo in the input folder to the output folder. In the end this copy will be renamed using a totally new iter and the font given in the name of the folder.

def rename_all_in_folder(folder_path, output_path=
                         defaultOutput_pathes["ren"]):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    # the font is given in the name of the folder
    font = folder_path[-1]

    # for each photo in the input folder
    for f in listdir(folder_path):
        # find a new iterator
        image_iters = []
        for f2 in listdir(output_path):
            image_iters += [int(image_iter(f2))]
        image_iter_i = max(image_iters) + 1 \
            if len(image_iters) != 0 else 0

        # make a copy of the photo to the output folder
        image_path = folder_path + "\\" + f
        image_name = image_path.split("\\")[-1]
        call("copy \"" + image_path + "\" \"" +
             output_path + "\"", shell=True)
        new_image_path = output_path + "\\" + image_name

        # rename the copy
        rename(new_image_path, output_path + "\\" +
               font + "_" + str(image_iter_i) + "_ren" +
               image_path[-4:])

    return output_path

And to make it easier, the function rename_all will do the renaming in all the folders in the input folder images_path.

def rename_all(images_path, output_path=
               defaultOutput_pathes["ren"]):
    for folder_path in listdir(images_path):
        rename_all_in_folder(images_path + "\\" +
                             folder_path, output_path)


4.2.3 Convert

First, some helping functions:

• pixelMap: For an image of size n ∗ m pixels, this function will return a list of all tuples (x, y) such that x = 0, ..., n − 1; y = 0, ..., m − 1.

def pixelMap(image):
    map = []
    for x in range(image.size[0]):
        for y in range(image.size[1]):
            map += [(x, y)]
    return map

• medianColor: Given an image, this function will return the median color of the image, which is the median in each of the components of an 'RGB'-color over all colors in the image.

def medianColor(image):
    colors = []
    for p in pixelMap(image):
        colors += [image.getpixel(p)]
    return (median([c[0] for c in colors]),
            median([c[1] for c in colors]),
            median([c[2] for c in colors]))

• colorNorm: For a color color ∈ ([0, 255] ∩ N)3, this function will return||color||2.

def colorNorm(color):
    return sqrt(color[0]**2 + color[1]**2 + color[2]**2)

Now we can define convert, which takes the path of a renamed image image_path, makes it black and white, and saves it in the output folder output_path as a '.bmp'-photo.
If check is false, the BWing 6 will be done using starting_ratio. But if check is true, then the user will be shown the image after the BWing to check whether the starting ratio was good. If it wasn't, he has the possibility to change it and redo the BWing again and again, until he thinks the photo is good enough and answers 0 to the question "Change BW ratio?".
The BWing is done by comparing the color of every pixel in the image to the median color of the image divided by the ratio: if greater, the pixel becomes white; if less, black. The comparison is done using the color norm.

6 to BW a photo: to make it black and white, that is, to make black and white the only colors in it
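The thresholding rule can be checked in isolation with plain RGB tuples instead of a Pillow image. The helper bw_pixel is hypothetical, invented for this sketch, and 1.4 is the default starting ratio from above:

```python
from math import sqrt
from statistics import median

def colorNorm(color):
    return sqrt(color[0]**2 + color[1]**2 + color[2]**2)

def bw_pixel(pixel, median_color, ratio):
    """Black if the pixel is darker than median/ratio, else white."""
    if colorNorm(pixel) <= colorNorm(median_color) / ratio:
        return (0, 0, 0)
    return (255, 255, 255)

# three made-up pixels: dark ink, bright paper, and the median itself
pixels = [(10, 10, 10), (240, 240, 240), (100, 120, 90)]
median_color = tuple(median([p[i] for p in pixels]) for i in range(3))
print([bw_pixel(p, median_color, 1.4) for p in pixels])
```

Note that the median-colored pixel itself lands on the white side, because the threshold is the median norm divided by the ratio, not the median norm itself.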


def convert(image_path, output_path, check=False,
            starting_ratio=1.4):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    # add 'conv' to the name
    image_name = image_path.split("\\")[-1][:-4]
    outputImage_path = output_path + "\\" + image_name + \
        "_conv.bmp"

    # open image using PILLOW
    image = Image.open(image_path).convert('RGB')

    medianColor_i = medianColor(image)
    ratio = starting_ratio

    # make a copy of the image
    BWed = Image.new('RGB', image.size)

    # as long as the user wants
    while (True):
        # BW the photo using ratio
        for p in pixelMap(image):
            if colorNorm(image.getpixel(p)) <= \
                    colorNorm(medianColor_i) / ratio:
                BWed.putpixel(p, (0, 0, 0))
            else:
                BWed.putpixel(p, (255, 255, 255))
        if not check:
            break

        # notify the user to check if the BWing was good and
        # show it to him, then ask him if he wants to
        # change anything
        winsound.Beep(Freq, Dur)
        BWed.show()
        change = float(input("Change BW ratio?\n"))
        if change == 0:
            break
        else:
            ratio = ratio + change
            BWed = Image.new('RGB', image.size)

    # at the end save the converted and BWed image
    BWed.save(outputImage_path, "BMP")

    return outputImage_path


And again, the function convert_all_in_folder will do the converting for all photos in a given folder. Because this operation takes a lot of time, it's good to check whether a photo is already converted and to show the user a progress bar, so that he knows how many photos still have to be converted.

def convert_all_in_folder(folder_path, output_path=
                          defaultOutput_pathes["conv"], check=False,
                          starting_ratio=1.4):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    done = 0

    # for all photos in the input folder
    for f in listdir(folder_path):
        # make sure the photo is not already converted
        already_converted = False
        for f_conv in listdir(output_path):
            if image_iter(f) == image_iter(f_conv):
                already_converted = True

        # if not then convert it
        if not already_converted:
            convert(folder_path + "\\" + f, output_path,
                    check, starting_ratio)

        done += 1
        # show in progress bar how many already converted
        progress_bar(done, len(listdir(folder_path)))

    return output_path


4.2.4 Cut in Pieces

In order to cut the photo into pieces the way described before, we first need to break the photo image into its lines of text. Then we cut the lines into pieces using pieces_no.

def cut_in_pieces(image_path, output_path, pieces_no):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    # make a folder for the pieces
    image_name = image_path.split("\\")[-1].split(".")[0]
    makedirs(output_path + "\\" + image_name)

    # open image using PILLOW
    image = Image.open(image_path)
    image_lines = []

    """
    1. break image in lines
    """

    # for all lines in the photo
    line_image_iter = 0
    for line_image in image_lines:
        """
        2. cut in pieces
        """

    return output_path + "\\" + image_name


The breaking into lines uses the white space between lines as a criterion to crop the photo into its lines. These will also be given a name, which is the number of their occurrence in the photo.
Note that this works only for simple art works, where the lines are not tangled in one another, which is sometimes also possible in Arabic calligraphy. For more complicated works, we'll need to cut the photo into lines manually.

# 1. break image in lines
lastBreakingLine = 0
newImage_line = False
for y in range(image.size[1]):
    breakingLine = True
    for x in range(image.size[0]):
        if image.getpixel((x, y)) == (0, 0, 0):
            breakingLine = False
            newImage_line = True
            break
    if breakingLine and newImage_line:
        box = (0, lastBreakingLine, image.size[0], y)
        image_lines += [image.crop(box)]
        newImage_line = False
        lastBreakingLine = y

The second step is to cut each line into pieces. First we crop the photo with small margins around the actual black pixels, which are the text. The cutting into pieces then follows using radius, to get a cropping box of radius², and step, which is the difference between the start of one piece and the start of the next. These two can be calculated from the number of pieces pieces_no and the height of the photo of one line.
Because writing in Arabic is horizontal, the height of the photo will most of the time be the same for one font. But the width will differ w.r.t. the length of the text (just like in Latin). So cutting only with respect to the height makes more sense, since it gives similar results for each font.


# 2. cut in pieces
blackPixels_i = blackPixels(line_image)
if len(blackPixels_i) == 0:
    continue

# crop photo with very small margins
left_up = (min([p[0] for p in blackPixels_i]),
           min([p[1] for p in blackPixels_i]))
right_down = (max([p[0] for p in blackPixels_i]),
              max([p[1] for p in blackPixels_i]))
box = (left_up[0] - 1, left_up[1] - 1, right_down[0] + 1,
       right_down[1] + 1)
fitLine_image = line_image.crop(box)

# if the resulting image has very bad quality,
# then the image will be ignored
if fitLine_image.size[1] <= 30:
    continue

# cut in pieces
radius = int(fitLine_image.size[1] / pieces_no)
step = int(fitLine_image.size[1] / (pieces_no + 1))

i = j = 0
for x in range(fitLine_image.size[0] - step, -step + 1, -step):
    for y in range(0, fitLine_image.size[1] - step + 1, step):
        if x < 0: x = 0
        if x > fitLine_image.size[0]:
            x = fitLine_image.size[0] - 1
        if y < 0: y = 0
        if y > fitLine_image.size[1]:
            y = fitLine_image.size[1] - 1
        if x + radius > fitLine_image.size[0]:
            x = fitLine_image.size[0] - 1 - radius
        if y + radius > fitLine_image.size[1]:
            y = fitLine_image.size[1] - 1 - radius
        box = (x, y, x + radius, y + radius)
        piece = fitLine_image.crop(box)
        piece.save(output_path + "\\" +
                   image_name + "\\" + image_name +
                   "_line(" + str(line_image_iter) +
                   ")_piece" + str((i, j)) + ".bmp")
        j += 1
    j = 0
    i += 1

line_image_iter += 1


This again will be done for all photos in a given folder with cut_in_pieces_all_in_folder.

def cut_in_pieces_all_in_folder(folder_path, output_path=
                                defaultOutput_pathes["pieces"],
                                pieces_no=1):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    done = 0

    # for all photos in the folder
    for f in listdir(folder_path):
        # check if the photo is already cut
        alreadyCut = False
        for f_cut in listdir(output_path):
            if image_iter(f) == image_iter(f_cut):
                alreadyCut = True

        # if not already cut then cut it
        if not alreadyCut:
            cut_in_pieces(folder_path + "\\" + f,
                          output_path, pieces_no)

        done += 1
        # show the progress to the user
        progress_bar(done, len(listdir(folder_path)))

    return output_path


4.2.5 Vectorize

In contrast to the other functions, this function (in the modified idea) is very simple, because all the work is done by another program: potrace.

def vectorize(image_path, output_path=
              defaultOutput_pathes["vect"],
              output_pathes=defaultOutput_pathes):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    # make sure that there's a folder for
    # the output for the simplified photos
    if output_pathes["simp"].split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_pathes["simp"])

    # name the image as the regular expression states
    image_name = image_path.split("\\")[-1].split(".")[0]
    outputImage_path = output_path + "\\" + image_name \
        + "_vect.svg"

    # vectorize photo using potrace
    call("potrace -s -o \"" + outputImage_path + "\" \""
         + image_path + "\"", shell=True)

    # copy the photo to the simplified photos
    # as a photo simplified 0 times
    call("copy \"" + image_path + "\" \"" +
         output_pathes["simp"] + "\"", shell=True)

    return outputImage_path


And here again, the vectorization of all photos in a folder will be done using the function vectorize_all_in_folder:

def vectorize_all_in_folder(folder_path, output_path=
                            defaultOutput_pathes["vect"],
                            output_pathes=defaultOutput_pathes):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    # make sure that there's a folder for
    # the output for the simplified photos
    if output_pathes["simp"].split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_pathes["simp"])

    # for each photo in the folder
    for f in listdir(folder_path):
        # check if it's already vectorized
        alreadyVectorized = False
        for f_vect in listdir(output_path):
            if image_iter(f) == image_iter(f_vect) and \
                    image_piece(f) == image_piece(f_vect):
                alreadyVectorized = True

        # if it's not already vectorized then vectorize it
        if not alreadyVectorized:
            vectorize(folder_path + "\\" + f, output_path)

    return output_path

4.2.6 Simplify

The simplification was first done using Inkscape, but because that takes a long time, we'll just do it by resizing the photo to a smaller scale using ImageMagick. This resize will make the curves (i.e. the number of nodes in the curves) after the vectorization shorter, which is exactly our goal. The scale was chosen to be 75% after some tries:
for more than 75% we'd need a lot of time again until getting the right lengths of the curves; for less than 75% we'd lose some details that shouldn't necessarily be lost.
The '.bmp'-piece will be restored from the folder of the pieces, but this will be done by the function that's going to ask for a simplification: svg_to_integers_list.

def simplify(image_path, output_path):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    """
    old way using Inkscape:

    image_name = image_path.split("\\")[-1].split(".")[0]
    outputImage_path = output_path + "\\" + image_name + ".svg"
    call("copy \"" + image_path + "\" \"" + output_path + "\\" +
         image_name + " copy.svg" + "\"", shell=True)
    call("inkscape \"" + output_path + "\\" + image_name +
         " copy.svg" + "\" --verb=EditSelectAll "
         "--verb=SelectionSimplify --verb=FileSave "
         "--verb=FileClose --verb=FileQuit", shell=True)
    call("inkscape --export-png=\"" + outputImage_path[:-4] +
         ".png\" -background=white \"" + output_path + "\\" +
         image_name + " copy.svg" + "\"", shell=True)
    call("magick convert \"" + outputImage_path[:-4] + ".png\" \"" +
         outputImage_path[:-4] + ".bmp\"", shell=True)
    call("potrace -s -o \"" + outputImage_path[:-4] +
         " simplified.svg\" \"" + outputImage_path[:-4] + ".bmp\"",
         shell=True)
    return outputImage_path[:-4] + " simplified.svg"
    """

    # new way using just magick resize
    image_name = image_path.split("\\")[-1].split(".")[0]
    # name of the resized image with 'res' added to the end
    resized_image_path = output_path + "\\" + image_name + " res.bmp"
    # the name with one 's' added to the end,
    # as the regular expression states
    outputImage_path = output_path + "\\" + image_name + "s.svg"
    # resize the image using magick
    call("magick convert \"" + image_path + "\" -resize 75% \"" +
         resized_image_path + "\"", shell=True)
    # vectorize the resized image
    call("potrace -s -o \"" + outputImage_path + "\" \"" +
         resized_image_path + "\"", shell=True)
    return outputImage_path

Unlike for the other functions, no "for all" version is needed here, since this function will only be called by the next one.

4.2.7 SVG to Lists of Integers

All this function does is read the '.svg'-file and extract the information about the curves from it. This won't be very hard, because potrace, as said before, saves these in a friendly way. The curves will then be simplified if needed and saved in '.txt'-files.

At first, an ’.svg’-file created by potrace looks like this:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
 "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
 width="186.000000pt" height="186.000000pt"
 viewBox="0 0 186.000000 186.000000"
 preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.15, written by Peter Selinger 2001-2017
</metadata>
<g transform="translate(0.000000,186.000000)
scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M437 1813 c-3 -5 -8 -42 -11 -83 l-7 -75 56 -48
c67 -58 108 -133 108 l-197 1 -25 -3 -84 -7 -132 -7 -74
-5 -88 7 -88 16 0 40 76 62 198 25 147 -19 277 -128 378
-57 53 -72 61 -80 47 z"/>
<path d="M1 310 c-1 -154 2 -195 18 -245 l19 -60 56 -3
c31 -2 56 1 56 5 0 4 -29 100 -64 213 -35 113 -69 221
-74 240 -8 26 -10 -11 -11 -150z"/>
</g>
</svg>

The function svg_to_integers_list gets an '.svg'-file of a piece, paths⁷ for both the simplification and the output, and maxima for the length of both the curves and the lines, so that they fit into the neural network. Having max_line_length equal 0 means that lines shouldn't be considered. Getting a piece with a curve in it whose length is greater than the maximum leads to a simplification of this piece.

⁷ Disambiguation: path has two meanings here: 1. a path that leads through the folders to a certain file in the system; 2. a path in an '.svg'-file.


def svg_to_integers_list(piece_path, simplify_path, output_path,
                         maxCurve_length=100, maxLine_length=0):
    # make sure that there's a folder for the output
    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    piece_name = piece_path.split("\\")[-1].split(".")[0]

    # here we'll save the paths in the svg file
    svg_paths = []
    # here the curves out of them
    curves_integers_lists = []
    # and here the lines
    lines_integers_lists = []

    # we'll read only the data that is important for us
    # from the svg file as a string
    svg = open(piece_path).read()
    svg = svg.split("</metadata>")[-1]
    svg = svg.replace("\n", " ")

    # the paths are then between "<" and ">"
    # and begin with the word 'path'
    for x in between(svg, "<", ">"):
        if len(x) >= 4 and x[:4] == "path" and "c" in x:
            svg_paths += [x]

    # for each path of these
    for svg_path in svg_paths:
        pass  # 1. find the curves
        if maxLine_length != 0:
            pass  # 2. find the lines if needed

    # 3. simplify if needed

    return output_path


After deleting the data that is unimportant (for the font) we get:

<g transform="translate(0.000000,186.000000)
scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M437 1813 c-3 -5 -8 -42 -11 -83 l-7 -75 56 -48
c67 -58 108 -133 108 l-197 1 -25 -3 -84 -7 -132 -7 -74
-5 -88 7 -88 16 0 40 76 62 198 25 147 -19 277 -128 378
-57 53 -72 61 -80 47 z"/>
<path d="M1 310 c-1 -154 2 -195 18 -245 l19 -60 56 -3
c31 -2 56 1 56 5 0 4 -29 100 -64 213 -35 113 -69 221
-74 240 -8 26 -10 -11 -11 -150z"/>
</g>
</svg>

All 'svg'-commands are surrounded by '<' and '>'. The commands that begin with 'path' are the paths. These are the paths in this photo, which we saved in the list svg_paths:

<path d="M437 1813 c-3 -5 -8 -42 -11 -83 l-7 -75 56 -48
c67 -58 108 -133 108 l-197 1 -25 -3 -84 -7 -132 -7 -74
-5 -88 7 -88 16 0 40 76 62 198 25 147 -19 277 -128 378
-57 53 -72 61 -80 47 z"/>

<path d="M1 310 c-1 -154 2 -195 18 -245 l19 -60 56 -3
c31 -2 56 1 56 5 0 4 -29 100 -64 213 -35 113 -69 221
-74 240 -8 26 -10 -11 -11 -150z"/>

Each path begins with 'd=' followed by a string. This string contains the drawing commands, which in potrace are:

• M x y: Move to point (x, y).

• C x1 y1 x2 y2 ... xn yn: Draw a Bezier curve using (x1, y1), (x2, y2), ..., (xn, yn) as control points.

• L x1 y1 x2 y2 ... xn yn: Draw straight lines from (x1, y1) to (x2, y2), from (x2, y2) to (x3, y3), and so on.

• z: Close the path; in other words, go back to the starting point.

Note that for lowercase letters, the coordinates are relative, not absolute. For example, M x y c a b ... means that the second control point of the curve after (x, y) is at the absolute coordinates (x + a, y + b).
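For illustration (a small sketch, not part of the project code), here is how relative coordinates accumulate for a lowercase lineto command, where each pair is an offset from the previously reached point:

```python
def lineto_absolute(start, offsets):
    # start: the absolute (x, y) point reached before the 'l' command
    # offsets: the (dx, dy) pairs of the relative command
    x, y = start
    points = []
    for dx, dy in offsets:
        x += dx
        y += dy
        points.append((x, y))
    return points

# e.g. after "M1 310", the command "l19 -60 56 -3" reaches:
print(lineto_absolute((1, 310), [(19, -60), (56, -3)]))
# → [(20, 250), (76, 247)]
```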

Where the path starts and ends is not really important for the font, because having an "aleph" in the middle or at the end of the photo means the same for us, so the 'M' and 'z' drawing commands are not important for us. The only relevant information is actually the control points of the Bezier curves and those of the lines. So we'll save these as integers in a list. This list will be called integers_list, using our convention for variables.


for svg_path in svg_paths:
    i = 0
    # we'll take the curves out of it
    while i < len(svg_path):
        if svg_path[i] == "c":
            j = i + 1
            while j < len(svg_path):
                # other non-interesting svg-constants
                # for curves should be removed
                if svg_path[j] == "m" or svg_path[j] == "l" \
                        or svg_path[j] == "z":
                    integers_string = svg_path[i+1:j]
                    integers_strings_list = \
                        integers_string.split(" ")
                    while '' in integers_strings_list:
                        integers_strings_list.remove("")
                    integers_list = [int(x.split(".")[0])
                                     for x in integers_strings_list]
                    # between each "c" and any other
                    # constant, there's a curve
                    curves_integers_lists += [integers_list]
                    i = j + 1
                    break
                j += 1
        i += 1
    i = 0


The same will be done for lines:

    while i < len(svg_path):
        if svg_path[i] == "l":
            j = i + 1
            while j < len(svg_path):
                # other non-interesting svg-constants
                # for lines should be removed
                if svg_path[j] == "m" or svg_path[j] == "c" \
                        or svg_path[j] == "z":
                    integers_string = svg_path[i+1:j]
                    integers_strings_list = \
                        integers_string.split(" ")
                    while '' in integers_strings_list:
                        integers_strings_list.remove("")
                    integers_list = [int(x.split(".")[0])
                                     for x in integers_strings_list]
                    # between each "l" and any other
                    # constant, there's a line
                    lines_integers_lists += [integers_list]
                    i = j + 1
                    break
                j += 1
        i += 1


Then we'll check whether a simplification is needed. Otherwise the '.txt'-file will be written. The curves will be surrounded with c's and the lines with l's.

# the maximum in length of all curves and lines
currMaxCurve_length = max([len(integers_list)
    for integers_list in curves_integers_lists]) \
    if len(curves_integers_lists) != 0 else 0
currMaxLine_length = max([len(integers_list)
    for integers_list in lines_integers_lists]) \
    if len(lines_integers_lists) != 0 else 0

# if this maximum doesn't fit in the NN,
# the piece should be simplified
if currMaxCurve_length >= maxCurve_length or \
        (currMaxLine_length >= maxLine_length
         and maxLine_length != 0):
    piece_path = simplify(piece_path, simplify_path)
    return svg_to_integers_list(piece_path, simplify_path,
                                output_path, maxCurve_length,
                                maxLine_length)
else:
    # else just write the '.txt'-file
    # with the lists of integers
    output_file = open(output_path + "\\" + piece_name +
                       ".txt", "w")
    output_file.write("c" + str(curves_integers_lists) +
                      "cl" + str(lines_integers_lists) + "l")
    output_file.close()

The result will look something like this:


This will also be done for all pieces:

def svg_to_integers_list_all_in_folder(
        folder_path,
        simplify_path=defaultOutput_pathes["simp"],
        output_path=defaultOutput_pathes["int"]):

    if output_path.split("\\")[-1] not in \
            listdir(output_path.split("\\")[0]):
        makedirs(output_path)

    done = 0
    for f in listdir(folder_path):
        if ".svg" in f:
            alreadySvg = False
            for f_svg in listdir(output_path):
                if image_iter(f) == image_iter(f_svg) \
                        and image_piece(f) == image_piece(f_svg) \
                        and image_line(f) == image_line(f_svg):
                    alreadySvg = True
            if not alreadySvg:
                svg_to_integers_list(folder_path + "\\" + f,
                                     simplify_path, output_path,
                                     max_curve_length,
                                     max_line_length)
        done += 1
        progress_bar(done, len(listdir(folder_path)))

    return output_path


4.2.8 All Together

The function work_on_export_txts will take the renamed photos from the folder Renamed and do all the steps up to the .txt-files level. It's called work_on because it can be stopped at any time, and when rerunning it, it continues the work instead of beginning again.

def work_on_export_txts(output_pathes=defaultOutput_pathes,
                        check=False):
    print("Converting...")
    convert_all_in_folder(output_pathes["ren"],
                          output_pathes["conv"], check)
    print("All are converted")
    # check if the photos are BWed with the right ratio
    if not check:
        exit("Check now!")
    print("Cutting in pieces...")
    cut_in_pieces_all_in_folder(output_pathes["conv"],
                                output_pathes["pieces"])
    print("All are cut")
    done = 0
    print("Vectorizing...")
    for folder_path in listdir(output_pathes["pieces"]):
        vectorize_all_in_folder(output_pathes["pieces"] + "\\" +
                                folder_path,
                                output_pathes["vect"])
        done += 1
        progress_bar(done,
                     len(listdir(output_pathes["pieces"])))
    print("All are vectorized")
    svg_to_integers_list_all_in_folder(
        output_pathes["vect"], output_pathes["simp"],
        output_pathes["int"])
    print("Txts fully exported")


5 Curves to Font

In Photo to Curves we started with a photo and ended with information about the curves in this photo. Here we'll start with this information to determine the font of the text in the photo. Recall the steps from Curves to Font:

1. Import Data: Import the data out of the saved .txt-files and save it in a convenient way: for this we'll choose a dictionary with a key for each curve. This key will then be linked to the corresponding font of this curve. Here we'll still face the problem of dimensions, because the length of the curves differs from one to another. But we'll start with a naive solution.

2. Neural Network: Train the neural network with these data.

5.1 Implementation

5.1.1 Import Data

As said before, at first we need to save the data in an easily accessible way, which is the dictionary data. And in case we want to reverse the process, the dictionary image_ids can lead back from any curve to its original piece and therefore to the original photo it comes from.

def import_data(txts_path=defaultTxts_path):
    print("Importing training data...")
    # dict that links each curve with its font
    data = {}
    # dict that links each curve with the photo
    # it's taken from
    image_ids = {}

    done = 0
    for txt_name in listdir(txts_path):
        # information that determines the piece
        image_iter_i = image_iter(txt_name)
        image_piece_i = image_piece(txt_name)
        image_line_i = image_line(txt_name)

        # the font is the 1st character of the name
        font = int(txt_name[0])
        # the output is 1 for the font of the piece and
        # 0 for other fonts
        # the size of the output should fit with the NN
        output = [0] * output_size
        output[font] = 1

        # make sure the list is not empty (len("[]") = 2),
        # for curves and then for lines;
        # in the other case, fill the space for curves
        # or lines with 0's
        if len(open(txts_path + "\\" + txt_name)
               .read().split("c")[1]) != 2:
            curves_integers_lists_string = \
                open(txts_path + "\\" + txt_name) \
                .read().split("c")[1]
        else:
            curves_integers_lists_string = \
                str([[0] * max_curve_length])
        if len(open(txts_path + "\\" + txt_name)
               .read().split("l")[1]) != 2:
            lines_integers_lists_string = \
                open(txts_path + "\\" + txt_name) \
                .read().split("l")[1]
        else:
            lines_integers_lists_string = \
                str([[0] * max_line_length])

        # copy these strings into actual lists
        global curves_integers_lists, lines_integers_lists
        command = "curves_integers_lists = " + \
            curves_integers_lists_string + \
            "; lines_integers_lists = " + \
            lines_integers_lists_string
        exec(command, globals())

        # for each curve
        for curve_integer_list in curves_integers_lists:
            pass  # ** add the curves and lines to the data

        done += 1
        # show the progress to the user
        progress_bar(done, len(listdir(txts_path)))

    print("Training data imported.")
    return data, image_ids


curve_integer_tuple and line_integer_tuple will be the actual input of the neural network. But because the curves and lines don't all have the same length, they should first be normalised by adding 0's, so that they fit with the input of the neural network. The case that the length is bigger than the input of the neural network is ruled out by the simplification before. This is just a naive solution and will be discussed further in the problem of dimensions.

# ** : the body of the loop over the curves

# add 0's to the list until it fits with the NN
while len(curve_integer_list) < max_curve_length:
    curve_integer_list += [0]
# assert the case that the length of the list is
# bigger than the input of the NN
if len(curve_integer_list) > max_curve_length:
    print("Error in path length")

# turn the list into a tuple
curve_integer_tuple = tuple(curve_integer_list)

# do the same for each line
for line_integer_list in lines_integers_lists:
    while len(line_integer_list) < max_line_length:
        line_integer_list += [0]
    if len(line_integer_list) > max_line_length:
        print("Error in path length")
    line_integer_tuple = tuple(line_integer_list)

    # add the curve and the line to data and image_ids
    image_ids[curve_integer_tuple + line_integer_tuple] = \
        image_iter_i, image_piece_i, image_line_i
    data[curve_integer_tuple + line_integer_tuple] = output


5.1.2 Neural Network

At first we make an instance of the neural network. The choice of the structure of the hidden layers is not clear and will be discussed in the problem of structure.

input_size = max_curve_length + max_line_length
# output size is the number of fonts
output_size = 2
hiddenLayers = [20, 15]
actFunc = "sigm"
actFuncOutput = "sigm"

nn = NN(input_size, output_size, hiddenLayers,
        act_func_str=actFunc,
        act_func_output_str=actFuncOutput)

Then we'll import the data using the function defined before. Before that, we have to rename the photos and export the .txt-files out of them using all the steps from Photo to Curves.

rename_all(images_path)
work_on_export_txts(check=True)

training_data, image_ids = import_data(defaultTxts_path)

And then we’ll train the neural network.

step = 0.1

def train():
    print("Training...")
    # keys: the curves (keys) of training_data
    keys = list(training_data)
    while True:
        for x in keys:
            nn.train(x, training_data[x], step=step, times=1)

The resulting neural network should then be able to determine the font. This sounds very easy here, but it is actually just the beginning of a new approach to improve the results of the neural network.


6 Improvements on the Neural Network

6.1 Measurement

We need a measure, or criterion, to determine how good the neural network is. These will be discussed here. Let's define:

The input sets:

• Input Photos: P : Set of all input photos.

• Curves of a photo: Cu(p); p ∈ P: the set of all curves in the photo p.

• All Curves: X: the set of all curves, i.e. X = ∪_{p ∈ P} Cu(p).

• As before: L0 is the index set for the output of the neural network, i.e. it's also the set of all fonts.

The outputs in curve-level:

• Output: O(x) ∈ R^|L0|; x ∈ X: the output of the neural network when given an input curve x.

• Expected Output: t(x) ∈ R^|L0|; x ∈ X: the expected output of the neural network when given an input x.

The outputs in photo-level:

• Full Output: FO(p) ∈ R^|L0|; p ∈ P: the full output for a photo, not only a curve, where

FO_j(p) = ( ∑_{x ∈ Cu(p)} O_j(x) ) / |Cu(p)|;  j ∈ L0

• (n) Best Full Output for |L0| = 2: (n)FO_j(p); p ∈ P, n ∈ N:

1. Sort {x ∈ Cu(p)} using |O_1(x) − O_2(x)| as a criterion, from greatest to lowest.

2. Take the first ⌈(1/n)·|Cu(p)|⌉ curves of the sorted list and call this set A.

3. Now

(n)FO_j(p) = ( ∑_{x ∈ A} O_j(x) ) / |A|;  j ∈ L0

• Expected Full Output: Ft(p) ∈ R^|L0|; p ∈ P: the expected output of the neural network when given any input curve x out of the photo p, since the photo has only one font.
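The two photo-level outputs can be sketched in plain Python (a minimal illustration; the list `outs` of per-curve outputs O(x) is hypothetical data, not project output):

```python
from math import ceil

def full_output(outputs):
    # FO_j(p): component-wise mean of the per-curve outputs
    n = len(outputs)
    return [sum(o[j] for o in outputs) / n
            for j in range(len(outputs[0]))]

def n_best_full_output(outputs, n):
    # keep only the most "decisive" curves: sort by |O_1 - O_2|,
    # greatest first, and take the first ceil(|Cu(p)| / n) of them ...
    ranked = sorted(outputs, key=lambda o: abs(o[0] - o[1]),
                    reverse=True)
    best = ranked[:ceil(len(outputs) / n)]
    # ... then average only over that subset A
    return full_output(best)

# hypothetical per-curve outputs O(x) of one photo, |L0| = 2
outs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.85, 0.15]]
print(full_output(outs))            # mean over all four curves
print(n_best_full_output(outs, 2))  # mean over the two most decisive
```

Averaging over only the most decisive curves discards the ambiguous ones, which is exactly the motivation for the (n) Best variant.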

Errors in the curve-level:

• Error:

err(x) = max{ |O_j(x) − t_j(x)| : j ∈ L0 };  x ∈ X


Errors in font-level:

• Error in Font j; j ∈ L0:

err_f(j) = ( ∑_{x ∈ Cu(p); p ∈ P; p is in font j} err(x) ) / |{x ∈ Cu(p); p ∈ P; p is in font j}|

Errors in the photo-level:

• Full Error:

err_1(p) = max{ |FO_j(p) − Ft_j(p)| : j ∈ L0 };  p ∈ P

• (n) Best Full Error; n ∈ N:

err_n(p) = max{ |(n)FO_j(p) − Ft_j(p)| : j ∈ L0 };  p ∈ P

Error analysis, ∀ (err, A) ∈ {(err, X), (err_n, P), (err_f, L0)}:

• Maximal Error:

max_err = max{ err(a) : a ∈ A }

• Minimal Error:

min_err = min{ err(a) : a ∈ A }

• Relative Error (Mean):

err = ( ∑_{a ∈ A} err(a) ) / |A|

• ”Possible” errors: e.g.

PE = {i/100 : i = 0, 1, ..., 100}

• Error count:

count_err(e) = |{ x ∈ X : ⌊100·err(x)⌋ = 100·e }|;  e ∈ PE

Number of curves/photos that make the error e.

• Error Deviation:

σ = √( ∑_{a ∈ A} (err − err(a))² )

• Error Distribution:

dist(a) = (1 / (√(2π)·σ)) · e^( −(a − err)² / (2σ²) );  a ∈ err(A) = [0, 1]

The idea behind all these definitions will be clear in the next sections.
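As an illustration of how these measures fit together, here is a minimal sketch over a hypothetical list of per-item errors (the numbers are made up, not project results):

```python
from math import sqrt

def error_analysis(errors):
    # errors: the values err(a) for every a in A, each in [0, 1]
    mean = sum(errors) / len(errors)
    # error count: how many items fall on each "possible" error i/100
    counts = {}
    for e in errors:
        bucket = int(100 * e) / 100
        counts[bucket] = counts.get(bucket, 0) + 1
    return {
        "max_err": max(errors),
        "min_err": min(errors),
        "mean_err": mean,
        # deviation exactly as defined above (no division by |A|)
        "sigma": sqrt(sum((mean - e) ** 2 for e in errors)),
        "count_err": counts,
    }

# hypothetical per-curve errors
print(error_analysis([0.02, 0.02, 0.5, 0.97]))
```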


6.2 First Try

In section How to Use a Neural Network we discussed everything we need for a neural network to work. Let's give these things some values here:

• Structure: Let it at first be 2 layers with 2,2 nodes respectively.

• Weights: Totally random. With more than one initialization we get different random starting values, where some might be better than others.

• Activation Function: Let it be the sigmoid function. The sigmoid function is great for problems where the output should be a probability between 0 and 1. It's better than the sign function because it's continuous. At the same time it works like a "firing" function: the node is activated only after a certain "edge". Before this "edge" we get values close to 0 and after it we get values close to 1. Very near the "edge" there's a "jump".

Linear, sign and ReLU functions were also tried and gave very bad results.

• Forward Propagation Function: Already defined.

• Backward Propagation Function (Training Function): Already defined.

• Training Data: Downloaded from the internet and chosen randomly.

We add to them here:

• Step Function: Let it at first be constant 0.1.

• Input Size: Let's at first take only curves (no lines) out of the photos. Let the maximum curve length be 100, so we'll simplify the pieces as long as it takes until the length of all curves is less than 100.

• Output Size: The input photos will be in 2 fonts: Naskhi and Ruqa’i.So the output size will be 2.

• Number of Pieces: The number of pieces that the photo will be (verti-cally) cut in. Let it at first be 2.


The result after 1 iteration (one pass of learning over all the curves) is:

Discussion:

• The horizontal axis represents the errors. The vertical axis represents the number of curves for the error count.

• The measurement here includes the maximal, relative and minimal errors in curves. The smaller these get, the better the result is. We want the maximal error to be under 0.5, because this means we can recognize the font in all curves.

• The measurement includes the error count and distribution in curves. The lower the maximal point⁸ of the error distribution gets, the better the result is. 0.5 is the maximal point of the natural (random) distribution, so this first try is very bad.

⁸ the point where the maximum is attained, not the value of the maximum


After some time, we see that the result is not really getting better:

So we try to make some changes ...


6.3 Structure

Let’s make the number of nodes in each hidden layer a little bit bigger, say 5,5:

We still get very bad results even after 50 minutes of training.


We try taking a huge structure, say 70,50:

Discussion:

• From the error count (blue) we can read that for some photos we get an error of 0, but for others an error of 1, which is the maximal error that can be made. This means that the neural network gives, say, (1, 0) for all inputs. In other words, the neural network says for any photo that it's one particular font, say 1. That's why we get the best results for the photos of this one font and the worst for the other photos.

• We can say the neural network has learnt by heart instead of recognizing the patterns in each font.

• This also means that the measurement is still not representative enough. So let's define the error in font:

err_f,i(j) = ( ∑_{x ∈ Cu(p); p ∈ P; p is in font j} err_i(x) ) / |{x ∈ Cu(p); p ∈ P; p is in font j}|


We take the same structure, but this time with a more representative measurement:

[figure: error measurement plot]

Discussion:

• In black is the error in curves; in green the error in the first font; in blue the error in the second font. The colored points correspondingly mark the minimal, relative and maximal errors.

• Through this measurement it's clear that the neural network is learning by heart: the font with more examples gets a high accuracy and the font with fewer examples gets very bad results, and its font is not even recognizable. So what we expected was right.

• It'll be better if we try to balance the number of examples in both fonts.


Let’s try smaller structures, say 40,30:

[figure: error measurement plot]

Discussion:

• Even after a long time and more than one run with new initial random weights, the result did not get better than this. It was clear that something new should be done, not only in the structure.

• We've learned about the structure that it functions best when we have two layers, where the second layer has 3 quarters of the nodes of the first layer. There's no clear reason for this, but after a lot of tries, the best results were obtained using these structures.

• We've also learned that structures with more than 70 nodes in one layer take a lot of time and learn by heart, and that with fewer than 10 nodes not a lot changes through learning.


6.4 Number of Pieces

After checking some of the pieces, it was clear that through this cut a lot of long curves, which are most of the time very important for the font, were cut apart. So we decided to try cutting into one piece instead of two.

[figure: error measurement plot]

Discussion:

• Over some iterations, we noticed that the relative error was going down, while the relative errors in the two fonts always summed up to 1, which means that the relevant result of the neural network was stable. This meant that the relative error is not representative enough and gives false information about the evolution of the error.


• The new proposed idea for the relative error was to define in_both_err, which is the sum of the relative errors in both fonts divided by 2, the number of fonts.
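A minimal sketch of in_both_err (the error values are hypothetical):

```python
def in_both_err(errors_per_font):
    # sum of the per-font relative errors divided by the
    # number of fonts (2 in our case)
    return sum(errors_per_font) / len(errors_per_font)

# learning by heart: one font perfect, the other completely wrong,
# yet the combined value stays at 0.5 instead of looking good
print(in_both_err([0.0, 1.0]))  # → 0.5
```

This is exactly why in_both_err is more honest than the plain relative error: it stays at 0.5 whenever the two per-font errors merely trade off against each other.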

After defining in_both_err, we found that there's a better way to follow the evolution of the error: the error history.

[figure: error history plot]

Discussion:

• The x-axis is the iteration. The y-axis is now the error.

• The newly defined difference represents the difference between the maximal points in the error distributions of both fonts. When this difference is high (close to 1), the neural network is learning by heart; in other words, one font has good results and the other not, which also means that the neural network says for all curves that it's only one font.

• As we can see, in_both_err is actually stable, which illustrates the unrepresentativeness of the relative error.

• As we already said in structure, such big structures make the neural network learn by heart. This is clear here when looking at the maxima in both fonts: one is almost 1, and the other is almost 0.


Now that we've defined a more representative way to follow the evolution of the error, the best structure for 1 piece was found after a lot of trials, using what we've learned in structure: 46,35.

[figures: error history plots]

Discussion:

• Even though it took a lot of time, the newly defined in_both_err evolved in this structure to a small value.


• Add to that, the y-coordinates of the colored points for the errors now represent the number of curves in the corresponding font. As we see, both fonts now have a similar number of curves.

• Through this representation, a new factor showed up: the step. It was clear that changes in the step made very big differences in the evolution of the error. This will be discussed further in step.

• The maximum error is still about 1, which means we still need more improvements.

6.5 Dimensions

The next proposed solution for getting better results from the neural network was to change the maximal number of nodes⁹ in the curves. 100 was chosen by a guess; maybe 75 or 125 would be better. Making this maximum bigger means that the pieces can be simplified less, so the curves are more detailed, which should be an advantage. Making the maximum smaller means that the learning process will go faster, since we then get fewer input nodes.

[figure: error history plot]

Discussion:

• Using 75 maximum nodes for the curves.

• Bad results even after a lot of time and runs.

⁹ Disambiguation: nodes in a curve are control points of the Bezier curve; nodes in a neural network represent neurons.


A new idea:

[figure: error history plot]

Discussion:

• Using 100 maximum nodes for the curves and 12 for the lines.

• Bad results even after a lot of time and runs.

• The idea of using lines wasn't really successful and will be abandoned. The reason is probably that the vectorization process, where the line commands come from, considers only lines that are really straight and doesn't round line-like curves into lines.


Another try:

[figure: error history plot]

Discussion:

• Using 125 maximum nodes for the curves. No lines will be considered anymore.

• After about 12 hours, the result converged to one that is not really better than the one with 100 nodes.

• The maximum error is still almost 1.

• After that, we tried to change the step to find out what this factor really means.


6.6 Step

Discussion:

• A big step makes reasonably fast changes in the weights. This showed up in the error history through a fluctuating error evolution.

• On the other hand, a smaller step made the errors more stable.

• The step was first changed manually, but after these observations, a newstep function was defined:

1. Follow the error history of the relative error for the last, say, 50 iterations.

2. If there was fluctuation, the step should be decreased:

def fluctuation(lst, n=50):
    # count the local maxima in the error history lst;
    # n or more of them counts as fluctuation
    maxima = 0
    for i in range(1, len(lst) - 1):
        if lst[i] > lst[i + 1] and lst[i] > lst[i - 1]:
            maxima += 1
            if maxima >= n:
                return True
    return False

3. If the result was evolving very slowly, the step should be increased:

    def slow(lst, min_delta=0.01):
        # True when none of the last 50 error changes
        # exceeds min_delta times the current error
        deltas = []
        if len(lst) <= 50:
            return False
        for i in range(len(lst) - 50, len(lst) - 1):
            deltas += [lst[i + 1] - lst[i]]
        if max(deltas) < min_delta * lst[-1]:
            return True
        return False

The goal of this is just to speed up the learning process.
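Putting the two checks together, the new step function can be sketched as a small controller. This is an illustration only: the decrease/increase factors (0.5 and 2.0) and the name `adapt_step` are assumptions, not values fixed by the text, and the two helpers are restated in cleaned-up form so the sketch is self-contained:

```python
def fluctuation(errors, n=50):
    # count local maxima in the error history; n or more means fluctuation
    maxima = 0
    for i in range(1, len(errors) - 1):
        if errors[i] > errors[i + 1] and errors[i] > errors[i - 1]:
            maxima += 1
            if maxima >= n:
                return True
    return False

def slow(errors, min_delta=0.01):
    # True when none of the last 50 changes exceeds min_delta * current error
    if len(errors) <= 50:
        return False
    deltas = [errors[i + 1] - errors[i]
              for i in range(len(errors) - 50, len(errors) - 1)]
    return max(deltas) < min_delta * errors[-1]

def adapt_step(step, error_history):
    # hypothetical factors; the thesis fixes no concrete values
    if fluctuation(error_history):
        return step * 0.5  # fluctuating error: decrease the step
    if slow(error_history):
        return step * 2.0  # stagnating error: increase the step
    return step
```

Calling `adapt_step` once per training epoch on the recorded error history reproduces the manual tuning described above.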


6.7 Interpretation of the Result

After all these tries, it was clear that the maximum error is never going to converge to a value smaller than 0.5; it was actually mostly 1 all the time until now. A new idea was proposed: interpret the result at the photo level instead of the curve level. From a calligraphist's view, there will always be some bad curves which exist in all fonts, e.g. points. The neural network will never get the error on these below 0.5, since it doesn't make sense to differentiate between two similar curves. The relevant result is actually the result for the whole photo and not for single curves out of it, so let's define the full output: take the mean of the result over all curves.

FO_j(p) = (∑_{x ∈ Cu(p)} O_j(x)) / |Cu(p)|;  j ∈ L0
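As a sketch in Python (the names `full_output` and `net_output` are hypothetical; `net_output(x)` stands for the trained network's outputs O_j for a curve x), the full output is just the mean over a photo's curves:

```python
def full_output(curves, net_output):
    # FO_j(p): mean of the per-curve outputs over all curves Cu(p) of a photo
    n_fonts = len(net_output(curves[0]))
    sums = [0.0] * n_fonts
    for x in curves:
        out = net_output(x)
        for j in range(n_fonts):
            sums[j] += out[j]
    return [s / len(curves) for s in sums]
```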

Another important way to find out whether the neural network really recognizes the patterns in the fonts is to use testing data. Testing data are just other photos that the neural network has never seen, i.e. hasn't learned. These are turned into curves in the same way as in from Photo to Curves. After that we run the neural network with these curves as input. At the end we measure the same errors defined before in the results.


Using these new measurements, we can now try to see how the error evolves:

[Figure: error history]

Discussion:

• The old graphs are still the same: blue, green and black for the errors in the fonts and in both. The black graph at the top is the maximal error in curves. The red graph at the bottom is the difference from before.

• The yellow line describes the changes in step.

• The black squares are the maximal full error. The black points are the relative full error.

• The red squares are the maximal full error in the testing data. The red points are the relative full error in the testing data. The red points are also connected by a graph, because they are important.

• We can see now that the maximal full error is less than 0.5.

• The maximal error in the testing data got down to fairly small values, but never less than 0.5, even with other structures and runs.

Using the same argument as before, we know that some curves occur in all fonts. When given these curves as input, the neural network will reasonably give very close results for both fonts. So let's count only the "good" (representative) curves in the result. Let's define a new metric: for x, y ∈ X, x is more representative than y when

|O1(x)−O2(x)| > |O1(y)−O2(y)|


Using this metric, we can define the (n) Best Full Output for |L0| = 2, (n)FO_j(p); p ∈ P, n ∈ N:

1. Sort {x ∈ Cu(p)} by |O1(x) − O2(x)| (representativeness) from greatest to lowest.

2. Take the first (1/n)·|Cu(p)| curves of the sorted list and call this set A.

3. Now

(n)FO_j(p) = (∑_{x ∈ A} O_j(x)) / |A|;  j ∈ L0
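A possible sketch of this interpretation (hypothetical names again; two fonts, so each per-curve output has two entries):

```python
def best_full_output(curves, net_output, n=3):
    # (n)FO_j(p): sort the curves by representativeness |O_1(x) - O_2(x)|,
    # keep the first 1/n of them (the set A) and average their outputs
    ranked = sorted(curves,
                    key=lambda x: abs(net_output(x)[0] - net_output(x)[1]),
                    reverse=True)
    a = ranked[:max(1, len(ranked) // n)]
    sums = [0.0, 0.0]
    for x in a:
        out = net_output(x)
        sums[0] += out[0]
        sums[1] += out[1]
    return [s / len(a) for s in sums]
```

With n = 1 this reduces to the plain full output; larger n keeps only the most font-specific curves.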

Let's now add one last measure for the whole network, the test status: the number of testing photos where the neural network made a mistake in determining the font, i.e. guessed the font wrongly. Trying a lot of structures and runs again and again, we got this result in the end:

[Figure: error history]

Discussion:

• No more graphs for error in curves or fonts.

• Black is always for training data. Red is always for testing data. Yellowfor the step.

• The different symbols represent different interpretations of the result, in other words different values for n ∈ N in (n)FO(p). The symbol on top of the other is always the maximum; the other symbol is the relative error.


• As we can see at the bottom, the test status is: all test photos are guessed correctly. This can also be seen in the connected red points: the last value is less than 0.5, which means the maximal best full error is less than 0.5, i.e. in both fonts the error is less than 0.5. So just by rounding the result (1 for > 0.5 and 0 otherwise) we get the font.

• In this test we used 90 training photos and 10 test photos.

6.8 More Testing Photos

Instead of the ratio of training to test photos of 9:1, we'll use here the ratio 2:1: the number of testing photos is 32 and of training photos 64. Now let's see whether the neural network is still able to recognize the patterns in the fonts using only 64 photos and then to generalize these patterns to the other photos. If the neural network passes this test, then it does exactly what it should do.

[Figure: error history]

Discussion:

• The same graphs and symbols from the last error history.

• The two graphs on the right represent the error distribution in the training (black) and testing (red) data.

• At the bottom we see the test status for the different interpretations of the result.

• The best interpretation, as we see, is n = 3: taking the first third of the curves after sorting them by representativeness. Using this interpretation of the result, the neural network guessed the right font for 75% of the testing photos.


7 Problems and Further Work

7.1 Problems and Suggestions

7.1.1 Dimensions

In this project, we only tried simplifying the photos in order to shorten the Bezier-curves with respect to the number of control points. But when the length of a curve was already smaller than the input of the neural network, we filled the remaining input nodes with 0. This was not necessarily the cleverest solution, but the easiest. The real problem of dimensions is that the Bezier-curves have different lengths. A proposed but not implemented solution is to simplify the curves by analysing the control points and merging close control points into one point. On the other hand, we can add control points where the existing control points are very far from each other. Using these two methods, we can make the lengths of all curves equal. From a calligraphist's view, this is believed to help the neural network recognize the patterns much more than just adding ambiguous zeros.
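A minimal sketch of this proposed (not implemented) idea, assuming 2D control points and simple distance thresholds. The helper names and the exact rules (keep the first point of a close run, insert midpoints on long gaps) are assumptions:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def merge_close(points, eps):
    # merge control points closer than eps into the last kept point
    merged = [points[0]]
    for p in points[1:]:
        if dist(merged[-1], p) >= eps:
            merged.append(p)
    return merged

def split_far(points, max_d):
    # insert midpoints until no two consecutive control points
    # are farther apart than max_d
    out = [points[0]]
    for p in points[1:]:
        while dist(out[-1], p) > max_d:
            last = out[-1]
            out.append(((last[0] + p[0]) / 2, (last[1] + p[1]) / 2))
        out.append(p)
    return out
```

Note that for real Bezier curves, moving or inserting control points changes the drawn shape, so eps and max_d would have to stay small relative to the curve.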

7.1.2 Local Optimum: The Evil in Mathematics

Neural networks work like a greedy algorithm: they search for the nearest minimum of the error starting from a random point. This kind of algorithm, despite its simplicity, doesn't guarantee that the result is globally the best. For neural networks this means that even if we train the network for years, we won't be sure that the weights have reached the best values for the given problem. The only complete solution to this problem is to search exhaustively: try all possible weights and look for the global minimum. This is, however, practically very costly.

7.1.3 Tests: How many?

Even though a lot of tries were made in this project, more and more can still be made. Some structures may still be better than the ones we found. Here again, the only complete solution is to try all possible structures exhaustively, but this is very costly.


7.2 Further Work

7.2.1 Return to the Original Idea

The original idea was not only about recognizing the font, but also about marking the curves that make the font this font. With this, it was thought possible to determine the characteristics of fonts. This was dropped, because we had to focus on the accuracy problem. It is, however, still possible by analysing the SVG code and reading off the curves (e.g. ellipses) from the control points. After that we can continue the steps of the original idea and determine the characterizing curves in the text. With this, we'll also be able to determine the text even with tangled words, as explained in the original idea. The next step in this project is to implement this idea.

7.2.2 Read off Patterns out of the Neural Network

Even though neural networks give great solutions to a lot of problems, it's still not clear how they really work:

• Why do some structures work better than others?

• Why do some values of the step make the result evolve worse?

• What are the patterns that the neural network recognized and used to determine the font? For example, we now know exactly which dimensions of a face we use to recognize a person, which makes it easy to draw caricatures. Is it possible to do this with every neural-network-like problem?

An attempt at the last question was made in the original idea, using subgraphs of the whole text as input for the neural networks again. Due to time problems this was not implemented, and it's not clear whether this idea will work. I believe that it's always possible to read off the patterns from the neural network using this idea of sub-inputs, and I'll work on proving this in the future.


8 Conclusions

8.1 Goals Achieved in the Bachelor

8.1.1 Aesthetics

Aesthetics is a branch of philosophy that studies value and tries to answer questions like "what makes certain things valuable to humans?". I started to think about this subject during my first bachelor, in architecture, where a lot of my works were thrown away because they were "not beautiful". Even though everybody uses the word "beautiful", it was very hard for me at that time to understand what this word really meant, in other words, which "steps" and which "algorithms" should be followed in order to solve this "problem of beauty" and to make my art works better. I then knew that it was computer science I should study, and not architecture, in order to make myself a better artist. Through this bachelor in computer science, I tried to come closer to answers to my questions about beauty. That's why I've done two projects on this subject, one using genetic algorithms and this one using neural networks, and now I have some results.

8.1.2 The Aesthetician

One clear problem in aesthetics is that humans can very easily evaluate solutions (art works, songs, ...), but need a much more complex learning process in order to be able to determine why something is valuable (beautiful, ...) or to improve a work. Actually, it's much more complicated to construct a solution than to evaluate one, which is reminiscent of the NP problems. For instance, the number of composers in the world is much smaller than the number of people who say "I like this music!". The "Aesthetician" is a proposed solution using genetic algorithms and neural networks to help humans construct solutions just by being able to evaluate them. The "Aesthetician" goes like this:

1. Make a few (random) solutions. Note that they must be few, so that the human has enough time to evaluate them, and not totally random, so that the human doesn't dismiss them outright; instead, use already known rules for beauty, e.g. golden ratios.

2. Have a human evaluate these solutions and make tuples (solution, evaluation).

3. Using a neural network and the tuples from step 2, determine the patterns in the human evaluation, i.e. generalise the human evaluation to a bigger set of solutions. Note that a neural network can also be understood as a function.

4. Using genetic algorithms with the neural network as the evaluation function, evolve new generations. After some evolution, you get a new population of (again, few) better solutions. These solutions should now be more valuable to the human.

5. Repeat from step 2 until the human thinks that the solutions are valuable enough.
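The five steps can be sketched as a loop; every name here is a hypothetical stand-in (a real version would plug in an actual genetic algorithm, a neural-network trainer, and a rating interface for the human):

```python
def aesthetician(make_initial, ask_human, train_net, evolve, rounds=5):
    # 1. a few non-random initial solutions (e.g. built with golden-ratio rules)
    population = make_initial()
    for _ in range(rounds):
        # 2. the human rates each solution -> (solution, evaluation) tuples
        rated = [(s, ask_human(s)) for s in population]
        # 3. a neural network generalises the human's taste to unseen solutions
        evaluate = train_net(rated)
        # 4. a genetic algorithm evolves a new population against that taste
        population = evolve(population, evaluate)
    # 5. in a real loop, stop once the human is satisfied; here: fixed rounds
    return population
```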


In the end, you get solutions out of nothing: the human has only evaluated solutions, but obtained new ones at the end. This is, for now, just a theoretical idea, but it illustrates the possibility of making art, music and every aesthetic subject accessible to every human, without needing extra "creativity powers", which are most of the time thought to be above human abilities and to "come from the skies".

8.2 Lessons

8.2.1 Project Evaluation

This project took about 8 months, 4 months of which went only into the original idea, which is not used at all in the modified idea. The reason for this loss of time is that no project evaluation was done before the start: the idea was proposed and then immediately started on. There was no assessment of things like the time needed and the feasibility of implementation before the start. One possible reason for this is that the student proposed the idea himself, rather than the professor, who already knows what is possible for a student at this stage.

8.3 Future?

If the original idea were really implemented and we were able to determine the characteristics of all fonts, would this mean that computers would be able to design new fonts by varying the characteristics, or by interpolating between them to get a new font between two or more fonts? Another conceivable use of this project is in the field of history. For instance, the Diwani (chancellery) font was designed as a result of the capture of Constantinople, in order to record everything in the just-established Ottoman "empire". Would it be possible to do this backwards: could noticing some changes of font in old documents help to date changes which are thought to have affected the empire's chancellery where the writers worked?


9 Summary in German

9.1 Abstract

This project is an approach to writing a program that exploits neural networks to recognize the font¹⁰ of Arabic texts in a given photo. The project focuses above all on calligraphy works and not on everyday handwriting. The approach also uses not the pixels of the photo, but the curves that form the letters.

9.2 Approach

There will be two big steps:

• From Photo to Curves: here we start with a ".jpg" photo of the art work and end with information about the curves.

• From Curves to Font: here we start with the information from the first step, which we use as input for the neural network. The output of the neural network is information about the font, in other words: a percentage describing how likely it is that the photo is in one of the considered fonts.

9.3 Concrete Steps

The project has a not fully implemented original idea and a finished modified variant of it. Here are the steps of the modified idea:

9.3.1 Steps from Photo to Curves

1. Convert: input is a ".jpg" photo and output is the same photo as a ".bmp" file, but in black and white.

2. Cut in Pieces: input is a ".bmp" photo and output is a list of ".bmp" photos that together make up the whole original photo. The cuts are made first with respect to lines, and then horizontally.

3. Vectorize: input is a (partial) photo in ".bmp", and output is the same (partial) photo as an ".svg" file.

4. Simplify: if the length of the paths in a (partial) photo is too big for the neural network, the photo is simplified until all paths have the right length.

5. SVG to Lists of Integers: input is an ".svg" photo, and output is a list of integers stored in a ".txt" file. This is done by analysing the ".svg" file.

¹⁰ Font, not text; e.g. chancellery font.


9.3.2 Steps from Curves to Font

1. Import Data: collect the data from the stored ".txt" files in a form that is practical to work with: for this we use a dictionary with a key for each curve. This key is then linked with the corresponding font. There is still a problem with the lengths of the paths here; it is fixed by a naive solution: fill all the remaining input nodes of the neural network with 0.

2. Neural Network: train the neural network with these curves and the corresponding fonts.

9.4 Results

After many tries, we reached these results:

9.4.1 Structure

The structure of a neural network decides most of how the output evolves during training. We distinguish between:

• Too big: if a layer has more than 70 nodes, this leads to memorization, i.e. the neural network no longer recognizes patterns and gives the same output for all curves.

• Too small: the output doesn't evolve through training.

• Behavior: the best behavior we found came from two layers, with the second layer having about three quarters of the nodes of the first. We have no exact explanation for this, however.

9.4.2 Interpretation of the Output

We distinguished between:

• Naive interpretation: because a photo has only one font, the output for each curve of the photo, when fed into the neural network, should give this font. Unfortunately this didn't work, since similar curves exist in all fonts.

• Mean: average the output of the neural network over all curves. This gave better results.

• Best Result: average the output only of the curves that are most significant.

There were also other results (effects of the step, the number of pieces and the dimensions) that we reached, but they need a long explanation.


9.5 Best Results

We first distinguish between:

• Training photos: the photos from which the curves for training the neural network come.

• Test photos: photos that the neural network has never seen before the test is carried out.

With the naive interpretation, we reached a mean error over all curves of 17%, but these curves came only from the training photos.

With the mean interpretation, we recognized 100% of the test photos, with 90 photos for training and 10 for testing.

With the best-result interpretation, we recognized the font correctly for 75% of the test photos, with 64 training photos and 32 test photos.


I hereby declare that I wrote this thesis independently and used no sources or aids other than those indicated, that all passages of this thesis which were taken literally or in substance from other sources have been marked as such, and that the thesis has not been submitted in the same or a similar form to any other examination board.

Furthermore, I declare that I agree to my thesis being made publicly available in the institute and/or university library.

Clausthal, den Hayyan Helal
