13
IJDAR DOI 10.1007/s10032-010-0114-8 FULL PAPER Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model Alicia Fornés · Josep Lladós · Gemma Sánchez · Dimosthenis Karatzas Received: 3 October 2009 / Revised: 8 February 2010 / Accepted: 18 February 2010 © Springer-Verlag 2010 Abstract One of the major difficulties of handwriting symbol recognition is the high variability among symbols because of the different writer styles. In this paper, we intro- duce a robust approach for describing and recognizing hand- drawn symbols tolerant to these writer style differences. This method, which is invariant to scale and rotation, is based on the dynamic time warping (DTW) algorithm. The symbols are described by vector sequences, a variation of the DTW distance is used for computing the matching distance, and K-Nearest Neighbor is used to classify them. Our approach has been evaluated in two benchmarking scenarios consist- ing of hand-drawn symbols. Compared with state-of-the-art methods for symbol recognition, our method shows higher tolerance to the irregular deformations induced by hand- drawn strokes. Keywords Document analysis · Graphics recognition · Symbol recognition · Handwriting recognition · Sequence alignment A. Fornés (B ) · J. Lladós · G. Sánchez · D. Karatzas Computer Vision Center, Department of Computer Science, Universitat Autònoma de Barcelona, Edifici O, 08193 Bellaterra, Spain e-mail: [email protected] J. Lladós e-mail: [email protected] G. Sánchez e-mail: [email protected] D. Karatzas e-mail: [email protected] 1 Introduction Symbol recognition is one of the main topics of Graphics Recognition, which has been an intensive research work in the last decades, covering technical symbol recognition [37], handwritten symbol recognition [2], symbol indexing and spotting [34], or even the recognition of degraded symbols (e.g. [45, 47, 48]). Graphical languages are expressive and synthetic tools for communicating ideas in some domains. A graphical language consists of an alphabet of symbols and the rules defining the valid combinations among them. Thanks to the recognition of the alphabet of symbols of these graphical languages, com- bined with domain-dependent knowledge, the whole docu- ment has a meaning, allowing its automatic processing. Hand-drawn symbol recognition is a particular case of handwriting recognition, which is one of the most signifi- cant topics within the field of document image analysis and recognition (DIAR). Over the last years, relevant research achievements have been attained. Simultaneously, commer- cial products have become available. The progress has been noticeable in applications like bank check processing, postal sorting, historical document transcription or online recogni- tion in calligraphic interfaces. A parallel use has also been explored in writer identification for forensic sciences and writer verification in signatures. Handwriting recognition is a difficult problem due to the variability among scripts and writer styles, or even between different time periods. Due to that, commercial applications are usually constrained to controlled domains that make use of contextual or gram- matical models and dictionaries. The type of source data (handwritten separate characters vs cursive script) is also an important constraint. Focusing on cursive script recognition, the recognition approaches can roughly be classified into analytical or holistic methods. Analytical methods perform 123

Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

IJDARDOI 10.1007/s10032-010-0114-8

FULL PAPER

Rotation invariant hand-drawn symbol recognitionbased on a dynamic time warping model

Alicia Fornés · Josep Lladós · Gemma Sánchez ·Dimosthenis Karatzas

Received: 3 October 2009 / Revised: 8 February 2010 / Accepted: 18 February 2010© Springer-Verlag 2010

Abstract One of the major difficulties of handwritingsymbol recognition is the high variability among symbolsbecause of the different writer styles. In this paper, we intro-duce a robust approach for describing and recognizing hand-drawn symbols tolerant to these writer style differences. Thismethod, which is invariant to scale and rotation, is based onthe dynamic time warping (DTW) algorithm. The symbolsare described by vector sequences, a variation of the DTWdistance is used for computing the matching distance, andK-Nearest Neighbor is used to classify them. Our approachhas been evaluated in two benchmarking scenarios consist-ing of hand-drawn symbols. Compared with state-of-the-artmethods for symbol recognition, our method shows highertolerance to the irregular deformations induced by hand-drawn strokes.

Keywords Document analysis · Graphics recognition ·Symbol recognition · Handwriting recognition ·Sequence alignment

A. Fornés (B) · J. Lladós · G. Sánchez · D. KaratzasComputer Vision Center, Department of Computer Science,Universitat Autònoma de Barcelona, Edifici O,08193 Bellaterra, Spaine-mail: [email protected]

J. Lladóse-mail: [email protected]

G. Sáncheze-mail: [email protected]

D. Karatzase-mail: [email protected]

1 Introduction

Symbol recognition is one of the main topics of GraphicsRecognition, which has been an intensive research work inthe last decades, covering technical symbol recognition [37],handwritten symbol recognition [2], symbol indexing andspotting [34], or even the recognition of degraded symbols(e.g. [45,47,48]).

Graphical languages are expressive and synthetic tools forcommunicating ideas in some domains. A graphical languageconsists of an alphabet of symbols and the rules defining thevalid combinations among them. Thanks to the recognition ofthe alphabet of symbols of these graphical languages, com-bined with domain-dependent knowledge, the whole docu-ment has a meaning, allowing its automatic processing.

Hand-drawn symbol recognition is a particular case ofhandwriting recognition, which is one of the most signifi-cant topics within the field of document image analysis andrecognition (DIAR). Over the last years, relevant researchachievements have been attained. Simultaneously, commer-cial products have become available. The progress has beennoticeable in applications like bank check processing, postalsorting, historical document transcription or online recogni-tion in calligraphic interfaces. A parallel use has also beenexplored in writer identification for forensic sciences andwriter verification in signatures. Handwriting recognition isa difficult problem due to the variability among scripts andwriter styles, or even between different time periods. Dueto that, commercial applications are usually constrained tocontrolled domains that make use of contextual or gram-matical models and dictionaries. The type of source data(handwritten separate characters vs cursive script) is also animportant constraint. Focusing on cursive script recognition,the recognition approaches can roughly be classified intoanalytical or holistic methods. Analytical methods perform

123

Page 2: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

A. Fornés et al.

Fig. 1 High variability of hand-drawn musical clefs: a Treble, b Bass, c Alto

Fig. 2 Distorted shapes: a b Distortion on junctions, c Gaps, d Over-lapping, e Missing parts

a segmentation preprocess that divides the word image insequences of smaller units which are therefore classified interms of associated features and lexical information. Holisticmethods, which recognize words as a whole, usually describethe word image as a unidimensional signal consisting of asequence of image features at each column. This allows touse techniques sometimes inspired by the speech recognitiondomain such as sequence alignment by dynamic program-ming [16] or Hidden Markov Models [32].

In this paper, we present a novel rotation invariant symbolrecognition method without restricting its applicability. Wechoose to focus here on the case of hand-drawn graphicalsymbols of non-textual alphabets as a representative prob-lem. This refers to symbols that compound diagrammaticnotations in graphical documents like musical scores, archi-tectural drawings, electronic and engineering diagrams, andflow charts. (see [21] for a review). In addition to the highwriter style differences (see Fig. 1) and the inherent distor-tion of hand-drawn strokes (see Fig. 2), the recognition ofgraphical symbols has two added difficulties regarding tohandwritten text recognition. First, graphical symbols arebidimensional shapes appearing in bidimensional layouts, so1D models should be adapted to rotation, scale, and positioninvariance. Second, unlike text, graphical symbols can notbenefit from the use of contextual and grammatical models.

To cope with the problem of hand-drawn symbol recog-nition under the conditions stated in the above paragraph,in this work we propose a method inspired by the holis-tic approaches for unconstrained handwritten word recog-nition but extended to bidimensional shapes appearing inbidimensional layouts. Our main contribution is an approachto model and classify hand-drawn symbols. The proposedmethod is robust against the elastic deformations typicallyfound in handwriting and invariant to rotation and scale.The method proposed is based on the dynamic time warp-ing (DTW) algorithm [16] for signals (one-dimensionaldata), and it has been extended to graphical symbols (two-dimensional data). Among the two major families of meth-ods for handwriting recognition, namely sequence align-ment (e.g. DTW) and Hidden Markov Models (HMMs), ourwork is based on the former. The DTW algorithm has been

successfully used for finding the best match between two timeseries in a noisy and complex domain. It has been alreadyused in handwritten text recognition [33], coping with theelastic deformations and distortions in the writing style. Forthat reason, we maintain that the DTW algorithm can beadapted for the recognition of hand-drawn symbols. In com-parison with HMMs, the DTW approaches are more suitablefor coping with the problem of hand-drawn symbol recog-nition when there is a small number of instances for eachsymbol (which is the case of some hand-drawn graphical dat-abases), not being enough for a successful training process.In addition, the adaptation of DTW to a rotation invariantsystem is easier than the adaptation of HMM, because HMMrequires to train a model for each possible orientation, withthe consequently increment of its time complexity.

To solve the problem of rotational invariance, classical andeffective methods exist in the literature on OCR or SymbolRecognition. Methods like projections in different orienta-tions or zoning using concentric ring masks are well known.We have taken into account these ideas and extended them toa novel DTW-based algorithm. The steps of the method pro-posed are the following: First, column sequences of featurevectors from different orientations of the two input shapes tobe compared must be computed. The features comprise theupper and the lower profiles and the number of pixels perregion. Once we have the features for all the considered ori-entations, the DTW algorithm computes the matching costbetween every orientation of the two symbols and decides inwhich orientation these two symbols match with the lowestcost.

The rest of the paper is organized as follows. Section 2 cor-responds to the state of the art of hand-drawn symbol recog-nition methods. In Sect. 3, the fundamentals of the dynamictime warping (DTW) algorithm are presented. Afterward,our DTW-based method for the recognition and classifica-tion of graphical symbols is fully described, demonstratingits invariance to rotation and scale. In Sect. 4, the exper-imental results are presented. Finally, concluding remarksare exposed in Sect. 5.

2 State of the art of hand-drawn symbol recognitionmethods

Hand-drawn symbol recognition has been one of the mostintensive research fields of graphical symbol recognition[21]. It is close to handwritten character recognition,

123

Page 3: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model

especially for logographic languages such as chinese char-acters [5,20]. In fact, some of these approaches use variantsof DTW (see [39] for a survey).

In the handwriting domain, symbol recognition methodsrequire symbol descriptors with three important properties:first, they should guarantee intra-class compactness and inter-class separability; second, they should be rotation and scaleinvariant; and third, they should cope with elastic deforma-tions and distortions caused by the high variability in hand-writting style.

Traditionally, symbol descriptors, as a particular case ofshape descriptors, can be classified into statistical and struc-tural approaches. The first ones represent the image as an-dimensional feature vector, whereas the second ones usu-ally represent the image as a set of geometric primitivesand relationships among them. Statistical approaches tendto use pixels as the primitives to extract features from.The curvature scale space (CSS), Zernike moments, GenericFourier Descriptor, Radial Angular Transform, and ShapeContext descriptors are examples of these statisticalapproaches. The CSS [26] descriptor only takes into accountthe symbol silhouette and can only be used for closed curves,but it is tolerant to rotation. On the contrary, Shape Con-text [2] can work with non-closed curves and has goodperformance in hand-drawn symbols, because it is tolerantto deformations, but it requires point-to-point alignment ofthe symbols to be compared before their alignment. Thegeneric Fourier descriptor (GFD) [46] applies a 2D Fou-rier Transform to the polar representation of the image andis rotation and scale invariant. The angular radial trans-form (ART) [14] decomposes the shape in an orthogonalbasis, taking use of a radial and angular function. It hasgood performance for general shapes, and it is robust tonoise. Zernike moments [13] are widely used for hand-drawn symbols, as well as online systems [10], because theypreserve properties of the shape and are invariant to rota-tion, scale, and deformations. There are also several sta-tistical approaches for online symbol recognition, whichcan also use on-line information such as speed or pressure.Although they are usually more focused in human inter-faces, a few works are briefly mentioned next: In [30], amethod applied to logic diagrams is proposed, which usesgeometric features and template matching. In the methodproposed in [43], the symbol is represented as a sequenceof coordinates, and matching is based on curvature dis-tance. Miyao and Maruyama [25] present a hand-drawnmusic symbol recognition system, consisting of the com-bination of two classifiers: the first one uses chain codesfor representing the strokes, while string-edit distance isused for matching; the second classifier is used for com-plex strokes, dividing strokes into regions, and computingthe directional feature for each region. Golubitsky and Watt[9] propose the recognition of multi-stroke symbols using

truncated Legendre–Sobolev expansions of the coordinatefunctions for creating the feature vectors and classifyingusing support vector machines.

In structural approaches, straight lines and arcs are usuallythe basic primitives. Strings, graphs, or trees represent therelations between these primitives. The similarity measure istherefore performed by string, tree, or graph matching. A fewexamples of structural approaches are briefly described next:the attributed graph grammars [3] can cope with partiallyoccluded symbols, while Spectral models [19] and RegionAdjacency Graphs [22] are well-suited to describe symbols inhand-drawn documents, showing good performance in frontof distortions typically found in these documents. Deform-able models [40] are invariant to distortions and rotation,but the basic primitives are lines, thus not being suitablefor symbols with arcs and curves. Hidden Markov Modelsare also widely used in offline [27] and online symbol rec-ognition methods [44]. Basically the structure of the sym-bol is described by the sequence of states that generate theimage, and the recognition consists in finding the sequenceof states with the highest probability. Concerning structuralapproaches for online symbol recognition, Fonseca et al.[6] propose a method for recognizing architectural symbols,using fuzzy logic and geometric features; Peng et al. [31] pro-pose a constrained partial permutation algorithm which usesbinary and ternary topological spatial relationships for therecognition of symbols; and Mas et al. [24] describe a com-plete system for recognizing architectural drawings, repre-senting the data as trees and proposing adjacency grammarswith distortions measures for adapting them to sketches.

Mathematical symbol recognition requires a mixed strat-egy, because it requires text recognition and graphics (sym-bols) recognition. It is a very active research field (see [4] for asurvey), which also includes several online systems: Shi et al.[36] propose a symbol decoding and graph generation algo-rithm; and a full mathematical expression recognizer systemis defined in [8], which involves symbol recognition (usingboth online and offline features) and structural analysis ofmulti-stroke characters using context-free grammars.

3 A DTW-based approach for graphical symbolrecognition

Since the approach proposed in this paper is based on theDTW algorithm, we will start this section with a short intro-duction before detailing our approach. The DTW algorithmwas first introduced by Kruskal and Liberman [16] for putt-ing series into correspondence. This technique was first usedin the context of speech recognition, a domain in which thetime series are notoriously complex and noisy. The methodwas used for coping with noise and variations in speechspeed. Beside speech recognition, this technique has been

123

Page 4: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

A. Fornés et al.

widely used in many other applications: chemical engineer-ing, gesture recognition, signatures, robotics, bioinformatics,music, shape retrieval, or Data Mining [1,11,12,29]. DTWhas been also applied to the handwritten text recognition field,being used in both offline [15,17,33] and online approaches[28,42].

The basic dynamic time warping algorithm achieves goodresults when working with one-dimensional data and withhandwritten words in documents. Concerning the hand-drawn symbol domain, the method must be adapted to copewith the variations in writing style and rotation. In the firstpart of this section, the fundamentals of DTW are presented.Afterward, the architecture for our DTW-based system isfully described, and its benefits for hand-drawn symbol rec-ognition are presented. Compare to the classical DTW, theproposed method introduces two main changes: first, dif-ferent features are used and second, the computation of theDTW distance has been modified, combining information atcertain orientations of the symbol.

3.1 DTW for one-dimensional signals

The DTW algorithm [16] is used for comparing signals bymatching two one-dimensional vectors. It is a much morerobust distance measure for time series than Euclidean dis-tance, allowing similar samples to match even if they areout of phase in the time axis (see Fig. 3). DTW can distort(or warp) the time axis, compressing it at some places andexpanding it at others, finding the best matching between twosamples.

Let us define the DTW distance of two time seriesC = x1 . . . xM and Q = y1 . . . yN as DTWCost(C, Q)

Fig. 3 Normal and DTW alignment, extracted from [33]

Fig. 4 An example of DTW alignment (extracted from [12]) aSamples C and Q, b The matrix D with the optimal warping path ingray color, c The resulting alignment

(see Fig. 4a). For this purpose, a matrix D(i, j) (wherei = 1 . . . M, j = 1 . . . N ) of distances is computed usingdynamic programming:

D(i, j) = min

⎧⎨

D(i, j − 1)

D(i − 1, j)D(i − 1, j − 1)

⎫⎬

⎭+ d2(xi , y j ) (1)

d2(xi , y j ) = xi − y j (2)

Performing backtracking along the minimum cost indexpairs (i, j) starting from (M, N ) yields the warping path(Fig. 4b). Finally, the matching cost is normalized by thelength Z of this warping path, otherwise longest time seriesshould have a higher matching cost than shorter ones:

DTW Cost(C, Q) = D(M, N )/Z (3)

The creation of this path is important, because it deter-mines which points match (Fig. 4c) and are to be used tocalculate the distance between the time series. In addition,DTW is able to handle samples of unequal length, allowingthe comparison without resampling.

3.2 DTW for two-dimensional shapes

In case of bidimensional data, the DTW computation mustbe adapted. Some work has been done in the adaptationof DTW to 2 dimensions [18,38], but these approachesare of a very high time complexity, reaching O(N 4N ) andO(N 39N ) respectively. For this reason, some research workhas been focused on the reduction of the 2D problem.Generally, the reduction of dimensionality can be performedwhen 2D data can be encoded by 1D signals, such as shapesdescribed by their external contours (silhouettes). Specifi-cally, for handwritten text methods, the 2D representation istypically reduced to 1D based on the assumption that text fol-lows a given text line [33]. In these cases, the time complexityof the 2D-DTW computation is significantly reduced.

3.3 Extraction of features

The choice of features that better represent shapes is a keydecision of the application of the DTW algorithm. In thiswork, we have been inspired by features representing serieswith a view to reduce the dimensionality. Let us first describethe approaches which have inspired our proposed represen-tation.

In the handwritten text recognition system described byRath and Manmatha [33], the following four features arecomputed for every column of a word image: the numberof foreground pixels in every column; the upper profile (thedistance of the upper pixel in the column to the upper bound-ary of the word’s bounding box); the lower profile (the dis-tance of the lower pixel in the column to the lower boundary

123

Page 5: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model

Fig. 5 Example of features extracted from every column of the image,with S = 5: f1 = upper profile, f2 = lower profile, f3 . . . f5 = sum ofpixels of the image of the three regions defined

of the word’s bounding box); and the number of transitionsfrom background to foreground and viceversa. In this way,two word images A and B can be easily compared usingDTW. If fk(ai ) corresponds to the k-th feature of the col-umn i of the image A, and fk(b j ) corresponds to the k-thfeature of the column j of the image B, the matching dis-tance DTW Cost(A, B) is calculated using the same equa-tions (Eq. 1, 3) as in Kruskal’s method, but instead of Eq. 2,the computation of d2 will be the sum of the squares of thedifferences between individual features:

d2(xi , y j ) =4∑

k=1

( fk(ai ) − fk(b j ))2 (4)

Another typical set of column features in the literature isthe one proposed by Marti and Bunke [23] for handwrittenword recognition. The following nine features are obtainedper column: the number of foreground pixels, the center ofgravity, the second moment order, the lower and upper pro-files, the differences between the lower and upper values withrespect to the previous column, the number of gaps, and thenumber of pixels between the upper line and baseline of theword. Finally, the features described by Vinciarelli et al. [41]are also very common in the literature, consisting in a slidingwindow which moves from left to right. In this case, insteadof the single column features, the window comprises severalcolumns. After adjusting the size of the window to the areawhich contains pixels, it is divided into a 4x4 cell grid, andthe number of pixels in every cell is used as a feature. Then,the 4x4 features are concatenated to a 16-dimensional featurevector.

Inspired by the above approaches, we propose a fea-ture set for symbol recognition. In this field, it is importantto obtain some information about the external shape (pro-files), but also about the internal shape (distribution of pixelsinside the silhouette). In fact, in other recognition fields (e.g.chinese character recognition), it has been demonstrated thatthe external profiles (e.g. the peripheral features) are not effi-cient enough for the recognition of certain characters [5]. Forthis reason, in addition to the upper and the lower profiles,our method divides every column in several regions, count-ing the number of foreground pixels per region (it can be

seen as a column zoning). First, the image is normalized interms of its size, and the following features are computed forevery column of the image:

– f1 = upper profile.– f2 = lower profile.– f3 . . . fS = number of foreground pixels in each region.

When computing the upper and lower profiles, a morpho-logical closing operation over the image is performed, sothat few little gaps in the writing will not affect the final pro-file. Finally, all the features are normalized ( 0 ≤ fk ≤ 1,k=1…S), and the features corresponding to the sum of pix-els ( f3,…, fS) are smoothed over the symbol’s columns usinga gaussian filter for a better matching. Notice that due to thehigh variability in the writing style, the number of transitionsper column (from background to foreground and viceversa)can confuse the system, thus, they are not used as features.

Figure 5 shows an example of the features extracted forthe marked column of a music symbol: the pixels of the col-umn are used for extracting the upper and the lower profiles.Then, the column is divided in three equal regions (in thisexample, S=5), and for every region the number of pixels iscounted.

The reader should notice that the features f3, . . . , fS

provide an adequate information about the distribution ofthe pixels inside the shape. The number of regions is aparameter that can be set up to reflect the complexity ofthe symbols in the database. These measures will help toclassify correctly shapes that have the same external con-tour but differences in their interior. Moreover, it will notget confused when comparing axially symmetrical symbols.In Fig. 6b, one can see two similar images in terms of sil-houette (both are squares), but very different inside (a crossor a circle). Notice that the upper/lower profiles and thewhole sum of pixels per column are very similar (Fig. 6a),whereas the functions of the sum of the three regions (seeFig. 6c) are very different, being able to discriminate thesymbols.

3.4 Computation of the DTW distance

Due to the fact that the slant and the orientation of graphi-cal symbols are frequently different between each other (seeFig. 7), symbols can not be directly and easily comparedbetween them.

To cope with rotation invariance and hand-drawn distor-tion, we define a DTW-based distance in terms of differentprojections, covering the full range of possible orientationsof the symbol.

Let us introduce the notation that will be used in thissection:

123

Page 6: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

A. Fornés et al.

Fig. 6 Two architectural symbols with similar external contour(squares) but with differences inside the contours (circle and cross).The first row corresponds to the features for the square with a circle,and the second row corresponds to the features for the square witha cross. a Functions of the sum of pixels per column. b Symbols. The

gray horizontal lines divide the image in three regions: upper, lower, andmiddle c Functions corresponding to the sum of pixels for the upper,middle, and bottom region. Notice that the functions in a are similar,whereas functions in c are very different

Fig. 7 a Clefs: two treble clefs with different slants, b Two identicalarchitectural symbols but in different orientations

– Aα: Symbol A oriented at α degrees.– Bβ : Symbol B oriented at β degrees.– aαi : Column i of the symbol A oriented at α degrees.– bβ j : Column j of the symbol B oriented at β degrees.

– Dα,β(i, j): Matrix which contains the cost of matchingthe first i columns of Aα and the first j columns of Bβ .

– MC(α, β): Matrix which contains at the position (α, β)

the matching cost between Aα and Bβ .– G(α, β): Matrix which contains at the position (α, β) the

sum of MC(α, β) and MC(α + 90, β + 90).

There are three steps in the procedure: the extractionof features at different orientations; the computation of thematching distance between all the possible combinations oforientations between the two symbols; and the computationof the final matching distance. In the first step, the two sym-bols A and B are oriented in certain angles (see Fig. 8a),

Fig. 8 Example of featureextraction. a Some of theorientations used for extractingthe features of every symbol. bFeature vectors extracted fromevery orientation (α1, . . . α4)

123

Page 7: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model

Fig. 9 Feature vectors of two different music symbols: a The first sym-bol is an alto clef with a orientation of α degrees, the second one is abass clef with a orientation of β degrees. b The same alto clef with a

orientation of α + 90◦ and the bass clef with a orientation of β + 90◦.Here, the functions of the two symbols are very different

covering the range from 0 to 180◦. For each orientation, thecolumn sequence of feature vectors (see Fig. 8b) definedin the previous section is obtained. In the second step, theDTW distance is computed for every combination of orien-tations of the two symbols. Thus, every orientation of thesymbol A is compared to every orientation of the symbol B.It should be observed that it is necessary to obtain the fea-tures from every orientation of the two symbols, because wedo not know a priori which orientation will give the high-est discriminatory power. Finally, the third step consists indetermining the final matching cost, and the two angle ori-entations in which the two symbols match with the lowestcost. In fact, we can not trust in only one matching whenworking with 2D data, because false matchings could appearif only one direction is used (see Fig. 9). For this reason,we also take into account the perpendicular alignment inrespect to the orientation we are considering. As a summary,we can define the final matching cost DTWCostA,B of thesymbol A and B as the minimum of the results of summingMC(α, β) + MC(α + 90, β + 90) for each possible α, β

angles.Let us define as Aα = (aα1 ,aα2 , . . . ,aαM ) the symbol

A oriented at α degrees, and Bβ = (bβ1 ,bβ2 , . . . ,bβN ) the

symbol B oriented at β degrees. First, the column sequencesof feature vectors F(Aα) and F(Bβ) are computed as it hasbeen explained in the above section (the upper/lower profileand the sum of pixels per region):

F(Aα) =

⎜⎜⎜⎜⎝

f1(aα1) f1(aα2) . . . f1(aαM )

f2(aα1) f2(aα2) . . . f2(aαM )

. . . . . . . . . . . .

fs(aα1) fs(aα2) . . . fs(aαM )

⎟⎟⎟⎟⎠

(5)

F(Bβ) =

⎜⎜⎜⎜⎝

f1(bβ1) f1(bβ2) . . . f1(bβN )

f2(bβ1) f2(bβ2) . . . f2(bβN )

. . . . . . . . . . . .

fs(bβ1) fs(bβ2) . . . fs(bβN )

⎟⎟⎟⎟⎠

(6)

Notice that the length of every column sequence of featurevector depends on the number of columns (the width) of theprojection and varies from one orientation to another.

Once the column sequences of feature vectors are com-puted, the matching cost MC(Aα, Bβ) between them mustbe calculated. First, the matrix D will be filled in with the

123

Page 8: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

A. Fornés et al.

classical DTW method:

Dα,β(i, j) = min

⎧⎪⎨

⎪⎩

Dα,β(i, j − 1)

Dα,β(i − 1, j)

Dα,β(i − 1, j − 1)

⎫⎪⎬

⎪⎭+ d2(aαi ,bβ j )

(7)

The way of computing the distance d2 must take intoaccount that both the upper/lower profile features and theset of sum of pixels features have to be weighted equally inthe calculation. The goal is to avoid a reduced effect of theupper/lower profile in the computation of d2 whenever thefeature number S is very high (which means a high numberof regions for the zoning) For this reason, the two parts areweighted by 0.5 as following:

d2(aαi,bβ j ) = 0.5 · P1(aαi,bβ j ) + 0.5 · P2(aαi,bβ j ) (8)

P1(aαi,bβ j ) =(

2∑

k=1

( fk(aαi ) − fk(bβ j )

)2

(9)

P2(aαi,bβ j ) =(

s∑

k=3

( fk(aαi ) − fk(bβ j )

)2

(10)

Then, the matching cost of Aα and Bβ is normalized by thelength Z of the warping path (obtained performing backtrack-ing on Dα,β ), and this value is stored in the correspondingcell of the matrix MC:

MC(α, β) = Dα,β(M, N )/Z (11)

This process must be repeated for all the orientations α =1 . . . 180 and β = 1 . . . 180 (the step is decided ad-hoc), fill-ing all the cells in the matrix MC. Thus, every cell of thematrix MC(α, β) will contain the matching cost betweenthe two symbols, the first one with an orientation angle ofα degrees, and the second one with an orientation angle ofβ degrees. This means that if the two symbols are orientedin W different angles, the DTW distance is computed W 2

times.The next step is the computation of the final matching cost.

It must be noticed that defining the final matching cost as theminimum of the DTW distances computed is not a good solu-tion. For example, two symbols, which belong to differentclasses, could reach the minimum matching cost if they areoriented in some specific α and β angles, but they could havevery high matching costs in other orientation angles. One wayto avoid this problem is to look at the perpendicular align-ment in respect to the orientation we are examining. Anotheroption could be to take into account the matching cost of allthe alignments, but it has been experimentally shown thatit does not increase the discriminatory power, whereas thetime complexity is increased. As an example of the problemof using only one matching, Fig. 9 shows the feature vectorsof two different music symbols: in Fig. 9a, one can see that

Table 1 DTW-based algorithm

despite the two symbols being extremely different, only theupper contour and the middle sum are adequately differentfunctions in the DTW sense, whereas in Fig. 9b all the fivefunctions of the first symbol are very different from the onesof the second symbol. For this reason, we should claim thattwo symbols are correctly matched in α and β orientationangles (α ∈ [0 . . . 360], β ∈ [0 . . . 360]), only if they have alow matching cost in α and β angles but also a low matchingcost in the corresponding perpendicular alignment (α + 90and β + 90◦). For this step, let us define as G the matrixwhich stores in position (α, β) the cell MC(α, β) plus itscorresponding perpendicular angle:

G(α, β) = MC(α, β) + MC(α + 90, β + 90) (12)

Thus, the matching cost DTWCostA,B of the symbols Aand B will be defined as the minimum value of the matrixG, where the angles θ and λ correspond to the orientationangles in which the two symbols are matched:

DTW CostA,B = min(G) (13)

Table 1 shows the pseudo-code of the algorithm.Finally, it must be noted that with the proposed descriptor

and matching strategy, we obtain a symbol descriptor andclassifier methodology which is rotation invariant and robustagainst typical elastic deformations present in hand-drawnsymbols. Concerning the complexity of the algorithm, if w

corresponds to the number of angles in which every symbolis oriented, and N is the number of columns of the widestsymbol image, then the complexity is O(W 2 N 2), becausethe DTW matching distance with order O(N 2) is computedW 2 times. This complexity cost is remarkably lower thanO(N 4N ) and O(N 39N ) of existing 2D-DTW approaches[18,38].

123

Page 9: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model

4 Results

For the evaluation of our approach, we first describe the dat-abases, metrics, comparisons, and experiments performed.

4.1 Benchmarking data

Two benchmarking databases of hand-drawn symbols havebeen used, namely music symbols from musical scores andarchitectural symbols from a sketching interface in a CADframework. These two databases have been chosen for differ-ent purposes. First, with the clefs database, we plan to analyzethe robustness of the proposed approach against deforma-tions. The data set is extracted from modern and old musicscores, and it is used because of the high variability of thesymbols, with important elastic deformations produced bythe different writer styles. With the architectural database,we evaluate the scalability with an increasing number of clas-ses. The architectural data set contains an important num-ber of different classes with different appearance, while theinter-class variability is comparably lower.

4.2 Benchmarking methods

Some benchmarking methods are chosen to compare our pro-posed features and our full DTW approach. The goal is toanalyze the performance of our method but also the suitabil-ity of the set of features we propose.

Zernike moments [13], Generic Fourier Descriptor [46],Angular Radial Transform [14], and a DTW-cyclic methodare used for comparing our DTW approach. Zernikemoments, GFD and ART are classical shape descriptionmethods in the literature. They have been used in symbolrecognition methods, because they are robust to deforma-tions and invariant to scale and rotation. In our experiments,GFD has a radial frequency with value 4, and angular fre-quency with value 9; ART has a radial order with value 2,and angular order with value 11; and 7 moments are used forZernike.

We have also implemented a variation of our own method,named cyclic DTW. The idea is to see how the performancechanges when using an algorithm with a lower computationalcost. It consists of taking the center of mass of the symbol,and for every orientation (from 0 to 180, with a step of 10◦)we only take into account the column that corresponds to thecenter of mass of the shape, and for this “centroid column”,the features used in our approach are computed (the upperand lower profiles, the sum of pixels per region). Thus, onlyone feature vector describes the symbol in every orientation.Then, a DTW-cyclic approach (similar to a string matchingcyclic) is used to match the matrices of the two symbols.

Concerning feature comparison, Marti’s [23] and Rath’s[33] features are compared against our features. In these

experiments, our DTW approach has been applied usingthese features from the literature, which have been describedin Sect. 3. Thus, we compare the proposed features againstthe ones defined by Rath and Marti to establish the suitabilityof our features.

Referring the method proposed in this paper, we use theupper and lower profiles, and the sum of pixels of 3, 4, or5 regions. The features are extracted from every orientation,from 0 to 180◦, also with a step of 10◦.

4.3 Classification

For the classification of the symbols, one representative perclass is usually chosen. Thus, every input symbol of thedatabase is compared to these n representatives, and onlyn comparisons are computed for classifying every inputsymbol. Notice that with this approach, no training pro-cess is required, saving an important computational cost. TheK-nearest neighbor (in our case, 1-NN) is used as the distancefor the classification. The minimum distance will define theclass where the input symbol belongs to.

4.4 Music clefs data set

The data set of music clefs was obtained from a collection ofmodern and old musical scores (19th century) of the Archiveof the Seminar of Barcelona. This database contains a totalof 2,128 samples between the three different types of clefsfrom 24 different authors. These images have been obtainedfrom original documents using a semi-supervised segmenta-tion approach [7]. The main difficulty of this database is thelack of a clear class separability because of the variation ofthe writer styles and the lack of a standard notation. The highvariability of clefs’ appearance from different authors can beobserved in the segmented clefs of Fig. 1.

Under this scenario, the selection of the representativefor each class is not easy. The printed clefs that are shownin Fig. 10a–c are not similar enough to the hand-drawnones. For this reason, we have chosen some hand-drawn

Fig. 10 Printed Clefs and Selected representative clefs: a PrintedTreble clef, b Printed Bass clef, c Printed Alto clef, d Treble represen-tative clef, e Bass representative clef, f and g Two Alto representativeclefs

123

Page 10: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

A. Fornés et al.

Table 2 Classification of clefs (%): recognition rate (RR.), recall andfall-out of these three music classes using four models

Method Zernike ART GFD DTW- DTW-approachmoment cyclic (5 zones)

RR. trebble 87.7 69.3 98.7 27.1 96.2

RR. bass 63.8 82.3 68.1 91.4 96.5

RR. alto 75.7 97.1 69.7 78.0 97.1

Overall rec. rate 75.7 82.9 78.8 65.5 96.6

Overall precision 80.3 87.5 83.2 68.2 96.9

Overall fall-out 11.9 9 10.2 19.6 1.8

representative clefs: one treble clef (Fig. 10d), one bass clef(Fig. 10e), and two alto clefs (Fig. 10f,g) because of the highvariability in alto clefs. The selected representatives corre-spond to the set median symbol.

Given a database consisting of a set of elements of sev-eral classes and a query class X to recognize from it, let usdefine Positives as the number of elements belonging to theclass X and Negatives as the number of elements that doesnot belong to X . The precision, recognition rate (recall), andfall-out (false positive rate) measures are computed using thefollowing equations:

Precision = |True Positives|(|True Positives| + |False Positives|) (14)

Recognition Rate = Recall = |True Positives||Positives| (15)

Fall − out = false posit. rate = |False Positives||Negatives| (16)

In Table 2, the recognition rates of the classification ofthis data set are shown, where the DTW approach is com-pared to the Zernike moments, GFD, ART, and DTW-cyclic,using the parameters defined above. One can see that withthe proposed method, we reach a recognition rate of 96.9%,significantly improving the Zernike Moments (75.7%), ART(82.9%), GFD (78.8%), and DTW-cyclic (65.5%).

In Table 3, we show the experimental results with somedifferent features that can be used for describing the symbols.In this experiment, our DTW approach is always used, butmaking use of different features described in the literature,specifically those proposed by Rath and Marti. In Table 3,we also show the recognition rates obtained using differentnumbers of regions (3, 4, and 5) in the feature extractionstep of our approach. We can observe that Marti’s featuresperform very well for the trebble and bass clefs (over 97%of recognition rate), but very poor with alto clefs (90%).Contrary, Rath’s features achieve a good performance in altoclefs but have some problems with trebble clefs. Concern-ing our features, we can see that the division of the image

Table 3 Classification of clefs (%): recognition rates (RR.) of thesethree music classes using four models

Method Rath Marti DTW 3z DTW 4z DTW 5z

n f 4 8 5 6 7

RR. trebble clef 95.8 97.3 96.7 96.3 96.2

RR. bass clef 96.1 97.6 96.5 96.3 96.5

RR. alto clef 96.5 90.1 94.3 96.1 97.1

Overall RR. 96.1 95.0 95.8 96.2 96.6

Overall precission 96.5 94.6 96.2 96.6 96.9

Overall fall-out 2.0 2.6 2.2 2.0 1.8

Overall recognition rate (RR.), precision, and fall-out of Rath’s fea-tures, Marti’s features and our DTW features, using 3, 4, and 5 regions(zones). n f = Number of features per column

Fig. 11 The fifty selected representatives for the architectural database

in 3 regions does not provide enough discriminatory powerfor the high variability in alto clefs (we reach a recognitionrate of 94.3%), while the recognition rate increases whenthe number of regions is increased, reaching a 97.1% with 5regions. In addition, it is shown that the features we have usedachieve a better overall recognition rate and precision (96.6and 96.9%, respectively) in comparison with both of Marti’s(95 and 94.6%) and Rath’s ones (96.1 and 96.5%), with alower fall-out (1.8% in comparison with 2.6% of Marti’s and2% of Rath’s ones).

4.5 Architectural symbols data set

The architectural symbol data set is a benchmark database[35] comprising online and off-line instances from a set of 50symbols drawn by a total of 21 users. Each user has drawn atotal of 25 symbols and over 11 instances per symbol. Thus,the database (see examples in Fig. 2) consists on 7,465 indi-vidual instances, consisting of 50 symbols, each class withan average of 150 samples. In this database, the represen-tative selected for each class (Fig. 11) corresponds to theprinted symbol of the class, because both the printed and thehand-drawn symbols are quite similar.

The architectural symbol data set has been used to testthe scalability of our method. In this experiment, we testthe performance under an increasing number of classes.

123

Page 11: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model

Fig. 12 Classification of architectural hand-drawn symbols: recogni-tion rates using different number of classes

We have started the classification using the first five clas-ses. Iteratively, five classes have been added at each step,and the classification has been repeated. The higher num-ber of classes we introduce, the higher the confusion degreebecomes among them. It is because of the elastic deforma-tions inherent to hand-drawn strokes and the higher num-ber of objects to distinguish. In Fig. 12, the recognitionrates are presented, showing that our approach reaches sig-nificantly higher results than Zernike moments and theDTW-cyclic approach (87% in comparison with 26 and 38%,respectively). The performance of the Zernike moments,ART, GFD, and the DTW-cyclic decreases dramaticallywhen increasing the confusion in terms of the number of clas-ses (Zernike moments decreases from 62.5 to 26.4%, ARTdecreases from 67.8 to 33.9%, GFD decreases from 65.7to 35.8%, and DTW-cyclic decreases from 61.3 to 38.4%with 50 classes), whereas our method is quite robust to theincreasing of the number of different classes participating(from 97.5% with five classes decreases to 87.2% with 50classes).

4.6 Discussions

Our DTW-based method has shown to be suitable for dealingwith hand-drawn symbol recognition problems, being toler-ant to elastic deformations, scale, and rotation. It has showngood performance with symbols with high variability (suchas the music clefs data set), and also, shows a good scalabilitydegree (see the results on the architectural symbols data set),outperforming the Zernike moments, ART, and GFD descrip-tors. The features proposed in our method also outperformthe Rath’s and Marti’s ones.

An important point of our approach consists in the selec-tion of the number of zones and the step orientations. Con-cerning the step orientation, a low value could help in

decreasing the final matching cost, because more features(for each orientation) are computed, and the matching ismore precise. However, one must take into account that if thenumber of angles W increases, the computational cost is alsoincreased (O(W 2 N 2). Concerning the number of zones, theyare used for defining the blurring degree allowed, in otherwords, a low number of zones will decrease the intra-classvariability (but also the inter-class variability) and vice-versa.For this reason, the optimum number of zones will depend oneach data set and will be a trade-off between inter-class andintra-class variability. A common way to look for the opti-mum number of zones is to use different values on a subsetof the database and selecting the value that maximizes therecognition rate.

5 Conclusions

In this paper, we have presented a Dynamic Time Warping-based method for the description and classification of hand-drawn symbols. This approach is rotation and scale invariant,and robust to the deformations typical in hand-drawn sym-bols. The method proposed computes a column sequenceof feature vectors for each orientation of the two symbolsand computes the DTW distance, taking also into accounttheir perpendicular alignment. Our method has been testedwith two hand-drawn symbol databases (music and architec-tural) achieving high recognition rates. Comparison againstsome state-of-the-art descriptors shows the robustness andbetter performance of the proposed approach when classi-fying symbols with high variability in appearance, such asirregular deformations induced by hand-drawn strokes, lowinter-class and high intra-class variabilities.

The main drawback is the high computational cost: eventhough the method proposed is O(w2 N 2), which is remark-ably lower than other existing 2D-DTW approaches (suchas O(N 4N ) and O(N 39N )), it is still not fast enough forperforming symbol recognition in large databases or evenreal-time symbol recognition systems. In this sense, fur-ther work can be focused on developing DTW-variations fordecreasing the time complexity of the algorithm.

Acknowledgments We would like to thank Prof. J.M.Gregori forhis help in accessing the historical archives, and V.Kilchherr andDr. A.Schlapbach for providing support in the experiments. Thiswork has been partially supported by the Spanish projects TIN2008-04998, TIN2009-14633-C03-03 and CONSOLIDER -INGENIO 2010(CSD2007-00018).

References

1. Bartolini, I., Ciaccia, P., Patella, M.: Warp: accurate retrieval ofshapes using phase of fourier descriptors and time warping dis-tance. IEEE Trans. Pattern Anal. Mach. Intell. 27(1), 142–147(2005)

123

Page 12: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

A. Fornés et al.

2. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object rec-ognition using shape contexts. IEEE Trans. Pattern Anal. Mach.Intell. 24(4), 509–522 (2002)

3. Bunke, H.: Attributed programmed graph grammars and theirapplication to schematic diagram interpretation. IEEE Trans.Pattern Anal. Mach. Intell. 4(6), 574–582 (1982)

4. Chan, K.F., Yeung, D.Y.: Mathematical expression recognition: asurvey. Int. J. Document Anal. Recognit. 3(1), 3–15 (2000)

5. Chou, K.S., Fan, K.C., Fan, T.I.: Peripheral and global features foruse in coarse classification of Chinese characters. Pattern Recog-nit. 30(3), 483–489 (1997)

6. Fonseca M.J., Pimentel C., Jorge J.A.: CALI: an online scribblerecognizer for calligraphic interfaces. In: AAAI Spring Sympo-sium on Sketch Understanding, pp. 51–58. (2002)

7. Fornés, A., Lladós, J., Sánchez, G.: Primitive segmentation in oldhandwritten music scores. In: Liu, W., Lladós, J. (eds.) GraphicsRecognition: Ten Years Review and Future Perspectives, vol. 3926of Lecture Notes in Computer Science, pp. 279–290. Springer(2006)

8. Garain, U., Chaudhuri, B.B.: Recognition of online handwrittenmathematical expressions. IEEE Trans. Syst. Man Cybern. Part B:Cybern. 34(6), 2366–2376 (2004)

9. Golubitsky, O., Watt, S.M.: Online recognition of multi-stroke sym-bols with orthogonal series. International Conference on DocumentAnalysis and Recognition, vol. 2, pp. 1265–1269 (2009)

10. Hse, H., Newton, A.R.: Sketched symbol recognition using Zernikemoments. In: Proceedings of the 17th International Conference onPattern Recognition, vol. 1, pp. 367–370 (2004)

11. Hu, N., Dannenberg, R.B., Tzanetakis, G.: Polyphonic audiomatching and alignment for music retrieval. In: IEEE Workshopon Applications of Signal Processing to Audio and Acoustics,pp. 185–188. New Paltz, New York, October (2003)

12. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic timewarping. Knowl. Inf. Syst. 7(3), 358–386 (2005)

13. Khotanzad, A., Hong, Y.H.: Invariant image recognition byZernike moments. IEEE Trans. Pattern Anal. Mach. Intell. 12(5),489–497 (1990)

14. Kim, W.Y., Kim, Y.S.: A new region-based shape descriptor. Tech-nical report, Hanyang University and Konan Technology (1999)

15. Kornfield, E.M., Manmatha, R., Allan, J.: Text alignmentwith handwritten documents. In Document Image Analysis forLibraries, pp. 195–209. IEEE Computer Society, Washington, DC,USA, (2004)

16. Kruskal, J.B., Liberman, M. : The symmetric time-warping prob-lem: from continuous to discrete. In: Sankoff, D., Kruskal,J.B. (eds.) Time Warps, String Edits, and Macromolecules: TheTheory and Practice of Sequence Comparison., pp. 125–161. Addi-son-Wesley Publishing Co, Reading, MA (1983)

17. Khurshid, K., Faure, C., Vincent, N.: Fusion of word spotting andspatial information for figure caption retrieval in historical docu-ment images. 10th International Conference on Document Analysisand Recognition, vol. 1, pp. 266–270 (2009)

18. Levin, E., Pieraccini, R.: Dynamic planar warping for optical char-acter recognition. In IEEE International Conference on Acoustics,Speech, and Signal Processing, vol. 3, pp. 149–152 (1992)

19. Liang, S., Sun, Z., Li, B.: Sketch retrieval based on spatial relations.Proceedings of International Conference on Computer Graphics,Imaging and Visualization, Beijing, China, pp. 24–29 (2005)

20. Liu, C.L.: Normalization-cooperated gradient feature extractionfor handwritten character recognition. IEEE Trans. Pattern Anal.Mach. Intell., pp. 1465–1469 (2007)

21. Lladós, J., Valveny, E., Sánchez, G., Martí, E.: Symbol recognition:current advances and perspectives. In Lecture Notes in ComputerScience, vol. 2390, pp. 104–128. Springer (2002)

22. Lladós, J., Martí, E., Villanueva, J.J.: Symbol recognition by error-tolerant subgraph matching between region adjacency graphs.

In IEEE Transactions on Pattern Analysis and Machine Intelli-gence, vol. 23(10), pp. 1137–1143, October (2001)

23. Marti, U.V., Bunke, H.: Using a statistical language model toimprove the performance of an hmm-based cursive handwritingrecognition system. Int. J. Pattern Recogni. Artif. Intell. 15, 65–90 (2001)

24. Mas, J., Jorge, J.A., Sánchez, G., Lladós, J.: Represent-ing and parsing sketched symbols using adjacency gram-mars and a grid-directed parser. In: Ogier eds, J.M.,Liu, W., Lladós, J. (eds.), Graphics Recognition: RecentAdvances and New Opportunities, Lecture Notes in ComputerScience, vol. 5046, pp. 176–187. Springer-Verlag (2008)

25. Miyao, H., Maruyama, M.: An online handwritten music symbolrecognition system. Int. J. Document Anal. Recogni. 9(1), 49–58(2007)

26. Mokhtarian, F., Mackworth, A.K.: Scale-based description and rec-ognition of planar curves and two-dimensional shapes. IEEE Trans.Pattern Anal. Mach. Intell. 8(1), 34–43 (1986)

27. Müller, S., Rigoll, G.: Engineering Drawing Database RetrievalUsing Statistical Pattern Spotting Techniques. Graphics Rec-ogni. Recent Adv. Lecture Notes Comput. Sci. 1941, 246–255(2000)

28. Niels, R., Vuurpijl, L.: Using dynamic time warping for intuitivehandwriting recognition. Advances in Graphonomics, Proceedingsof the 12th Conference of the International Graphonomics Society,pp. 217–221 (2005)

29. Orio, N., Schwarz, D.: Alignment of monophonic and poly-phonic music to a score. In San Francisco International ComputerMusic Association, editor, Proceedings of the International Com-puter Music Conference, pp. 155–158, Havana, Cuba, September(2001)

30. Parker, J., Pivovarov, J., Royko, D.: Vector templates for symbolrecognition. In: International Conference on Pattern Recognition,vol. 15, pp. 602–605 (2000)

31. Peng, B., Liu, Y., Wenyin, L., Huang, G.: Sketch recognitionbased on topological spatial relationship. In: Structural, Syntac-tic, and Statistical Pattern Recognition: Lecture Notes in ComputerScience, vol. 3138, pp. 434–443. Springer-Verlag (2004)

32. Rabiner, L.R.: A tutorial on hidden Markov models and selectedapplications inspeech recognition. Proc. IEEE 77(2), 257–286 (1989)

33. Rath, T.M. Manmatha, R.: Word image matching using dynamictime warping. In: Proceedings of the Conference on ComputerVision and Pattern Recognition, vol. 2, pp. 521–527 Madison, WI,June 18–20 (2003)

34. Rusiñol, M., Borràs, A., Lladós, J.: Relational indexing of vecto-rial primitives for symbol spotting in line-drawing images. PatternRecogni. Lett. 31(3), 188–201 (2010)

35. Sanchez, G., Valveny, E., Llados, J., Romeu, J. Mas, Lozano. N.:A platform to extract knowledge from graphic documents. Applica-tion to an architectural sketch understanding scenario. In: Dengel,A. Marinai, S., (eds.), Document Analysis Systems VI, LectureNotes in Computer Science, vol. 3163, pp. 389–400, Florence—Italy, 2004, Springer-Verlag

36. Shi, Y., Li, H.Y., Soong, F.K.: A unified framework for symbolsegmentation and recognition of handwritten mathematical expres-sions. Ninth International Conference on Document Analysis andRecognition, vol. 2, 854–858, Sept. (2007)

37. Tabbone, S., Wendling, L.: Technical symbols recognition usingthe two-dimensional radon transform. In Proceedings of the 16thInternational Conference on Pattern Recognition, vol. 3, pp. 200–203 (2002)

38. Uchida, S., Sakoe, H.: A monotonic and continuous two-dimen-sional warping based on dynamic programming. In Proceedingsof 14th International Conference on Pattern Recognition, vol. 1,pp. 521–524, (1998)

123

Page 13: Rotation invariant hand-drawn symbol recognition based on ...dimos/papers/IJDAR2010_Fornes.pdfHand-drawn symbol recognition is a particular case of handwriting recognition, which is

Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model

39. Uchida, S., Sakoe, H.: A survey of elastic matching techniques forhandwritten character recognition. IEICE Trans. Inf. Syst. E88-D,1781–1790 (2005)

40. Valveny, E., Marti, E.: Hand-drawn symbol recognition in graphicdocuments using deformable template matching and a bayesianframework. Proceedings of the 15th International Conference onPattern Recognition, vol. 2, pp. 239–242 (2000)

41. Vinciarelli, A., Bengio, S., Bunke, H.: Offline recognition of uncon-strained handwritten texts using HMMs and statistical languagemodels. IEEE Trans. Pattern Anal. Mach. Intell., pp. 709–720(2004)

42. Vuori, V., Laaksonen, J., Oja, E., Kangas, J.: Experiments withadaptation strategies for a prototype-based recognition systemfor isolated handwritten characters. Int. J. Document Anal. Rec-ogni. 3(3), 150–159 (2001)

43. Wilfong, G., Sinden, F., Ruedisueli, L.: On-line recognition ofhandwritten symbols. IEEE Trans. Pattern Anal. Mach. Intell.18(9), 935–940 (1996)

44. Xin, G., Cuiyun, L., Jihong, P., Weixin, X.: HMM based onlinehand-drawn graphic symbol recognition. In Proceedings of the 6thInternational Conference on Signal Processing, vol. 2, pp. 1067–1070 (2002)

45. Yang, S.: Symbol recognition via statistical integration of pixel-level constraint histograms: a new descriptor. IEEE Trans. PatternAnal. Mach. Intell. 27(2), 278–281 (2005)

46. Zhang, D.S., Lu, G.: Generic Fourier descriptor for shape-basedimage retrieval. In Proceedings of the IEEE International Confer-ence on Multimedia and Expo, vol. 1, pp. 425–428 (2002)

47. Zhang, W., Liu, W.: A new syntactic approach to graphic symbolrecognition. International Conference on Document Analysis andRecognition vol. 1, pp. 516–520 (2007)

48. Zhang, W., Liu, W., Zhang, K.: Symbol recognition with ker-nel density matching. IEEE Trans. Pattern Anal. Mach. Intell.28(12), 2020–2024 (2006)

123