6
A Real-time Recognition System for Handwritten Mathematics Backtracking and Relationship Discovery Ray Genoe and Tahar Kechadi University College Dublin, Belfield, Dublin 4, Ireland. {ray.genoe, tahar.kechadi}@ucd.ie Abstract This paper describes a real-time approach for hand- written, mathematical expression recognition. Since users can view the output of the system after sketching each stroke, it is useful to retain as much information as possible from previous increments of an expression. However, if subsequent input results in an unordered or multi-stroke symbol, it can have adverse effects on a previously identified expression. Rather than reprocess the entire expression, it would be more beneficial to only reprocess a subexpression. To this end we have devel- oped a backtracking technique, which can revert back to the expression discovered before this subexpression. An added benefit of this technique is that it simplifies other processes of recognition, such as relationship dis- covery. 1. Introduction Real-time recognition is concerned with processing handwritten data, that is captured online, and generat- ing an overall solution as the user writes. Sometimes referred to as online or dynamic recognition, this tech- nique is a relatively new development in the field of handwriting recognition. The system’s output can be displayed to the user during data entry rather than after, so that recognition errors can easily be identified. Of- fline approaches cannot offer the same advantage and recognition errors can often be difficult to identify. One advantage of offline approaches is the fact that the user has finished writing and the speed of recognition is not a primary concern. In [8], average writing rates for alphanumeric characters were reported to be between 1.5 to 2.5 characters per second, so real-time recogni- tion systems must be able to keep up with this speed of writing. The processing power of modern computers, combined with optimised recognition techniques, can be harnessed to deliver an overall solution to the user, as quickly as possible. In systems such as the Progressive Structural Anal- ysis approach described in [9], the expression is built incrementally, each time the user finishes a stroke. The system reduces the complexity of the overall process, by adding the stroke to the existing expression rather than reprocessing all of the input. However, unordered and multi-stroke symbols can have adverse effects on the structure of previous increments of an expression. One approach to this problem is to reprocess all of the symbols again after each symbol is added. This is similar to offline approaches where all of the symbols in an expression have been identified and sorted, such as the projection profile cutting techniques described in [6, 7, 10]. However, repeatedly recognising the entire expression after each stroke, requires the user to de- lay their natural handwriting speed while entering sub- sequent symbols. This is particularly inefficient when recognising large expressions. Other systems impose a symbol order constraint on the user, such as [5]. While this reduces the complexity of the recognition process, the limitations placed on the user are undesirable. 2. Our Approach In our approach, we use an expression tree to repre- sent the structure of an expression. As new symbols are added to the expression, the tree is altered to incorpo- rate them. Nodes are created to represent the symbols and the relationships discovered between them are used to construct the tree. Rather than reprocessing the entire expression again when unordered and multi-stroke sym- bols are encountered, we propose using a backtracking technique and a symbol ordering constraint that is not visible to the user. These techniques are used to retain as much structural information as possible, prior to re- 2010 12th International Conference on Frontiers in Handwriting Recognition 978-0-7695-4221-8/10 $26.00 © 2010 IEEE DOI 10.1109/ICFHR.2010.69 399

[IEEE 2010 International Conference on Frontiers in Handwriting Recognition (ICFHR) - Kolkata, India (2010.11.16-2010.11.18)] 2010 12th International Conference on Frontiers in Handwriting

  • Upload
    tahar

  • View
    217

  • Download
    5

Embed Size (px)

Citation preview

A Real-time Recognition System for Handwritten MathematicsBacktracking and Relationship Discovery

Ray Genoe and Tahar KechadiUniversity College Dublin,Belfield, Dublin 4, Ireland.

{ray.genoe, tahar.kechadi}@ucd.ie

Abstract

This paper describes a real-time approach for hand-written, mathematical expression recognition. Sinceusers can view the output of the system after sketchingeach stroke, it is useful to retain as much informationas possible from previous increments of an expression.However, if subsequent input results in an unorderedor multi-stroke symbol, it can have adverse effects on apreviously identified expression. Rather than reprocessthe entire expression, it would be more beneficial to onlyreprocess a subexpression. To this end we have devel-oped a backtracking technique, which can revert backto the expression discovered before this subexpression.An added benefit of this technique is that it simplifiesother processes of recognition, such as relationship dis-covery.

1. Introduction

Real-time recognition is concerned with processinghandwritten data, that is captured online, and generat-ing an overall solution as the user writes. Sometimesreferred to as online or dynamic recognition, this tech-nique is a relatively new development in the field ofhandwriting recognition. The system’s output can bedisplayed to the user during data entry rather than after,so that recognition errors can easily be identified. Of-fline approaches cannot offer the same advantage andrecognition errors can often be difficult to identify. Oneadvantage of offline approaches is the fact that the userhas finished writing and the speed of recognition is nota primary concern. In [8], average writing rates foralphanumeric characters were reported to be between1.5 to 2.5 characters per second, so real-time recogni-tion systems must be able to keep up with this speed ofwriting. The processing power of modern computers,

combined with optimised recognition techniques, canbe harnessed to deliver an overall solution to the user,as quickly as possible.

In systems such as the Progressive Structural Anal-ysis approach described in [9], the expression is builtincrementally, each time the user finishes a stroke. Thesystem reduces the complexity of the overall process,by adding the stroke to the existing expression ratherthan reprocessing all of the input. However, unorderedand multi-stroke symbols can have adverse effects onthe structure of previous increments of an expression.

One approach to this problem is to reprocess all ofthe symbols again after each symbol is added. This issimilar to offline approaches where all of the symbolsin an expression have been identified and sorted, suchas the projection profile cutting techniques described in[6, 7, 10]. However, repeatedly recognising the entireexpression after each stroke, requires the user to de-lay their natural handwriting speed while entering sub-sequent symbols. This is particularly inefficient whenrecognising large expressions. Other systems impose asymbol order constraint on the user, such as [5]. Whilethis reduces the complexity of the recognition process,the limitations placed on the user are undesirable.

2. Our Approach

In our approach, we use an expression tree to repre-sent the structure of an expression. As new symbols areadded to the expression, the tree is altered to incorpo-rate them. Nodes are created to represent the symbolsand the relationships discovered between them are usedto construct the tree. Rather than reprocessing the entireexpression again when unordered and multi-stroke sym-bols are encountered, we propose using a backtrackingtechnique and a symbol ordering constraint that is notvisible to the user. These techniques are used to retainas much structural information as possible, prior to re-

2010 12th International Conference on Frontiers in Handwriting Recognition

978-0-7695-4221-8/10 $26.00 © 2010 IEEE

DOI 10.1109/ICFHR.2010.69

399

processing a problematic subset of the symbols in theexpression. As we will also describe, this technique isused to simplify many of the processes of recognition,such as finding suitable symbols for relationship discov-ery and reducing the number of relationship positionsthat should be investigated.

3. Ordering Symbols and Backtracking

Figure 1 illustrates an example where recognitionhas taken place and the structure of the expression isrepresented by a tree. Figures 1(b)-1(e) represent thestructure of the expression after each of the four strokesare added to the expression.

(a) Symbols 1 to 4 (b) 𝑇2 (c) 𝑇3

(d) 𝑇4 (e) 𝑇5, The current tree

Figure 1. Incremental tree development.

As new symbols are added during the constructionof an expression, the previous expression tree is associ-ated with each symbol. This allows the system to revertback to the state of the expression before any symbol;i.e., backtracking. For example in Figure 1, the stateof the expression before symbol n is 𝑇𝑛, and the cur-rent tree represents the state of the expression beforethe next symbol is added. During the incremental de-velopment of the tree, the system also maintains an or-dered list of the symbols that have been added to theexpression, which is sorted from left-to-right and top-to-bottom. Therefore, the list of symbols for the ex-pression in Figure 1(a) is {2, c, -, l}. As new symbolsare added they are positioned within this list and back-tracking procedures are executed for unordered sym-bols. Since additional strokes can alter the shape andidentity of previously identified symbols, backtrackingis also required for multi-stroke symbols. We will illus-trate this process by changing the expression ‘2𝑐 − 𝑙’to ‘2𝑐 − 𝑡’ and to ‘2𝑐 − 6𝑙’; i.e., adding a multi-strokesymbol or an unordered symbol.

3.1 Multi-stroke Symbols

Since our approach updates the expression after eachstroke is inserted, it is necessary to change previous in-crements of the expression to reflect multi-stroke sym-bols. To achieve this, the system identifies the initialstroke to be updated, backtracks to the state of the ex-pression before this stroke’s symbol, and rebuilds theexpression using the multi-stroke symbol. An exampleof this can be described using the expression in Fig-ure 1, where the user is creating the expression ‘2𝑐 − 𝑡’.To complete the expression, the user creates the multi-stroke symbol ‘𝑡’ by adding a horizontal line to the ‘𝑙’.Once the ‘𝑡’ has been identified the system backtracksto expression 𝑇4, the expression before the ‘𝑙’. Finallythe multi-stroke symbol is added to the tree and the or-dered list of symbols is updated accordingly.

This is a simple example of where backtracking canbe used to retain as much information as possible fromthe previous increments of the expression. An alterna-tive to this approach would be to alter the current treerather than backtrack. However, the alterations requiredby unordered symbols can be complex and backtrackingcan simplify the procedure.

3.2 Unordered Symbols

The system’s symbol ordering constraint ensuresthat, when a new symbol added to the expression, itis correctly ordered. If it is not positioned at the endof the list, then the backtracking procedure is initiated.An unordered symbol can be entered when a user mis-takenly omits a symbol from the expression. This canalso occur when a user enters symbols in a manner thatis contrary to the natural order of handwriting, such assketching a fraction from bottom-to-top.

Using the example from Figure 1 again, we will dis-cuss the case where the user intended to sketch the ex-pression ‘2𝑐 − 6𝑙’. When the new symbol (‘6’) is addedto the sorted list, it is not the last symbol in the list andtherefore, an unordered symbol. The sorting procedurereturns the list, {2, c, -, 6, l}, and indicates that priorto constructing the expression for the new symbol, thesystem must backtrack to 𝑇4, the expression before ‘𝑙’.Subsequently the symbols ‘6’ and ‘𝑙’ are added to thetree by using the remaining processes of recognition.

In previous work [4], we attempted to create proce-dural rules to reconstruct the tree when unordered sym-bols were encountered. However, when the scope of thesystem’s recognition was extended, the effect that un-ordered symbols have on the expression tree, becomesincreasingly difficult to anticipate. The tree transforma-tion rules are complex and errors are difficult to identify.

400

This led to the current approach, which uses the sortedlist to identify unordered symbols and the backtrackingtechnique to revert to previous states of the expression.

The benefits of using a sorted list is more obvi-ous, when an unordered symbol is placed in an areaof the expression that contains multi-tiered subexpres-sions. Figure 2 illustrates an example where the userhas entered a numerator and denominator, before thehorizontal line that defines them as being part of a frac-tion. In this example, the system has interpreted theexpression as ‘2 + 63𝑦𝑥’ and the list is {2, +, 6, 3, y,x}. During the sorting procedure, the horizontal line isfound to be to the right of the ‘+’ but above the ‘6’.This indicates that the ‘6’ is an unordered symbol andthat the system should backtrack to the expression be-fore it was inserted. Furthermore, this indicates that thesubset of symbols to be reintroduced after backtrackinghas taken place, will be found at position three in thelist. The unordered symbols are removed from the listand reintroduced once the horizontal-line symbol hasbeen allocated a position. These unordered symbols areinserted directly after this position. Therefore, the listwill be {2, +, 3, x, -, 6, y}, and after backtracking theexpression will be ‘2+’. Starting with ‘3’, the remain-ing symbols are then added to the expression, to createthe second tree in Figure 2.

Figure 2. The structural implications ofunordered symbols.

The backtracking technique benefits many other pro-cesses of recognition, such as relationship discovery,structural development and error correction. The re-mainder of this paper will discuss the benefits to therelationship discovery process. Backtracking not onlymakes it easier to identify which symbols to discoverrelationships between, but also reduces the complexityof the spatial analysis technique.

4. Finding Suitable Symbols

When a new symbol is added to an expression, thesystem searches for other symbols that it may have arelationship with and then performs spatial analysis todetermine the best relationship. To discover suitablesymbols, it is useful to know the structure of the ex-pression. The backtracking procedure that is executedfor unordered symbols, usually results in the appropri-ate symbol being found in the last node of the tree; rela-tionships are then investigated between that symbol andthe new symbol. Figure 3 shows an example of this,where relationships should be investigated between ‘𝑏’and the new symbol, ‘3’. It is unnecessary to examineany relationships with other symbols in this example,thereby reducing the complexity of relationship discov-ery.

Figure 3. Finding suitable symbols fromthe tree

4.1 Multi-Tiered Expressions

In some instances there may be special relationshipsfound at the internal nodes which require further exam-ination. By using a top-down approach to examine theright side of the expression tree, we can use contextualinformation to select appropriate symbols. Figure 4, il-lustrates an example where the most suitable symbol isnot contained in the last node of the tree. In this exam-ple, a minus symbol is added to the right of a script rela-tionship, and spatial analysis technique should considerthe ‘2’ and ‘𝑐’, when determining the most suitable re-lationship. When dealing with multi-level expressionssuch as this, we must investigate relationships with theright-most symbol of each level. This ensures that thespatial analysis technique can determine which level thenew symbol belongs to.

4.2 Special Subexpressions

A similar approach to retrieving suitable symbols,other than the last one in the tree, can be applied to theexpression in Figure 5. However in this case, rather thanjust considering the symbols ‘𝑥’ and ‘𝑦’, we must also

401

Figure 4. Multiple symbols for potentialrelationships

investigate any possible relationships between the newsymbol, ‘𝑧’, and the entire fraction. Therefore, we in-clude the fraction’s subexpression in the list of symbolsfor spatial analysis.

Figure 5. Subexpression for relationshipdiscovery

Once the suitable symbols and subexpressions havebeen identified, the spatial analysis technique deter-mines the best relationship that should be constructedfor a new symbol.

5. Relationship Discovery

Mathematical relationships are determined by exam-ining the relative positioning and size of the symbolsin an expression. Our previous approaches to spatialanalysis [3], used fuzzy logic techniques to determinethe best relationships between symbols and producedexcellent results for a subset of mathematical relation-ships. The results of the spatial testing showed thatthe hybrid baseline approach, for discovering relation-ships between symbols, was the most accurate. Fur-thermore, the use of thresholds for our fuzzy function,meant that the low-level rules for positional confidencewere generic. However, the scalability of these ap-proaches was poor, due to the use of complex, codedprocedural rules. These rules were initially created totest the functionality of our fuzzy approach to relation-ship discovery, and while efficient, they often had to beredesigned when more relationships were added. Thisrequired a major effort from the designer of such rules,

and to avoid this the Catchment Area solution was de-veloped. All that is required from the developer of anew rule between two symbols, is the specification ofthe relative vertical and horizontal positions, and an in-dication of the comparative size; i.e., smaller, similar orlarger.

5.1 Symbol Rules

To achieve this, we must first specify what relation-ships can exist between any two given symbols. Thisis done upon creation of a symbol, i.e., after symbolrecognition. Depending on the type of symbol cre-ated, we specify the symbols that can have relation-ships with it, and the names of the relationships be-tween them. This information is used to perform spa-tial analysis between a suitable (old) symbol and thelatest (new) symbol that has been added to an expres-sion. For example, Table 1 describes some of therelationships that can exist between letters and othersymbols. RPAR represents concatenation with rightparenthesis and SUPERF/MULTF represents a super-script/multiplication relationship with both completedand incompleted fractions.

Table 1. Relationship rules for letters (oldsymbol)

Symbols (new) RelationshipsLetters SUPER, SUB, MULTDigits SUPER, SUB, MULTΣ SUPER, SUB, MULT, USUM∫

SUPER, MULTLMINUS SUPER, SUB

( SUPER, SUB, MULT) RPAR

DIV SUPERF, SUB, MULTFNUMER MULTFDENOM MULTF

× RMULT- RMINUS, NUMER+ RPLUS÷ RDIV

This approach is highly scalable, as a user can ex-tend/limit the system’s recognition capabilities by al-tering this table. Furthermore, the complexity of therelationship discovery technique is improved by usingthe identity of the new symbol, to reduce the numberof relationships that should be examined. To make thesystem ultimately scalable, we must provide a similartechnique for relationships.

402

5.2 Relationship Rules

In [4], we attempted to investigate relationships be-tween every symbol in the expression, in an attemptto avoid imposing a symbol ordering constraint on theuser. This was a complex procedure and involved test-ing all relationships between every symbol pair in theexpression, twice; i.e., 𝑟𝑒𝑙𝑥(𝑠𝑦𝑚𝑏𝑜𝑙𝑦, 𝑠𝑦𝑚𝑏𝑜𝑙𝑛𝑒𝑤) and𝑟𝑒𝑙𝑥(𝑠𝑦𝑚𝑏𝑜𝑙𝑛𝑒𝑤, 𝑠𝑦𝑚𝑏𝑜𝑙𝑦). When the best relation-ship was added to the tree, the complexity of the processincreased when the symbol was unordered, and the treetransformation rules that facilitated this approach werenot scalable and error prone.

The complexity of the relationship discovery processhas been reduced by finding suitable symbols in the tree.Furthermore, the results of our investigations into suit-able spatial analysis procedures, in [3], have shown thatcertain spatial positions can be used to suit a variety ofrelationships. Since our symbol ordering constraint dic-tates that the new symbol will be to the right of, or be-low, the other symbols in the expression, this reducesthe number of positions that should be investigated byspatial analysis.

The three character positional attributes, VHS, definewhere the new symbol should be positioned in relationto another, and how its size should be. It represents thevertical, horizontal and size comparison values, whichare defined in Figures 6-8.

The vertical positions define where the key verticalfeature, usually the baseline, of the new symbol shouldbe approximately positioned. This feature can vary de-pending on the symbol in question and adjustments aremade for symbols whose bottom y-coordinate are notfound on the natural baseline of an expression; for ex-ample descenders and fractions. Positions 0 and 6, inFigure 6, can be determined by swapping the symbolsand following the same procedure as 5 and 2, respec-tively. For example, the same approach to right-centredvertical confidence (position 2) for the relationship pair‘3−’ can be used for the left-centred relationship ‘−3’.

Figure 6. Vertical Positions

The three positions in Figure 7 define where newsymbol should be horizontally positioned. Position R

specifies where its left edge should be positioned in re-lation to the old symbol. Position d represents a dom-inated symbol, such as the denominator of a fraction,and indicates where the centre x-coordinate should belocated. Position D represents a dominant symbol, suchas the dividing-line of a fraction. In this instance, thecentre x-coordinate of the old symbol should be posi-tioned within this area.

Figure 7. Horizontal Positions

The three sizes in Figure 8 are used to define whetherthe new symbol should be larger (L), smaller (S) or asimilar size (N), to the old symbol. Size confidence isdetermined by comparing the specific size features ofsymbols. Adjustments are made to the height of cer-tain symbols, such as descenders and ascenders, whencomparing them to other types of symbols.

Figure 8. Size Comparison

With these positional and size attributes, a vast ar-ray of mathematical relationships can be defined. Forexample, the superscript relationship rule states thatthe positional attribute is 1RS. This means that thenew symbol should be slightly higher, to the right andsmaller than the other. To provide another example, asquare-root relationship’s positional attribute could be2d@. This means that a new symbol should be right-centred and dominated by the square-root symbol, andthat the size of the symbol has no bearing on the out-come of the relationship.

The hybrid baseline technique, described in [3], isused to determine the confidence of each relationship.This is an important factor when analysing handwrit-ten input, since different relationships may be found inoverlapping positions. For example, a slight alterationin the placement and size of the ‘𝑥’ in the multiplicationrelationship ‘𝑎𝑥’ can result in a superscript or subscriptrelationship being discovered with the ‘𝑎’.

403

6 Testing

We collected a dataset of 247 expressions from mul-tiple writers, which produced no symbol recognition er-rors. The 2425 mathematical relationships containedwithin these expressions included implicit relationshipssuch as scripts, concatenation and multiplication andexplicit relationships such as binary and unary opera-tors. Our dataset contained 24 unordered symbols and474 multi-stroke symbols, which required 554 relation-ships to be re-examined. The system failed to cor-rectly identify 8 expressions (3.24%). This was due to6 relationship classification errors (0.25%) and 2 struc-tural errors. Each of the 6 relationships were identifiedbut were not chosen as the best relationship to be con-structed. We will describe the construction technique inanother publication and discuss these structural errors.

Due to the absence of a common corpus of ex-pressions and the limited research in the area of real-time recognition, it is difficult to compare these re-sults to other techniques. However, we can comparethem to one recognition system, by using the samedataset. Fitzgerald’s Fuzzy Shift-Reduce Parser (FSRP)[2], which uses the same symbol recogniser [1] as oursystem, accepts a completed expression and performsrecognition after all the symbols have been added; i.e.,offline recognition. In a similar manner to our approach,FSRP pursues the most likely relationships when con-structing an expression. However, the most accurate im-plementation of the system, FSRP(10), examines up to10 solutions when ambiguous relationships are encoun-tered. Our results show that by using the backtrackingtechnique and limiting the number of symbols to inves-tigate relationships with, our approach is faster than themost efficient FSRP procedure. Futhermore, our refinedfuzzy relationship rules and construction techniques aremore accurate than FSRP(10).

Table 2. Comparing our approach to FSRP(1.3 GHz processor)

Recognition Average TimeSystem Accuracy per ExpressionOur System 96.76% 0.45 secsFSRP 83.81% 0.49 secsFSRP(10) 91.9% 1.02 secs

7. Conclusion

The system presented in this paper provides real-time mathematical expression recognition, by incre-

mentally developing an expression tree while the usersketches the input. By using a backtracking techniqueand a symbol ordering constraint, the system retains asmuch information as possible when required to read-dress previous increments of the expression. As de-scribed, these techniques also simplify the process ofrelationship discovery. Suitable symbols are discoveredby using the structure of the expression and the spa-tial analysis technique is optimised by minimising thenumber of relationship positions to investigate. Further-more, the scalability of the relationship discovery pro-cess has been improved by using symbol specific rela-tionship rules. This allows developers to easily extendthe capabilities of the system when incorporating newmathematical symbols and relationships.

References

[1] J. Fitzgerald, F. Geiselbrechtinger, and T. Kechadi. Ap-plication of fuzzy logic to online recognition of hand-written symbols. In IWFHR ’04: The 9th Int. Workshopon Frontiers in Handwriting Recognition.

[2] J. Fitzgerald, F. Geiselbrechtinger, and T. Kechadi.Structural analysis of handwritten mathematical expres-sions through fuzzy parsing. In IASTED Int. Conf onAdvances in Computer Science and Technology, 2006.

[3] R. Genoe and T. Kechadi. Fuzzy spatial analysis tech-niques for mathematical expression recognition. InICAISC ’10: The 10th Int. Conference on Artificial In-telligence and Soft Computing.

[4] R. G. Genoe, J. Fitzgerald, and T. Kechadi. A purely on-line approach to mathematical expression recognition.In IWFHR ’06: The 10th Int. Workshop on Frontiers inHandwriting Recognition.

[5] A. Kosmala and G. Rigoll. Recognition of On-Linehandwritten formulas. In IWFHR ’98: The 6th Int.Work. on Frontiers in Handwriting Recognition.

[6] N. Okamoto and B. Miao. Recognition of mathematicalexpressions by using the layout structures of symbols.In ICDAR ’91: Proc. of the 1st Int. Conference on Doc-ument Analysis and Recognition.

[7] N. Okamoto and A. Miuazawa. An experimental imple-mentation of a document recognition system for paperscontaining mathematical expressions. Structural Docu-ment Image Analysis, pages 36–53, 1992.

[8] C. Tappert, C. Suen, and T. Wakahara. The state of theart in On-Line handwriting recognition. IEEE Trans-actions on Pattern Analysis and Machine Intelligence,12(8):787–808, 1990.

[9] B. Vuong, S. Hui, and Y. He. Progressive structuralanalysis for dynamic recognition of on-line handwrittenmathematical expressions. Pattern Recognition Letters,29(5):647–655, 2008.

[10] Z. Wang and C. Faure. Structural analysis of handwrit-ten mathematical expressions. In ICPR ’88: Proceed-ings of the 9th Int. Conf. on Pattern Recognition.

404