A new approach for recognizing handwritten mathematics using relational ...glabahn/Papers/fuzzy.pdf · A new approach for recognizing handwritten mathematics ... A new approach for

IJDARDOI 10.1007/s10032-012-0184-x

ORIGINAL PAPER

A new approach for recognizing handwritten mathematicsusing relational grammars and fuzzy sets

Scott MacLean · George Labahn

Received: 21 July 2010 / Revised: 29 February 2012 / Accepted: 3 March 2012© Springer-Verlag 2012

Abstract We present a new approach for parsing two-dimensional input using relational grammars and fuzzy sets.A fast, incremental parsing algorithm is developed, moti-vated by the two-dimensional structure of written mathemat-ics. The approach reports all identifiable parses of the input.The parses are represented as a fuzzy set, in which the mem-bership grade of a parse measures the similarity betweenit and the handwritten input. To identify and report parsesefficiently, we adapt and apply existing techniques such asrectangular partitions and shared parse forests, and introducenew ideas such as relational classes and interchangeability.We also present a correction mechanism that allows users tonavigate parse results and choose the correct interpretation incase of recognition errors or ambiguity. Such corrections areincorporated into subsequent incremental recognition results.Finally, we include two empirical evaluations of our recog-nizer. One uses a novel user-oriented correction count metric,while the other replicates the CROHME 2011 math recogni-tion contest. Both evaluations demonstrate the effectivenessof our proposed approach.

1 Introduction

It is generally acknowledged that the primary methods bywhich people input mathematics on computers (e.g., LATEX,Maple, Mathematica) are unintuitive and error-prone. A morenatural method, at least on pen-based devices, is to use hand-written input. However, automatic recognition of handwrit-

S. MacLean (B) · G. LabahnCheriton School of Computer Science, University of Waterloo,200 University Ave. W., Waterloo, ON N2L 3G1, Canadae-mail: [email protected]

G. Labahne-mail: [email protected]

ten mathematical expressions is a hard problem. Indeed, evenafter forty years of research, no existing recognition systemis able to accurately recognize a wide range of mathematicalnotation [5,9].

There are many similarities between math notation andother natural languages [6]. In particular, notations are notformally defined and can be ambiguous. For example, with-out contextual information, it is impossible to tell whether thenotation u(x+y) denotes a function application or a multipli-cation operation. Such semantic ambiguities, along with thesyntactic ambiguities generally associated with handwritingrecognition, make math notation a challenging recognitiondomain. These difficulties are compounded by the two-dimensional structures prevalent in handwritten math. Manywell-known approaches for recognition and domain mod-eling (e.g., Markov models, grammars, dictionary lookup)cannot be directly applied to the more complicated structuresfound in math notation.

The present work describes the math recognition systemused in MathBrush, our pen-based system for interactivemathematics [17]. Using MathBrush, a user writes mathe-matical expressions as if they were using pen and paper. Afterentry, expressions may be edited and manipulated using com-puter algebra system (CAS) operations that are invoked bypen-based interaction. The output of CAS operations may becaptured and further edited with the pen.

MathBrush is designed for real-time interaction, recogniz-ing and reporting parse results as the user is writing. Its recog-nizer must therefore be fast enough to update parse results inreal-time and flexible enough to support a mix of typeset andhandwritten input. However, the ambiguity present in hand-written math makes it all but impossible to develop a recog-nition system with perfect or even near-perfect accuracy. Toensure that the system remains usable when recognition isimperfect, we believe that it is essential to capture multiple

123

S. MacLean, G. Labahn

parses simultaneously, and, by so doing, to allow the user tochoose between alternatives in case of recognition errors. Atthe same time, the system must remain highly responsive, soefficiency is vital.

Motivated by these design considerations, this paperpresents two main contributions in regard to recogniz-ing handwritten math notation: fuzzy relational context-free grammars (fuzzy r-CFGs) and relational classes. Fuzzyr-CFGs specifically address recognition problems in struc-tured, ambiguous domains. The definition of a fuzzy r-CFGincorporates not only the structure of the recognition domain,but also the uncertainty inherent in recognizing that structure.The grammar, thus, provides a formal model of the recogni-tion process itself and is presented in detail in Sect. 3.

Relational classes are a generalization and formalizationof the syntactic symbol classes used by other authors. Theyrepresent families of similar-looking mathematical expres-sions and replace the typical assumption of independencebetween parse tree branches. Parses within a relational classare assumed to be interchangeable within a larger parse with-out affecting the parse confidence. This interchangeabilityreduces the number of expressions that must be consideredwhen forming parse trees.

Together, these two ideas form the basis for a two-stepparsing process that captures all recognizable parses ofthe input, while avoiding the exponential runtime typicallyassociated with such an approach. In the first step, terminalsymbols and potential subexpression structures are identi-fied efficiently using a fuzzy r-CFG and associated parsingalgorithms. This step generates a shared parse forest thatsimultaneously represents every recognizable parse tree. Inthe second step, particular parse trees are extracted fromthe shared parse forest in decreasing order of recognitionconfidence. Each tree represents the syntactic layout of amath expression and is rewritten to represent the expres-sion’s semantics. Trees are generated on demand as they arerequested by the user through the interface of MathBrush. Weuse a modest set of relational classes to improve recognitionaccuracy while maintaining fast tree extraction. Althoughthere may be exponentially many trees, extraction of a singletree remains fast, so that the user experiences no delay.

A secondary contribution of this paper is the correctioncount metric for evaluating recognition accuracy (Sect. 6).The metric provides an implementation-independent way tocompare the effectiveness of recognizers that permit usersto make corrections to their output in case of errors. Unlikethe expression-level correctness rate that typically serves as alowest common denominator for recognizer comparison, thecorrection count metric measures “how correct” a particularrecognition result is without reference to the implementationdetails of any one recognizer.

Grammar-based approaches have been proposed for rec-ognizing mathematical notation for decades—the next sec-

tion surveys past and current related research. Section 3presents fuzzy r-CFGs in an abstract setting, while Sect. 4describes our two-step parsing process in detail. Section 5provides some details of the symbol and relation classifi-cation systems used in MathBrush. Section 6 evaluates ourmath recognition system empirically on a publicly availabledata set. Finally, Sect. 7 offers some concluding remarks andfuture research directions.

2 Related work

There are a large number of existing grammar formal-isms addressing the problem of two-dimensional parsing;Marriott, Meyer, and Wittenberg [29] provide an extensivesurvey. Generally, each formalism strikes a balance betweenthe severity of the geometric constraints it imposes on aparser’s input and the amount of time and space requiredto parse that input. Some formalisms, like general relationalgrammars and graph rewriting grammars, impose so littleconstraint that it is intractable to parse the languages they gen-erate. Our work most closely resembles Tomita’s 2D-CFGs[36] and the positional grammars developed by Costagliolaand others [11].

Both of these formulations fit input elements into a rect-angular grid-based structure. In the 2D-CFG case, each gridcell must be filled by a terminal, and adjacent grid regionsof the same height (width) may be merged into horizontal(vertical) concatenations by applying grammar productions.However, this formalism is not rich enough to support math-ematical notation. For example, consider fitting the symbolsof the expression a

b +1 into a regular grid using one terminalper cell. The fraction requires three vertically adjacent cells,each of which has a neighbor to the right. But these threeneighboring cells cannot be filled by the plus sign alone, andnone are allowed to be empty. The expression can thereforenot be represented by a 2D-CFG in a natural way.

The positional grammar formalism allows grid cells tobe empty, avoiding the problem we just saw with 2D-CFGs.Positional grammar productions take the general form A⇒A1r1 A2 · · · rk−1 Ak , where the Ai are terminals and/or non-terminals, and the ri are relations describing spatial config-urations. Crucially, given a current grid cell in the input anda grammar relation, the next cell in which parsing contin-ues must be uniquely determined. Positional grammars, thus,require distinct language elements to be parsed by distinctpaths through the input. Consider again the example a

b + 1,which one might naturally parse by following the path a ↓— ↓ b → + → 1, where the arrows indicate grammarrelations. But then how would one parse a

b+1 ? After parsingthe b, the→ relation can direct the grammar to move hori-zontally, as required for the second case, or horizontally andupward, as required for the first case. But it cannot choose

123

A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets

one or the other depending on the circumstances. Again, theformalism is too weak for math notation.

Our work adopts aspects of both of these formalisms. Weinclude multiple geometric relations similarly to positionalgrammars, but restrict the form of grammar productions,so that a single production may only use a single relation.This restriction facilitates efficient parsing via partitioningthe input into horizontal and vertical concatenations—anaspect of Tomita’s work that we share (along with Chou,as discussed below). Unlike 2D-CFGs, we do not requireconcatenated regions to have the same size and do not fit ter-minal symbols into a grid. And we omit the requirement ofpositional grammars that each relation direct the parser to aunique successor position. Instead, we use fuzzy sets to main-tain multiple successor positions and associated confidencescores.

For math recognition in particular, grammar models areoccasionally used through rule-based approaches, as inAlgoSketch [21] and its relatives, which take an approachsimilar to those of Zanibbi et al. [41] and Rutherford [33].In Zanibbi’s DRACULAE system, the input is processed inthree passes based on tree rewriting. The first pass identi-fies baselines, which it organizes in a tree structure in whicheach edge represent geometric relationships. The second passrewrites geometric structures as syntactic structures. (E.g., ahorizontal line above one subexpression and below anotherwould be rewritten as a fraction). The third pass rewritessyntactic structures as semantic parse trees suitable for com-putation. (E.g., the fraction would be rewritten as a divisionoperation.)

Grammars may also be used as a verification step to con-firm the validity of an expression recognized by some othermeans (e.g., [13,37]). Garain and Chaudhuri, for example,view an expression as a collection of horizontal baselines,which they call “levels”. Each symbol is placed in a levelas it is drawn. After the entire drawing is complete, it issegmented into horizontal and vertical “stripes,” which aremerged based on grammar rules and the levels. The result-ing parse is validated against a grammar that generates TEXstrings. If the parse is not valid, then alternatives are soughtfrom the symbol recognizer.

Often, grammars are used as a flexible rule set to guide arecognition process based on traditional parsing techniques.Such use goes back as far as Anderson’s original graphi-cal rewriting system [2]. Already many of the features ofmodern grammar-based recognition systems are present inAnderson’s work. In his approach, each grammar produc-tion is equipped with predicates that determine how a setof input elements should be partitioned for parsing. Thepredicates include rules applying to the minimum, maxi-mum, and central x- and y-coordinates of expression andsymbol bounding boxes, as well as threshold-based rulesto ensure proper alignment of neighboring subexpressions.

Productions not containing a terminal symbol on the RHS arerestricted to contain only two non-terminals, as in ChomskyNormal Form. In such cases, Anderson includes a restrictionsimilar to one later proposed by Liang et al. [22] (which wewill encounter in Sect. 4) that input elements should be par-titioned by drawing a straight line in the plane that splits theelements into two subsets and does not intersect any of them.

Another early grammar-based approach was proposed byBelaid and Haton [4]. In it, a symbol is chosen by the systemas a starting point, and the symbols around it are divided into“zones” depending on context and grammar productions. Thezones are then processed recursively in a top-down parsingscheme directed by operator identities. If the union of zonesdoes not cover the desired subset of the input (i.e., the wholeinput for a complete parse), then the grammar is searchedfor rules that derive the starting point operator, and parsingproceeds in a bottom-up fashion.

Our ordering assumption, described in Sect. 4, whichtreats grammar productions as either vertical or horizontalconcatenation, was first developed by Chou [10], who useda stochastic grammar to recognize synthetic binary imagesof math expressions. In Chou’s model, each grammar pro-duction is assumed to be equally likely, terminal symbols areassigned probabilities based on Hamming distance betweenthe input image and a template, and geometric relation-ships between symbols and subexpressions are determinedby fixed, non-probabilistic rules about symbol size and base-lines. Symbol ambiguity is permitted up to a predeterminedprobability threshold, and relation ambiguity is permittedonly in the case of horizontal concatenation where base-lines differ by one pixel or less. The most likely parse tree isobtained by a variant of the Cocke-Younger-Kasami (CYK)CFG parsing algorithm.

Winker et al. [39] later proposed an approach that pro-vided for some ambiguity in geometric relationships, but notin terminal symbol identities. In their system, the space sur-rounding certain mathematical operators (e.g., fraction bar,summation sign, square root) is divided into a number ofregions. Symbols lying in different regions may lead to dif-ferent parses, and each of these possible parses is maintainedand reported to the user. However, each time an ambigu-ity is detected, the parse is duplicated and both possibilitiesexplored independently. This may lead to an exponential run-time, as well as an exponential number of parse results.

Miller and Viola [30], expanding on earlier work of Hull[16], proposed an approach in which a graph is dynamicallycreated, the edges of which represent the application of gram-mar rules. They weight the graph edges by probability esti-mates of the parse results and use A∗ to navigate to the mostlikely parse. The probability of a parse tree is the product ofterminal symbol recognition probabilities and probabilitiesderived from geometric rules. Each production is assumed tobe equally likely. Miller and Viola also introduced a convex

123


hull constraint. Instead of parsing all 2n subsets of the input,this constraint restricts the parser to only those subsets thatare geometrically convex, leading to a significant speedup.

To allow for ambiguous parse results, Miller and Violadivided the set of terminal symbols into four syntactic clas-ses and allowed the parser to choose the class that best fits intothe expression’s geometry. This does permit some ambiguitybetween symbols, but if the top-ranked candidate within aclass is incorrect, or if the syntactic class eventually settledupon by the parser is incorrect, there is no way to choose analternative. The geometric rules used in the approach are notdescribed, making it difficult to assess how much ambiguityin an expression’s geometry they permit.

A CYK-based approach was developed by Álvaro et al. [1]for parsing typeset mathematics using 2-D stochastic CFGs.In their approach, the grammar is written in Chomsky nor-mal form, and each production is associated with a geomet-ric relation. Probability distributions over the relations weredefined manually, and distributions over terminal symbolswere derived from neural network outputs. These values areincorporated into a CYK parsing algorithm that outputs themost likely overall parse.

The four approaches just cited use probabilistic grammarsto recognize math expressions. The main characteristic ofprobabilistic grammars is that one can assign probabilitiesto individual productions, and hence to derivation sequencesand language elements. Each of the approaches applies a uni-form distribution to grammar productions and treats terminalsymbols and/or geometric relationships as probabilistic. Inthis context, the use of a uniform distribution over produc-tions can be interpreted as making no assumptions about thecomposition of mathematical expressions aside from theirformal properties, which are encoded in the grammar rulesthemselves. This lack of assumptions means that, at any givenparsing step, the parser is not influenced by the productiondistribution and considers all its options to be equally likely.It also provides a neutral starting point from which train-ing algorithms may learn more specific production probabil-ities. But even the uniform distribution cannot completelyavoid bias during parsing, as stochastic CFGs are inherentlybiased toward parse trees of small height. Each derivationstep reduces the probability of a parse, so, all else beingequal, a more deeply nested tree will be considered to be lessprobable than a shallow tree. Approaches using stochasticgrammars with uniform distributions and no training inheritthis downside without benefiting from the ability to use arealistic distribution over productions.

The multiplicative model used by these approachesassumes independence of individual symbols and subexpres-sions. This assumption is invalid (though understandable forthe sake of runtime efficiency) in many domains, includ-ing math recognition. Consider, for example, the expressionax3 + bx2 + c? + d, and suppose the system is trying to

determine the identity of the letter indicated by a questionmark. Assuming independence, it may be more likely that themissing letter is classified as X or λ than the x that a humanreader would expect. Miller and Viola [30] also point out thenecessity of incorporating contextual information. Indeed,they give a convincing quantitative argument as motivationfor their division of terminal symbols into syntactic classeswhich we discussed earlier. But it is unclear how these classesare incorporated into their geometry rules and how they fitinto their probabilistic model, which assumes independenceof all terminals.

As alluded to above, independence assumptions areknown to degrade parser performance in natural languageapplications. A common technique used to improve perfor-mance is to use the geometric mean of probabilities ratherthan their product [28, §12.1]. This leads to the same typeof scoring function we propose in the next section for usewith fuzzy r-CFGs. Our approach is inspired by probabilisticparsing techniques, but is not, strictly speaking, a valid prob-abilistic method. While some approaches exist for relaxingthe independence assumption in natural language processing(e.g., that of Caraballo and Charniak [7]), they use n-grammodels, which may only be applied to mathematics whenlarge corpora of realistic expressions are available.

One such corpus, owned by Microsoft, has been used byShi, Li, and Soong [35] to develop a math recognizer based onhidden Markov models. In their method, an observed strokesequence is used to compute likelihoods of stroke bound-aries, symbol identities, and relations between symbols. Incontrast to the work cited earlier, probability distributionsfor symbol and relation bigrams are determined empiricallyfrom the corpus data. A symbol graph representing alterna-tive sequences of symbol identity assignment is created, andthe optimal sequence is extracted and passed to a separatemath structure analysis system. The approach of Shi et al.is sensitive to writing order—symbols must be written withconsecutive strokes, and symbols adjacent in time must alsobe adjacent (in some model-specific sense) in the expressionbeing written. These temporal restrictions are insufficient forour purposes. For example, a user may draw two parenthesesseparated by empty space and then fill in the space with a frac-tion. We wish to recognize such inputs, in which consecutivestrokes are not directly connected by geometric relationships.

Even so, such probabilistic models are promising andshould be extended and generalized once large realistic, cor-pora of handwritten math expressions are generally available.In the meantime, it is reasonable to model the ambiguityin math recognition using fuzzy sets rather than probabil-ity distributions. Recent work by Zhang et al. [42] addedsupport for fuzzy geometric relations to FFES, a math rec-ognition system using Zanibbi’s DRACULAE recognizer,so that it could report multiple interpretations of the user’sinput. However, the number of interpretations can only be

123


controlled by adjusting thresholds. The ambiguous interpre-tations are pursued independently and reported to the usersimultaneously, so recognition time is proportional to theamount of ambiguity in the drawing.

Fitzgerald et al. [12] proposed an approach to math recog-nition which models geometric relations as fuzzy constraintson the productions of an attribute grammar. In case of strongambiguity in the identity of a symbol or geometric relation,the input is parsed repeatedly, choosing different identitieseach time in a best-first exploration of alternatives. Two-sidedsigmoid functions of geometric features are used to representthe membership grades in fuzzy geometric relation. It is notclear if these membership grades combine into grades forentire parse trees, or if parses are simply reported as theyarise from the best-first parsing process. On one hand, thesystem is efficient in the sense that only strong ambiguitiescause it to investigate alternative parses, but on the other,such alternatives must be parsed independently and serially,reducing efficiency. What constitutes a strong ambiguity ishard-coded, so the correct parse may fail to be reported ifthe membership grade of a particular geometric relation fallsbelow a threshold. Like the work of Fitzgerald et al, ourmethod of tree extraction—presented in Sect. 4.3—employsa best-first search strategy. But, it can report as many treesas the user requests, making available all parse trees withnon-zero recognition confidence.

Our work avoids many of the difficulties identified inthis section. It permits ambiguity of terminal symbols aswell as geometric relations. It does not limit the numberof alternative parses and allows users to select betweenthem, but it does not construct all of the (potentiallyexponentially many) interpretations of the user’s writingbefore producing output. By using fuzzy sets, we need notbe constrained by probabilistic independence and can usecontext-sensitive scoring functions when combining subex-pressions into larger expressions. By generating interpreta-tions on demand as the user requests them, only as muchwork is done as is necessary to produce the correct interpre-tation.

3 Fuzzy relational context-free grammars

Recognition may generally be seen as a process by whichan observed, ambiguous, input is interpreted as a certain,structured expression. Fuzzy r-CFGs explicitly model thisinterpretation process as a fuzzy relation between concreteinputs and abstract expressions. The formalism, therefore,captures not only the idealized, abstract syntax of domainobjects (as with a typical context-free grammar), but also theambiguity inherent in the recognition process itself. In thissection, we define fuzzy r-CFGs, give examples of their use,and describe fuzzy parsing in an abstract setting.

Fig. 1 Ambiguousmathematical expressions

3.1 Review of fuzzy sets

Recall that a fuzzy set X is a pair (X, μ), where X is someunderlying set and μ : X → [0, 1] is a function giving themembership grade in X of each element x ∈ X . A fuzzyrelation on X is a fuzzy set (X × X, μ). The notions of setunion, intersection, Cartesian product, etc. can be similarlyextended to fuzzy sets. For details, refer to Zadeh [40]. Todenote the grade of membership of a in a fuzzy set X , we willwrite X(a) rather than referring explicitly to the name of themembership function, which is typically left unspecified. Byx ∈ X = (X, μ), we mean μ(x) > 0, and by |X | we meanthe number of elements having non-zero membership grade,rather than the sum of membership grades over x ∈ X .

3.2 Fuzzy r-CFG intuition

Before defining fuzzy r-CFGs formally, we motivate the def-inition through an example. Consider the top expression inFig. 1: reasonable interpretations include Ax + b, Ax + 6,Ax + b, Ax tb, etc. There are three important points to note.The identity of each terminal symbol is ambiguous. (Gen-erally, even which strokes ought to be combined to form adistinct symbol may be ambiguous, as in the bottom pair ofexpressions in the figure. Does the first contain a binomialterm or a fraction? Is the second km or lcm?) The geomet-ric relationships between symbols and subexpressions deter-mine the mathematical semantics with which those symbolsshould be interpreted (e.g., Ax vs. Ax ). And, given a partic-ular choice of symbol identities and spatial relationships, allremaining ambiguity is semantic. In cases of semantic ambi-guity (i.e., the same notation representing multiple distinctexpressions, as in u(x + y)), it is not possible for a syntacticparser to reliably obtain the correct parse.

The rules governing the assembly of mathematical expres-sions from their component parts are known with certainty.Uncertainty only arises because the identities of and rela-tionships between handwritten symbols are ambiguous. Thisambiguity is a property of each particular drawing, not of theformal structure of math expressions. As such, it is not nec-essary to assign probabilities or fuzzy membership grades toeach production in our grammar, though such an extensioncould certainly be made to incorporate prior domain knowl-edge. Instead, we model formally only the ambiguity arisingfrom symbol and geometric relation classification processes.

123


This represents a significant difference between our modeland traditional fuzzy grammars. Typically, fuzzy grammarsare a straightforward modification of probabilistic grammarsin which each non-terminal is considered to be a fuzzy set,and its productions are each assigned a membership grade[20]. Such grammars generate languages in which each stringhas a particular grade of membership, just as each string in astochastic language is derived with a particular probability.

In our approach, the language itself is not fuzzy. It consistsof exactly those math expressions derivable from the gram-mar’s start symbol. Fuzziness represents the degree to whicha particular drawing is compatible with a particular mathexpression in the grammar’s language. It is therefore not aproperty of an expression, but of the input, and it provides away to measure the similarity between a handwritten inputand one of its potential interpretations as a math expression.

3.3 Fuzzy r-CFG definition

Fuzzy relational context-free grammars are formally definedas follows:

Definition 1 A fuzzy relational context-free grammar G is atuple (�, N , S, T, R, r�, P), where:

– � is a set of terminal symbols;– N is a set of non-terminal symbols;– S ∈ N is the start symbol;– T is a set of observables;– R is a set of fuzzy relations on I , where I is the set of

interpretations, defined below;– r� is a fuzzy relation on (T, �); and– P is a set of productions, each of the form A0

r⇒A1 A2 · · · Ak , where A0 ∈ N , r ∈ R, and A1, . . . , Ak ∈N ∪�.

The form of a fuzzy r-CFG is generally similar to that ofa traditional context-free grammar. We point out the differ-ences below.

3.3.1 Observables

The set T of observables represents the set of all possibleconcrete inputs. Formally, T must be closed under union andintersection. In practice, for online recognition problems, anobservable t ∈ T is a set of ink strokes, each tracing out aparticular curve in the (x, y) plane.

3.3.2 Set of interpretations

While typical context-free grammars deal with strings, wecall the objects derivable from fuzzy r-CFGs expressions.Any terminal symbol α ∈ � is an expression. An expression

e may also be formed by concatenating a number of expres-sions e1, . . . , ek by a relation r ∈ R. Such an r-concatena-tion is written (e1re2r · · · rek). The size |e| of an expressioncounts the number of terminal symbols that appear in it. Forexample, the size of Ax is |A ↗ x | = 2 and the size ofAx + b is |(A ↗ x)→ +→ b| = 4. In these expressions,the parentheses indicate subexpression grouping, not termi-nal symbols. The arrows indicate spatial relationships corre-sponding to general writing direction; they are described indetail below.

The representable set of G is the set E of all expressionsderivable from the non-terminals in N using productions inP . It may be constructed inductively as follows:

For each terminal α ∈ �, Eα = {α}.For each production p of the form A0

r⇒ A1 · · · Ak ,

E p ={(e1r · · · rek) : ei ∈ E Ai

}.

For each non-terminal A,

E A =⋃

p∈P having LHS A

E p;

and finally,

E =⋃

A∈N

E A.

The set of interpretations is I = T × E . A pair in I corre-sponds to the interpretation, through recognition, of a partic-ular observable as a particular expression. We make I fuzzyby assigning membership grades to each interpretation. Asdescribed in the motivation section, the membership gradeI ((t, e)) measures the degree to which the observable t iscompatible with the expression e, as measured by the fuzzyrelations in the grammar. Note that the (crisp) language gen-erated by G is ES , where S is the start symbol.

The language structure generated by a fuzzy r-CFG is sim-ilar to one we used in previous work on sketch corpus creation[24]. In that formulation, a grammar was used as a generativelanguage model for producing random mathematical expres-sions. Context-sensitive probabilities were assigned to pro-duction rules, similarly to stochastic CFG applications. Moreimportantly, recognition processes were not involved, so therelations acted only on expressions, not on interpretations.This distinction reflects the grammar’s purpose in each case:as a language model for the generation application and as amodel of sketch interpretation for the recognition applicationthat we address in this paper.

3.3.3 Relations

The set R contains fuzzy relations that give structure toexpressions by modeling the relationships between subex-pressions. These relations act on interpretations, allowing

123


Fig. 2 An expression in whichthe optimal relation depends onsymbol identity

context-sensitive statements to be made about recognition inan otherwise context-free setting.

The necessity of context-sensitive evaluation was men-tioned in Sect. 2 and is further illustrated by Fig. 2. Theexpression shown in the figure may be best interpreted as Ap

or AP depending upon the case of the P symbol. Denote theA symbol by t1 and the P symbol by t2. Let↗∈ R be a fuzzyrelation for diagonal spatial relationships and→ be similarfor horizontal adjacency relationships. Then we expect that

↗ ((t1, A), (t2, p)) >↗ ((t1, A), (t2, P))

and

→ ((t1, A), (t2, P)) >↗ ((t1, A), (t2, P)).

That is, the membership grade of a pair of interpretationsin a spatial relation should vary, depending on the expressions(i.e., the context) involved. Given explicit membership func-tions, we could evaluate whether or not these expectationsare met.

3.3.4 Terminal relation

The relation r� models the relationship between observablesand terminal symbols. In practice, it may be derived from theoutput of a symbol recognizer. For example, if a subset t ′ ofthe input observable was recognized as the letter b with, say,60 % confidence, then one could take r�((t ′, b)) = 0.6.

3.3.5 Productions

The productions in P are similar to context-free gram-mar productions in that the left-hand element derives thesequence of right-hand elements. The fuzzy relation r appear-ing above the⇒ production symbol indicates a requirementthat the relation r is satisfied by adjacent elements of theRHS. Formally, given a production A0

r⇒ A1 A2 · · · Ak ,if ti denotes an observable interpretable as an expressionei derivable from Ai (i.e., ei ∈ E Ai and (ti , ei ) ∈ I ),then for

⋃ki=1 ti to be interpretable as (e1 r · · · r ek) requires

((ti , ei ) , (ti+1, ei+1)) ∈ r for i = 1, . . . , k − 1.

3.4 Examples

The following examples demonstrate how fuzzy r-CFG pro-ductions may be used to model the structure of mathematicalwriting. We use five binary spatial relations: ↗,→, ↘, ↓,�. The arrows indicate a general writing direction, while �denotes containment (as in notations like

√x , for instance).

1. Suppose that [ADD] and [EXPR] are non-terminalsand + is a terminal. Then the production [ADD]

→⇒[EXPR]+[EXPR]models the syntax for infix addition:two expressions joined by the addition symbol, writtenfrom left to right.

2. The production [SUP]↗⇒ [EXPR][EXPR] models

superscript syntax. Interpreted as exponentiation, the firstRHS token represents the base of the exponentiation, andthe second represents the exponent. The tokens are con-nected by the↗ relation, reflecting the expected spatialrelationship between subexpressions.

3. The following pair of productions models the syntax ofdefinite integration:

[ILIMITS]↓⇒ [EXPR]

∫[EXPR]

[INTEGRAL]→⇒ [ILIMITS][EXPR]d[VAR]

Definite integrals are drawn using two writing directions.The limits of integration and integration sign itself arewritten in a vertical stack, while the integration sign, inte-grand, and variable of integration are written from left toright.

3.5 Semantic expression trees and textual output

Fuzzy r-CFG productions model the two-dimensional syn-tax of mathematical expressions. To represent mathemati-cal semantics, each production is also associated with rulesfor generating textual output and math expression trees withsemantic information. For example, consider again the pro-duction [ADD]

→⇒ [EXPR]+ [EXPR]. A rule to generatea MathML string would be written in our grammar format as<mrow>%1<mo>+</mo>%3</mrow>, where the “%n”notation indicates that the string representation of the nthRHS element should be inserted at that point. A rule togenerate a semantic expression tree would be written ADD(%1,%3). This rule would generate a tree with the rootnode labeled with addition semantics (“ADD”) and two sub-trees. Similarly to the string case, the %n notation indi-cates that the tree representation of the nth RHS elementshould be used as a subtree. Hence, the first child tree cor-responds to the left-hand operand of the addition expres-sion, and the second child tree corresponds to the right-handoperand.

3.6 The fuzzy set of interpretations

Because of ambiguity, there are usually several expres-sions that are reasonable interpretations of a particularinput observable t ∈ T (e.g., Ap and AP are both rea-sonable interpretations of Fig. 2). We represent all of the

123


interpretations of t as a fuzzy set It of expressions. Themembership function of It gives the extent to which thestructure of an expression matches the structure of t , asmeasured by r� and the other grammar relations. This setcan be thought of as a “slice” of the fuzzy set I of inter-pretations mentioned above; i.e., It = {e : (t, e) ∈ I }. Itremains to specify the membership function It (e) = I (t, e)concretely.

For cleaner notation, assume that the grammar produc-tions are in a normal form such that each production is eitherof the form A0 ⇒ α, where α ∈ � is a terminal symbol, orof the form A0

r⇒ A1 · · · Ak , where all of the Ai are non-terminals. This normal form is easily realized by, for eachα ∈ �, introducing a new non-terminal Xα , replacing allinstances of α in existing productions by Xα , and adding theproduction Xα ⇒ α.

The set It of interpretations of t is then constructed as fol-lows. For every terminal production p of the form A0 ⇒ α,define a fuzzy set I p

t = {α}, where I pt (α) = r� ((t, α)) and

I pt (β) = 0 for β = α.

For every production p of the form A0r⇒ A1 · · · Ak ,

define a fuzzy set (taking the union over all partitionsof t)

I pt =

⋃

t1∪···∪tk=t

I pt1,...,tk , (1)

where

I pt1,...,tk =

{(e1r · · · rek) : ei ∈ I Ai

ti , ((ti , ei ),

(ti+1, ei+1)) ∈ r, i = 1, . . . , k − 1}. (2)

There is room for experimentation when choosing themembership function of I p

t1,...,tk . The typical combinationrule in fuzzy systems is to take the minimum of all rele-vant membership grades. In the present context, this rule hasthe disadvantage that all parses sharing a low-scoring sym-bol or relation are assigned the same score. Zhang et al. [42]found that using multiplication when combining member-ship grades helped to prevent ties. We, therefore, computethe membership grade of an r -concatenation e = e1r · · · rek

in I pt1,...,tk as

I pt1,...,tk (e) =

(k∏

i=1

I Aiti (ei )

2|ei |−1k−1∏

i=1

r ((ti , ei ),

(ti+1, ei+1))

)1/(2|e|−1)

(3)

As an expression always contains exactly one more ter-minal than relation, this function assigns as an expression’smembership grade the geometric mean of the grades of allof its components. Geometric averaging preserves the tie-

breaking properties of multiplication while normalizing forexpression size.

Given all of the I pt , the fuzzy set of interpretations for a

particular non-terminal A ∈ N is

I At =

⋃

p having LHS A

I pt ,

and the fuzzy set of interpretations of an observable t is It =I St , where S is the start symbol.

An expression e is in It if t is interpretable as e and eis derivable from the start symbol S. The recognition prob-lem is therefore equivalent to the extraction of elements of It

(t being the user’s input) in decreasing order of membershipgrade.

4 Parsing fuzzy r-CFGs

There are two particular properties of fuzzy r-CFGs thatmake parsing difficult. Like other relational grammars, thelanguages they generate are multi-dimensional. This pre-vents the straightforward application of common parsingtechniques like Earley’s algorithm, which assume a simplyordered sequence of input tokens. Multi-dimensionality alsocomplicates the task of deciding which subsets of the inputmay contain a valid parse. Furthermore, because our inputis ambiguous, we require a parser to report all recognizableparse trees. Since there may be exponentially many trees,some effort is required to ensure a reasonable running time.This section presents two formal assumptions on the struc-ture of the relations of fuzzy r-CFGs and describes how toconstruct an efficient fuzzy r-CFG parser using those assump-tions. Detailed algorithms are omitted but may be found in atechnical report [27].

4.1 Representing It as a shared parse forest

In Eq. 2, each set I pt1,...,tk contains up to

∏i

∣∣∣I Ai

ti

∣∣∣ expres-

sions. It is therefore infeasible to identify or construct eachexpression individually before reporting parse results to theuser. This problem is similar to that of parsing ambiguouslanguages, in which the same input may be represented bymany parse trees. Indeed, the language of math expressions isambiguous even in the absence of syntactic fuzziness due tothe semantic ambiguities identified earlier. Parsing ambigu-ous languages is a well-studied problem; we adopt the sharedparse forest approach of Lang [18], in which all recognizableparses are simultaneously represented by an AND-OR tree.

123


Fig. 3 Shared parse forest for figure

For example, consider again the expression shown inFig. 1, along with the following toy grammar:

[ST]⇒ [ADD] | [TRM][ADD]

→⇒ [TRM]+ [ST][TRM]⇒ [MUL] | [SUP] | [CHR][MUL]

→⇒ [SUP][TRM] | [CHR][TRM][SUP]

↗⇒ [CHR][ST]

[CHR]⇒ [VAR] | [NUM][VAR]⇒ a | b | · · · | z[NUM]⇒ 0 | 1 | · · · | 9

Figure 3 depicts a shared parse forest representing somepossible interpretations of Fig. 1. In the figure, the boxedarrows are AND nodes. Those arrows indicate the relationthat links derived subexpressions. The ovals are OR nodesrepresenting derivations of non-terminals on particular sub-sets of the input. The circles relate subsets of the input withterminal symbols from the grammar. Simple productions ofthe form [CHR] ⇒ [VAR], for example, have been omit-ted for clarity. Any tree rooted at the [ST] node that hasexactly one path to each input element is a valid parse tree.This shared parse forest captures, for example, the expres-sions Ax + b, AX + 6, Ax + 6, AX tb, etc. If an expressionis incomplete (e.g., (x + y without the closing parenthesis),then no parse will exist for the correct interpretation. How-ever, other parses using different interpretations of the inputmay exist (e.g., lx + y or Cx ty).

Parsing a fuzzy r-CFG may be divided into two steps:forest construction, in which a shared parse forest is cre-ated that represents all recognizable parses of the input, andtree extraction, in which individual parse trees are extractedfrom the forest in decreasing order of membership grade. Wedescribe each of these steps in turn.

4.2 Shared parse forest construction

Because the symbols appearing in a two-dimensional mathexpression cannot be simply ordered, one would naively haveto parse every subset of the input in order to obtain all possi-ble parses. Similarly, recall from Eq. 2 that I p

t is constructedfrom a union taken over all partitions of t . It is intractable totake this union literally. To develop a practical parsing algo-rithm, we introduce constraints on partitions so as to limithow many must be considered. The constraints are based onthe two-dimensional structure of mathematical notation.

4.2.1 The ordering assumption and rectangular sets

Define two total orders on observables: <x orders observ-ables by minimum x-coordinate from left to right and <y

orders observables by minimum y-coordinate from top tobottom. We take the y axis to be oriented downward. Asso-ciate each relation r ∈ R with one of these orders, denotedord r. ord r is the dominant writing direction used for a par-ticular relation. For math recognition, we use ord→ =ord↗ = ord↘ = ord� =<x , and ord ↓ =<y .

Informally, we assume that each geometric relation r ∈ Ris embedded in either <x or <y . Thus, we may treat anygrammar production as generating either horizontal or verti-cal concatenations of subexpressions, making the partition-selection problem much simpler.

More formally, denote by mind t the element a ∈ t suchthat a <d b for all b ∈ t aside from a and define maxd tsimilarly.

Assumption 1 (Ordering) Let t1, t2 be observables, andlet e1, e2 be representable expressions. We assume thatr ((t1, e1), (t2, e2)) = 0 whenever maxord r t1 ≥ ord r

minord r t2.

The ordering assumption says that, for a parse to exist ont1∪t2, the last symbol of t1 must begin before the first symbolof t2 along the dominant writing direction of the expressionbeing parsed. For example, in Fig. 1, to parse Ax + b in theobvious way requires that the A begins before the x , and the+ begins after the x but before the b, when the symbols areconsidered from left to right (i.e., ordered by <x ).

Similarly, we could formulate a production for fractions

as[FRAC]↓⇒ [EXPR]—[EXPR]. Then to parse a fraction

would require that the bottom symbol of the numerator began

123


before the fraction bar, and the fraction bar began before thetop symbol of the denominator, when considered from top tobottom (i.e., ordered by <y).

Liang et al. [22] proposed rectangular hulls as a subset-selection constraint for two-dimensional parsing. A very sim-ilar constraint that we call rectangular sets is implied by theordering assumption.

Definition 2 (Rectangular set/partition) Call a subset t ′ of trectangular in t if it satisfies

t ′ ={

a ∈ t : minx

t ′ ≤ a ≤ maxx

t ′}

∩{

a ∈ t : miny

t ′ ≤ a ≤ maxy

t ′}.

Call a partition t1 ∪ · · · ∪ tk of a rectangular set t rectangularif every ti is rectangular in t .

From the definition of<x and<y , a set t ′ that is rectangu-lar in t must include all input elements in t whose left edgelies between the left-most left edge of an element in t ′ andthe right-most left edge and whose top edge lies between thetop- and bottom-most top edges of elements of t ′.

Proposition 1 Let t ∈ T be an observable, and let p be aproduction of the form A0

r⇒ A1 · · · Ak. Under the order-ing assumption, if (e1r · · · rek) ∈ I p

t1,...,tk , then the partitiont1 ∪ · · · ∪ tk of t is rectangular.

Proof Let d = ord r , and choose any ti . We must show that

ti ={

a ∈ t : minx

ti ≤x a ≤x maxx

ti}

∩{

a ∈ t : miny

ti ≤y a ≤y maxy

ti

}.

It is clear that ti is a subset of the RHS, so suppose thatthere is some a′ ∈ t in the RHS put into t j = ti by thepartition of t . If j < i , then((t j , e j ), (t j+1, e j+1)

), . . . , ((ti−1, ei−1), (ti , ei )) ∈ r.

By the assumption,

maxd

t j <d mind

t j+1 ≤ maxd

t j+1 <d · · · <d mind

ti .

But mind t j ≤d a′ ≤d maxd t j since a′ ∈ t j , andmind ti ≤d a′ ≤d maxd ti since a′ is in the RHS, somind ti ≤d a′ ≤d maxd t j , a contradiction. A similar con-tradiction can be obtained in the case where j > i . ��

Rectangular sets are the natural two-dimensional general-ization of contiguous substrings in one-dimensional stringparsing. This definition could be generalized to arbitrarydimension, giving “hypercube sets” of input elements.

Fig. 4 Expressions withoverlapping symbol boundingboxes

Following Liang et al, notice that any rectangular set u ⊆ tcan be constructed by choosing any four input elements in tand taking them to be represent the left, right, top, and bottomboundaries of the set. There are therefore O (|t |4) rectangularsubsets of t . If we instead naively parsed every subset of theinput, there would of course be 2|t | subsets to process. Theordering assumption, thus, yields a substantial reduction inthe number of subsets that must be considered for parsing.

Liang et al. define a rectangular hull of a set of input ele-ments to be their geometric bounding box, and they parseonly those sets whose rectangular hulls have a null intersec-tion. I.e., no bounding box of a subset selected for parsing canintersect that of any other parsed set. This formulation causesproblems for containment notations like square roots, as wellas for somewhat crowded or messy handwriting styles, whichour rectangular set formulation avoids. For example, considerthe square root expression on the left of Fig. 4. The rectan-gular hulls of the root symbol and its argument are shown assolid boxes and are the (union of) geometric bounding boxesof the symbols. Note that the rectangular hull of the squareroot symbol intersects (in fact contains) that of its contents.The argument cannot be separated from the operator intonon-intersecting hulls.

When considering input subsets as rectangular sets, we donot use the natural geometric bounding box of the strokes,as Liang et al. do. Instead, we take only the minimal x- andy-coordinates of each stroke and consider the bounding box(or rectangular hull) of those points. The “boundary” of theroot symbol in Fig. 4 is thus the single point at the top-leftof its bounding box, and the boundary of the rectangularset representing the argument 2π is shown as a dotted box.By using minimal coordinates instead of complete boundingboxes, the rectangular set boundaries do not intersect.

Similarly, in the expression on the right of Fig. 4, the rect-angular hull of opening parenthesis intersects the hull of the asymbol and that of the closing parenthesis intersects the hullsof both the 3 and the exponent 2. As before, the expressioncannot be partitioned such that the required subsets’ hulls arenon-intersecting. But the boundary of the rectangular set rep-resenting a−3 (indicated by a dotted box) extends only to theleft edge of the 3 symbol. The boundaries of the parenthesesand the exponent are, as for the root sign, single points at thetop-left corner of their bounding boxes. This small change—taking the left- and top-most coordinates of strokes as theirrepresentative points, “spaces out” overlapping writing andfacilitates non-intersecting partitions.

Figure 5 illustrates schematically the recursive rectangularpartitioning of an expression, following the expected parseof

∑n−1i=1

i2

n−i . The whole expression is a rectangular set. Thecentral dotted vertical line indicates a rectangular partition

123


Fig. 5 Recursive rectangularpartitions in an expression

into the sum symbol (with limits) and the summand. In thesummand, for example, the two horizontal dashed lines indi-cate a rectangular partition into the numerator, fraction bar,and denominator, and the numerator and denominator are fur-ther partitioned into rectangular sets, each containing a singlesymbol. Note that resulting boxes in the figure are meant toemphasize the hierarchical structure of the partition. They donot indicate the geometric bounding boxes of the rectangularsets.

4.2.2 Parsing algorithm

Using the restriction to rectangular partitions derived above,it is straightforward to adapt the CYK bottom-up parsingalgorithm so that is generates shared parse forests for fuzzyr-CFGs. Detailed algorithms are given in a technical report[27]. Experiments with the CYK approach showed that, whileall rectangular sets must be enumerated and parsed, relativelyfew actually contribute to valid parse trees. A similar obser-vation was made by Grune and Jacobs [15] in the case ofambiguous languages. The algorithm’s runtime, while pre-dictable, is always the worst-case time.

Instead of CYK, we, therefore, adapted to fuzzy r-CFGs,a tabular variant of Unger’s method for CFG parsing [38]. Inthis approach, we assume that the grammar is in the normalform described in Sect. 3.6. At a high level, the algorithmparses a production p on an input subset t as follows:

1. If p is a terminal production, A0 ⇒ α, then check if(t, α) ∈ r� . If so, add the parse to table entry (t, α);otherwise parsing fails.

2. Otherwise, p is of the form A0r⇒ A1 · · · Ak . For every

rectangular partition t1 ∪ · · · ∪ tk of t , parse each non-terminal Ai on ti . If any of the subparses fail, then failon the current partition. Otherwise, add the partition totable entry (t, A0). If parsing fails on every partition,then parsing fails on t .

One drawback of this algorithm is that its runtime is expo-nential in the size of the RHS of a grammar production, sinceCase 2 iterates over

( |t |k−1

)partitions. This bound is obtained

by sorting the input by<ord r and choosing k− 1 split pointsto induce a rectangular partition. In the worst case, then (i.e.,when parses exist on every partition), our algorithm mustconsider every grammar production on every rectangular set,giving a complexity of O (

n3+k pk), where n is the num-

ber of elements in the input observable, p is the number of

grammar productions, and k is the number of RHS tokensin the largest production. The extra factor of k arises fromwriting up to k partition subsets into a parse table entry inCase 2. Importantly, k can be controlled by the designer ofa grammar as a tradeoff between RHS size and number ofgrammar productions. For example, if the grammar is writ-ten in Chomsky Normal Form (CNF), then the complexity isO (

n5 p). Note that the general bound is asymptotically tight

to the worst-case size of the parse forest, since there maybe O (

n4 p)

table entries (counting each production as a dis-tinct entry), each of which may link to O (

nk−1k)

other tableentries.

Instead of writing our grammar in CNF, we allow arbitraryproduction lengths and use the following three optimizationsto reduce the number of partitions that must be considered.The first two are typical when using Unger’s method [15];the third is specific to fuzzy r-CFG parsing.

1. Terminal symbol milestones. The terminal symbol rela-tion r� may be used to guide partitioning. Suppose pis A0

r⇒ A1 · · · Ai−1αAi+1 · · · Ak , where α is a termi-nal symbol. Then r� must contain (ti , α) for a parse toexist on a partition t1 ∪ · · · ∪ tk . That is, given a parti-tion, any subset corresponding to a terminal symbol inthe grammar production must be recognizable as that ter-minal symbol. We, therefore, “seed” the parse table withsymbol recognition results and limit the enumeration ofrectangular partitions to those for which the appropriateterminal symbols are already present in the parse table.Such seeding also facilitates recognition of typeset sym-bols, which are simply inserted into the parse table withtheir known identities prior to invocation of the parsingalgorithm.

2. Minimum non-terminal length. If there are no empty pro-ductions in the grammar, then each non-terminal mustexpand to at least one terminal symbol. Moreover, given aminimum number of strokes required to recognize a par-ticular symbol (e.g., at least two strokes may be requiredto form an F), one can compute the minimal number ofinput elements required to parse any sequence of non-terminals A1 · · · Ak . These quantities further constrainwhich partitions are feasible.For example, consider the top expression in Fig. 1. Sup-pose we are parsing the production [ADD]

→⇒ [TRM]+[ST] and we know that the + symbol must be drawnwith exactly two strokes. Then [ADD] cannot be parsedon fewer than 4 input strokes, and the input must be parti-tioned into t1∪t2∪t3 such that |t1| ≥ 1, |t2| = 2, |t3| ≥ 1.Furthermore, from the previous optimization, t2 must bechosen so that (t2,+) ∈ r� . In this particular case, onlyone partition is feasible.

3. Spatial relation test. Just because an input subset canbe partitioned into rectangular sets does not mean that

123


those sets satisfy the geometric relations specified by agrammar production. Unfortunately, the grammar rela-tions act on expressions as well as observables, so theycannot be tested during parsing because expressions arenot explicitly constructed. As there may be exponen-tially many expressions, they cannot be constructed, andtherefore, we cannot evaluate the grammar’s spatial rela-tions, which vary with expression identity. Still, we canspeed up parsing by testing whether relations are satis-fied, which approximate the grammar relations.Namely, for each relation r ∈ R, we test the relation

r(t1, t2) ={

1 if ∃e1, e2 s.t. ((t1, e1), (t2, e2)) ∈ r

0 otherwise.

These relations will be specified more concretely inSect. 4.3.2 below. If r(ti , ti+1) = 0 for any pair of adja-cent sets in a rectangular partition, the partition need notbe parsed.

4.2.3 Incremental parsing

The parsing algorithm above may be used incrementallywithout any significant changes. Suppose that parsing is com-plete for some observable t = {a1, . . . , an}. We must handletwo cases: the addition of a new observable an+1 to t if theuser draws a new stroke and the removal of an observable ai

from t if the user erases a stroke.In the case where a new observable an+1 is added, every

existing entry of the parse table remains valid. We may sim-ply invoke the parser on the new input {a1, . . . , an+1}, and allexisting parse results will be reused. Note that, although exist-ing parses remain valid, they will not necessarily be reused.For example, suppose the expression a3 + b is changed toa3+2b by the insertion of a new stroke representing the num-ber 2. Then the parse of a3+b on the original three strokes isstill a valid interpretation of those strokes. But those strokesno longer form a rectangular set with respect to the new setof input strokes, so they will not be considered as a groupwhen the new input is parsed. However, the existing parsesof a3 and + remain individually valid, and they need onlybe combined with the new parse of 2b to form the entireexpression.

In the case where an existing observable ai is removed, thesituation is slightly more complicated. Any parse table entryfor an input subset t that includes ai becomes invalid and mustbe removed from the table. When the parser is invoked onthe revised input set {a1, . . . , ai−1, ai+1, . . . , an}, any exist-ing parse results that do not include the stroke ai will bereused.

This approach to incremental parsing works particularlywell when subsets of the input are represented by bit vectors.

Each input element is represented by a bit, where the firststroke drawn corresponds to the lowest-order bit and the mostrecent stroke to the highest-order bit. If a bit is set, thenthe corresponding input element is included in the subset,otherwise it is not.

Using this representation, when the first stroke is drawn,the parser might create an entry (1, A) in the parse table.After a second stroke is drawn, the bit vectors representingsubsets will contain two bits. But since the low-order bit cor-responds to the first stroke, accessing (01, A) is the same asaccessing the entry (1, A) that was created when the inputwas just one stroke.

Incremental parsing is useful for two main reasons. It per-mits a parser to process in the background as the user iswriting, helping to avoid long delays after the user has fin-ished writing the expression. It also allows for reporting ofintermediate parse results so that the user can tell whether theparser is going off-track and make any necessary correctionsto the output. The tree extraction technique described in thenext subsection extracts only complete parse trees (and thuscannot report partial results, like a+?, in which the right-handoperand has not yet been written). However, it is straightfor-ward to adapt a grammar so as to support partial expressions:one simply creates new non-terminals that derive prefixes orsuffixes of complete expressions.

4.3 Parse tree extraction

The links between entries in the parse table implicitly spec-ify a shared parse forest. It remains to extract individualparse trees from this forest in decreasing order of member-ship grade in the fuzzy set It of interpretations of the user’sinput t . To do so, we must explicitly evaluate the membershipgrades of particular parse trees and compare them to deter-mine which is highest, second highest, and so on. Given thelack of constraint on the grammar relations and the largenumber of parse trees in the forest, this may naively be avery time-consuming task.

Consider, for example, the problem of determining thehighest-ranked parse tree in the forest shown in Fig. 3. Callthe input subsets t1, t2, t3, t4 from left to right. We need tocheck whether → ((t1, A), (t2, x)) >↗ ((t1, A), (t2, x))to decide whether the first two symbols form a multiplica-tion or exponentiation. But those geometric relations maygive different results for different expression choices, sowe also need to check whether → ((t1, A), (t2, X)) >↗((t1, A), (t2, X)). It could be the case that one relationshiphas a higher grade when t2 is interpreted as x , and the otheris higher when it is interpreted as X . Indeed, such behav-ior is desirable to ensure correct recognition of ambiguouscases as illustrated in Fig. 2. We further need to combinethose geometric relation grades with the relevant terminalrelation grades, r�((t1, A)), r�((t2, x)), r�((t2, X)), to find

123


membership grades in It1∪t2 , and thereby determine the best-first ordering of subexpressions.

Because we have specified no constraints on the gram-mar relations, it is possible for two expressions e1, e2 to havevery low membership grades in It1 and It2 , but for the pairof interpretations (t1, e1), (t2, e2) to have sufficiently highmembership in a relation r that e1re2 is the highest-gradedinterpretation in It1∪t2 . To determine the highest-graded parsetree overall, one must examine every single parse tree rep-resented in the forest. In particular, at the parse forest noderepresenting the set I p

t1,...,tk of parses for production p =A0

r⇒ A1 · · · Ak on the partition t1 ∪ · · · ∪ tk of t , a total of∏k

i=1

∣∣∣I Ai

ti

∣∣∣ trees would need to be constructed and graded.

This quantity is exactly∣∣I p

t1,...,tk

∣∣, which is still less than the

total number∣∣∣I A0

t

∣∣∣ of parses of A0 on t . At a node higher up

the parse forest referencing parses of A0 on t , every tree inI A0t would need to be extracted and combined with sibling

trees. The number of trees that must be examined by thisnaive approach thus increases combinatorially as parse treedepth increases. Such an approach is obviously not feasibleif we wish to report parses to the user in real-time as (s)hewrites.

4.3.1 Relational classes

Recall from Sect. 3 that E is the set of all expressionsgenerated by the productions of a given fuzzy r-CFG. Wedefine several relational classes c1, . . . , cm , and assign everyexpression in E to at least one class. Thus, E = c1∪· · ·∪cm ,but the ci need not form a partition of E .

These classes play a role in our system similar to thatof syntactic or symbol classes in the approaches of Millerand Viola [30], Zanibbi et al. [41], and Rutherford [33]. Weinclude five relational classes for archetypal symbol shapes,which we extend by a special class, box, which represents allmulti-symbol expressions. Details are provided in the nextsection.

To facilitate efficient tree extraction, we assume that thegrammar relations depend only on expressions’ relationalclasses, not on their precise structures. In the example above,one might expect that

→ ((t1, A), (t2, x)) =→ ((t1, A), (t2, X)),

as x is a centered symbol and X is an ascender. One mightalso expect that

→ ((t1 ∪ t2), Ax ), (t3,+)) =→ ((t1 ∪ t2, Ax), (t3,+)).That is, the extent to which the first two symbols are judged

to be horizontally adjacent with the plus sign does not dependon whether one interprets them as a multiplication or an expo-nentiation.

Our selection of relational classes captures the intuitionthat symbol identities have a greater effect on subexpres-sion layout than subexpression identities have on the layoutof larger expressions. If such an intuition seems unreason-able, one is free to introduce relational classes correspondingto subexpression structures, so that, for example,→ ((t1 ∪t2, e), (t3,+))would vary depending on whether e was Ax orAx . One is not restricted, as in existing approaches, to syntac-tic classes that represent collections of individual symbols.The relational class approach, thus, provides control over thetradeoff between context sensitivity in grammar relations andexecution time.

Formally, denote by C the set of relational classes and bycl(e) the set of classes to which an expression e belongs. Eachgrammar relation may be viewed as a union of class-specificrelations:

r =⋃

c1,c2∈C

rc1,c2 ,

where rc1,c2 ((t1, e1), (t2, e2)) = 0 if c1 ∈ cl(e1) or c2 ∈cl(e2). Then, by the usual rules of fuzzy sets,

r ((t1, e1), (t2, e2)) = maxc1∈cl(e1)c2∈cl(e2)

rc1,c2 ((t1, e1), (t2, e2)). (4)

Using the formulation above, we constrain the grammarrelations by the following assumption:

Assumption 2 (Interchangeability) Let t1, t2 be observ-ables, let e1, e2, e1, e2 be representable expressions, and letr ∈ R be a relation. We assume for relational classes c1, c2

that

rc1,c2 ((t1, e1), (t2, e2)) = rc1,c2

((t1, e1), (t2, e2)

)

whenever c1 ∈ cl(e1), cl(e1) and c2 ∈ cl(e2), cl(e2).

Each production p, and hence each non-terminal symbolA, may be associated with sets cl(p), cl(A) of relational clas-ses in the natural way, so that, if A can derive an expressione via production p, then cl(e) ⊆ cl(p) ⊆ cl(A).

When constructing the parse table, each time a potentialparse of some production p is identified on an input subsett , the relational classes associated with p are noted in theparse table. Thus, when extracting trees, the potential classesof expressions that might be obtained from any node of theshared parse forest are known.

This formulation is similar in some ways to the scoringrules developed by Rhee and Kim [32] to represent all parsetrees with identical structure, but differing symbol identitiesby a single tree with indeterminate terminal symbols. Work-ing in a cost-minimization framework, they defined the costof a spatial relationship between two indeterminate symbolsas the minimum relationship cost between determinate sym-bols, taken over all recognized possibilities for the indeter-minate symbols’ identities. This definition is similar in form

123


to our calculation of a relation membership grade as the max-imum class-specific membership grade, taken over relationalclasses. In our language, Rhee and Kim assign each terminalsymbol to its own relational class.

4.3.2 Spatial relation test in parse forest construction

Recall from Sect. 4.2.2 that, during parse forest construction,we test whether a partition of the input is feasible for pars-ing by evaluating spatial relations that approximate the fuzzygrammar relations. Now we may define these approximaterelations more specifically, using relational classes.

A pair of observables (t1, t2) should have membershipgrade 1 in an approximate relation r if there exist expressionse1, e2 such that the pair of interpretations ((t1, e1), (t2, e2))

has non-zero membership in the corresponding grammarrelation r . Using relational classes, this is the same as askingwhether

maxc1,c2∈C

rc1,c2 ((t1, e1), (t2, e2)) > 0.

That is, do relational classes exist such that the interpre-tations will have non-zero membership in the grammar rela-tion, regardless of what the expressions are (since they areassumed to be interchangeable)? If so, we take r(t1, t2) = 1;if not, r(t1, t2) = 0.

Note that these r relations are used only in the forest con-struction stage. In the tree extraction algorithm given in thenext section, explicit expressions are constructed, so the truegrammar relations are used based on each particular expres-sion’s relational classes.

4.3.3 Efficient tree extraction using relational classes

Consider again the problem of determining the most highly-graded parse of an input t . The most highly-graded parse ofa non-terminal A on t is just the most highly-graded parse,taken over all productions with LHS A. Similarly, the mosthighly-graded parse of a production p on t is the most highly-graded parse, taken over all partitions of t on which potentialparses of p were found. These partitions are noted in theparse forest by our variant of Unger’s method.

To find the most highly-graded parse of p on a particularpartition t1∪· · ·∪ tk of t , suppose that p is of the form A0

r⇒A1 · · · Ak . (If p is of the form A0 ⇒ α for a terminal α ∈ �,then the problem is trivial.) If any table entry (ti , Ai ) containsonly one relational class, then the most highly-graded parsemust include the most highly-graded interpretation in I Ai

ti ,because under the interchangeability assumption, choosinga lower-graded interpretation cannot increase the grade ofmembership in r , hence it must decrease the overall mem-bership grade in I p

t . In such a case, we can simply recursively

extract the most highly-graded parse of Ai on ti and paste itinto a parse tree.

More generally, if a table entry (ti , Ai ) contains severalclasses, then the most highly-ranked interpretation of eachclass must be considered. Thus, the membership grade in It

must be evaluated for at most∏

i |Ci | interpretations, whereCi is the set of relational classes that were noted at table entry(ti , Ai ) during parsing. Thus, when interpretations withina given table entry share relational classes (as they usuallywill for a reasonable selection of classes), significantly fewerinterpretations need to be extracted and graded than in thenaive approach to tree extraction discussed at the outset ofthis subsection.

Note that, because grammar relations are binary, two tableentries (ti , Ai ) and (t j , A j ) become independent if there issome � between i and j such that table entry (t�, A�) containsonly one relational class. In such cases, the relational classesat cells (ti , Ai ) and (t j , A j ) need not be varied with respectto each other, reducing the number of evaluations needed.

To find the most highly-graded parse of Ai on ti that hasrelational class c, one need to only know which produc-tions of Ai generate expressions of that class and recursivelyfind the most highly-graded parse among those productions.Notice that |ti | < |t |, so no infinite recursion is possible.

To extract the top-ranked parse of the entire input, there-fore, one visits every parse table cell in bottom-up order,determining the highest-ranked local parse at each one. Con-sidering different productions with the same LHS as differentparse table cells, there are at most n4 p cells, where n is thenumber of input elements and p is the number of grammarproductions. The forest construction algorithm identifies atmost

( nk−1

)distinct rectangular partitions on which to parse

p, where k is the number of right-hand tokens in the largestproduction. Using relational classes,

∏i |Ci | interpretations

must be considered per partition, as described above. This isbounded naively by ck , where c = |C | is the total numberof classes. Note that, since the approach proceeds bottom-up, the top-ranked interpretations from partition subsets areavailable in constant time, as they have already been com-puted and stored. Still, it takes time O (k) to extract them,combine them into a larger interpretation and compute itsmembership grade. Thus, obtaining the top-ranked parse treetakes worst-case time O (

n3+k pckk). This is the same com-

plexity as the forest construction algorithm with an additionalfactor of ck arising from the relational classes. Importantly,both c and k are determined by the grammar designer, allow-ing a tradeoff between flexibility and speed. If all expressionsare considered to be in the same relational class (i.e., if rela-tional membership functions are context-insensitive), thenc = 1, and this extra factor disappears.

The above process may be generalized to solve the prob-lem of finding the mth-most highly-graded parse, given thatall of the m−1 more-highly-graded parses are known. To find

123


the mth parse of a non-terminal A, we maintain a priorityqueue of known parses. When the (m−1)st parse is extracted,it was obtained as, say, the j th parse from a particular pro-duction p. Thus, to find the mth parse, we extract the ( j+1)stparse from p, add it to the queue, and pop the most highly-ranked parse out of the queue. A similar process may be usedto extract the mth-most highly-graded parse from a particularproduction (selecting over partitions), and from a particularrelational class (selecting over productions).

A slightly more complicated process is required to findthe nth-most highly-graded parse of a production A0

r⇒A1r · · · r Ak on a particular partition t1 ∪ · · · ∪ tk of t .Again, we maintain a priority queue of known parses.During extraction of the most highly-ranked parse, eachinterpretation whose membership grade in It was evalu-ated is added to the queue, and only one—the most highlyranked—is removed. Inductively, then, the m − 1st parsewas an r -concatenation of the ji th parses of each Ai onti of class ci , for i = 1, . . . , k. Denote this parse by( j1, . . . , jk), where the integer at each position indicatesthe ranking of the subexpression at that position in the con-catenation. By the interchangeability assumption, the nextmost highly-ranked parse using the same selection of rela-tional classes must be one of ( j1 + 1, j2, . . . , jk), ( j1, j2 +1, j3, . . . , jk), . . . , ( j1, . . . , jk−1, jk + 1). Therefore, themembership grades of all k of these expressions are eval-uated, and the expressions are added to the priority queue.The mth-most highly-ranked parse overall is simply poppedoff of the queue and its provenance noted, as in the previouscase.

In this case, tree extraction is much faster than for thetop-ranked tree. That case requires the top-ranked tree tobe determined for each cell in the parse table, whereas,when finding the mth-most highly-ranked parse, only thosetable cells and relational classes explicitly used to form the(m − 1)st most highly-ranked parse must be considered.There are O (np) such table cells. (O (n) for the numberof nodes in the parse tree which have more than one child,and O (p) to account for trivial productions of the formA ⇒ B.) At the O (n) table cells arising from non-triv-ial productions, at most k new candidate interpretations arecreated and evaluated, each requiring the extraction of onlyone subinterpretation. Extraction, thus, has worst-case costO˜(np + nk), where the so-called “soft-O” notation O˜(∗)ignores the logarithmic factors arising from priority queueoperations.

4.4 Handling user corrections

As mentioned in the introduction, we wish to provide to usersa simple correction mechanism so that they may select theirdesired interpretation in case of recognition errors or multipleambiguous interpretations. Our recognizer facilitates such

Fig. 6 Interface for displaying and selecting alternative interpreta-tions. Different subexpressions have been selected for alternatives;clockwise from top-left: the whole expression, the first two symbols,and the last symbol

corrections by allowing locks to be set on any subset of theinput.

Two types of locks are supported:

1. Expression locks, which fix the interpretation of a par-ticular subset of the input to be a specified expression,and

2. Semantic locks, which force interpretations of a partic-ular subset of the input to be derived from a specifiednon-terminal.

Both of these lock types are useful in different circum-stances. For example, suppose a user draws the top expressionin Fig. 1, meaning the expression AX + b, but the first resultreturned by the recognizer is Ax tb. In our system, the usercan correct the result to an addition, Ax + b. This applies asemantic lock to the entire input, requiring all derived expres-sions to be additions. The user could then correct the x to X ,applying an expression lock to the corresponding ink subset.The graphical interface used in MathBrush for corrections isshown in Fig. 6.

Consider extracting an expression tree from input t . If tis locked to an expression e by an expression lock, then weconsider It to contain only one element, e, with membershipgrade 1. If t is locked to a non-terminal AL by a semantic lock,then we take I A′

t to be empty for all non-terminals A′ = AL ,except those for which a derivation sequence A′ ⇒∗ AL

exists, in which case we take I A′t = I AL

t .

5 Computing the grammar relations

The fuzzy r-CFG formulation requires the calculation of twotypes of fuzzy relations: the terminal symbol relation r� andthe geometric relations in R. This section outlines the symbol

123


Fig. 7 Multiple strokes may becombined into one

and relation classification subsystems from which our mathrecognizer obtains membership grades for those relations.

5.1 Terminal symbol relation

The symbol classification subsystem receives strokes as inputone at a time as they are drawn by the user. Strokes may bemerged if they tend in the same direction and their ends aresufficiently close together. For example, in Fig. 7, the squareroot sign and fraction bar are each drawn with two strokesthat would be merged together.

The input strokes are maintained in a list. The first ele-ment of the list is the first stroke to be received as input. Astroke’s successor in the list is the stroke that is nearest toit that has not already appeared in the list. There is no reli-ance on temporal information, so writing or editing order hasminimal impact on classification results. Special processingis invoked for strokes that appear to be dots: they are placedafter the stroke with which a combination of horizontal andvertical distance measurements is minimized.

Next, groups of strokes that may correspond to distinct ter-minal symbols are identified. Two types of candidate groupsare identified by extracting contiguous sublists from the listof strokes. First, proximity groups are extracted and scoredbased on a combination of stroke proximity and boundingbox alignment. Then stacked groups are constructed by iden-tifying collections of proximity groups arranged in verticalstacks; they capture such symbols as ≤,±,≡, etc.

Model-based symbol recognition is performed on eachcandidate group in a two-step process. Feature-based match-ing is used first as a pruning step: features such as first andlast point, width, height, and arc length are extracted fromthe candidate group. They are compared with correspondingfeatures of model symbols, and the numerical differences aretallied in a weighted sum.

Those models with a sufficiently small feature differenceare then compared with the candidate stroke group using afast variant of elastic matching distance [26]. Based on com-parison of stroke-based features, the input strokes may bereordered and/or individually reversed so as to obtain thesmallest match distance. The elastic matching distances areinverted so that small distances correspond to large scoresand then weighted by the appropriate group score (proxim-ity group score for most symbols, stacked group score forstack-based symbols, as described above).

Many symbols share similar-looking strokes or subsym-bols. For example, the symbol E “contains” the symbol F.

Because symbols with more strokes are drawn with morevariability, they typically are scored more poorly by our rec-ognizer. We, therefore, augment the scores of larger symbolsby a weighted sum of the scores of similar-looking smallersymbols. The weighting coefficients are automatically deter-mined ahead of time by measuring symbol-to-symbol sim-ilarity among the models in the symbol database using theelastic matching algorithm.

Finally, the augmented scores are normalized so that thehighest-scoring symbol has a score of one. These final scoresgive the membership grades for the fuzzy relation r� .

5.2 Geometric relations

The geometric relation membership functions are based onbounding box geometry. In our discussion in Sect. 4 of rect-angular sets, we used a definition of bounding boxes basedon the orderings<x and<y . In this section, we are concernedwith actual expression geometry, so we consider the naturaldefinition of a bounding box. That is, a stroke’s bounding boxis the smallest axis-aligned rectangle that completely coversthe stroke.

The membership function for the containment relationonly considers the amount of overlap between the bound-ing boxes of t1 and t2. We define the overlap proportion,ol(t1, t2), as the area of the bounding boxes’ intersectiondivided by the area of the smaller box. The membership func-tion for the containment relation is � ((t1, e1), (t2, e2)) =ol(t1, t2).

The membership functions for the other geometric rela-tions incorporate the distance and angle between the bound-ing boxes of the observables. They are all of the form

r ((t1, e1), (t2, e2))

= θ ((t1, e1), (t2, e2))×d (t1, t2)×(

1− 1

2ol (t1, t2)

),

where θ is a scoring function based on angle and d is a dis-tance-based penalty function.

The penalty function d ensures that observables satisfyingthe relations are within a reasonable distance of one another.To compute d, the size of each of t1 and t2 is calculated as theaverage of bounding box width and height. Next, a distancethreshold t is obtained as half the average of the two sizes,clamped to the range [1/6, 1/3] (measured in inches). If thedistance between the bounding boxes is less than t , thend = 1. Otherwise, d decreases linearly to zero at = 3t .

The points between which the angle is measured to com-pute θ are selected based on the relation r and the relationalclass of the expression e2. The x coordinate is chosen cen-trally, but the choice of y coordinate varies. For the↓ relation,it is chosen centrally. Table 1 summarizes how it is chosen

123


Table 1 Relational classes and measurement point selection

Rel. class Ex. symbol(s) y coordinate

Box (none) top(t2)+bottom(t2)2

Default A, +, (,∫,∑,√ , —,.

Baseline x,σ top(t2)+ height (t2)/10

Descender q,ρ top(t2)+ height (t2)/20

Half-ascender i,t top(t2)+ height (t2)/3

j j top(t2)+ height (t2)/4

Fig. 8 θ function component of relation membership grade

Table 2 Angle thresholds for geometric relation membership functions

Relation ϕ0 ϕ1 ϕ2

→ −90 0 90

↗ −90 −37.5 0

↘ −20 35 160

↓ 0 90 180

for the other relations (→ ,↗ , and ↘ ), as well as listingthe relational classes used in our system. These choices, aswell as the selection of threshold values below, were madethrough experimentation on a small dataset unrelated to thoseused in the evaluation section.

Given those measurement points, we measure the angle ϕbetween them relative to the x axis. The angle-based functionθ is triangular with three parameters ϕ0, ϕ1, ϕ2, as follows:

θ(ϕ) =

⎧⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎩

0 if ϕ < ϕ0ϕ−ϕ0ϕ1−ϕ0

if ϕ0 ≤ ϕ ≤ ϕ1ϕ2−ϕϕ2−ϕ1

if ϕ1 ≤ ϕ ≤ ϕ2

0 if ϕ > ϕ2.

Figure 8 indicates the functions’ behavior schematically.Table 2 lists the thresholds for each relation, in degrees.As mentioned above, these values were selected based onmanual examination of a small dataset. Note that the y-axisincreases downward.

6 Evaluation

We evaluated the accuracy of our math recognizer exper-imentally using two publicly available databases of hand-drawn math expressions. For the first evaluation, we used anexpression corpus developed by our research group at theUniversity of Waterloo [24,25]. For the second, comparativeevaluation, we obtained the data set and marking scripts usedat the recent CROHME 2011 math recognition competition[31]. By using the CROHME data, we may compare our sys-tem against the four recognizers which participated in thecompetition, as well as a baseline recognizer developed byone of the competition organizers.

6.1 Evaluation on the Waterloo corpus

Many of the roughly 4,500 legible expressions in our corpusinclude mathematical notations not currently supported byour parser (e.g., set notation, multi-symbol variable names).Our test set thus contained 3,672 expressions from 20 writ-ers, of which 53 expressions were common to all writers,and the remainder were randomly generated for each writer.Since this corpus was also intended to provide examples ofindividual symbols and the geometric relationships betweensymbols, it contains a large number of expressions with onlyone or two symbols. We used all of the expressions contain-ing three or fewer symbols as training data for our symbolrecognizer, as described in the methodology section below.In all, our grammar contains 49 non-terminals and 104 termi-nal symbols organized into 176 productions. The grammaris given in its entirety as Appendix A.

6.1.1 Correction count metric

Devising objective metrics for evaluating the real-worldaccuracy of a recognition system is difficult. Several authorshave proposed accuracy metrics (e.g., [8,14,19,33,41])based on implementation details specific to their particularrecognizers. It is therefore difficult to directly compare theirevaluation results to one another, or to apply their methodol-ogies to our evaluation.

Recently, though, some authors have proposed some rec-ognizer-independent evaluation schemes that rely only onthe output of a recognizer and treat its implementation as ablack box. Awal et al. [3] proposed an approach in whichthe accuracy of symbol segmentation, symbol recognition,relation, and expression recognition are reported separately.A positive feature of this approach is that it makes clear,in the case of incorrect expression recognition, which rec-ognizer subsystem failed, and a version of it was used forthe CROHME competition (no relation accuracy measure-ments were used). Sain et al. [34], following the intuition ofGarain and Chaudhuri [14] that errors far from the dominant

123


baseline are less important than those on or near the mainbaseline, proposed a scheme called EMERS. In it, the editdistance between parse trees representing recognized andground-truth expressions measures the accuracy of a recog-nition result. Edits are weighted so that those more deeplynested in the parse tree have less cost.

These approaches are both valuable in that they are easilyimplemented, permit objective comparisons between recog-nizers, and provide valuable feedback to recognizer develop-ers. But they both measure the amount by which a recognizedexpression deviates from its ground-truth and do not considerwhether the ground-truth was achievable by the recognizer atall. If a recognizer consistently fails to recognize a particularstructure or symbol, the deviation from ground-truth may besmall, even though the recognizer is effectively useless forthat recognition task.

Our recognizer was designed for the MathBrush pen-based mathematics system and is intended for real-time,interactive use by human writers. As such, we believe that auser-oriented accuracy model provides the best way to assessits performance. So as well as asking “Is the recognition cor-rect?”, we want to ask not “How many symbols were wrong?”or “How many transformation steps give the correct answer”,but “Is the desired result attainable?”, and “How much effortmust the user expend to get it?” To a user, it is not necessarilythe case that an error on the main baseline (say, recognizinga + b as atb) is more incorrect than one on a secondarybaseline (say, recognizing n21−ε

as n2l−ε ).To answer these questions, we count the number of correc-

tions that a user would need to make to a recognition resultin order to obtain the correct parse. If an input is recognizedcorrectly, then it requires zero corrections. Similarly, if it isrecognized “almost correctly”, it requires fewer correctionsthan if the recognition is quite poor. This metric is gener-ally applicable to any recognition system, though it clearly isintended to be used with systems providing some correctionor feedback mechanism. One could similarly navigate therecognition alternatives provided by Microsoft’s math rec-ognizer, for instance, count the number of required correc-tions and obtain comparable measurements. Our evaluationscheme, thus, provides an abstract way to compare the per-formance of recognition systems without direct reference totheir implementation details.

Liu et al. [23] also devised a user-based correction costmodel for evaluating a diagram recognition system. Theymeasured the physical time required to correct the errors inrecognized diagrams. This approach would also be useful forevaluating our system, but the size of the expression corpusmakes it impractical to manually test and correct the recog-nition results.

Instead, we have automated the evaluation process. Wedeveloped a testing program that simulates a user interacting

with the recognizer. The program passes ink representing amath expression to the recognizer. Upon receiving the rec-ognition results, the program compares them to the ground-truth associated with the ink. If the highest-ranked result isnot correct, then the testing system makes corrections tothe recognition results, as a user would, to obtain the cor-rect interpretation. That is, the system browses through listsof alternative interpretations for subexpressions or symbols,searching for corrections matching the symbols and struc-tures in the ground-truth. It returns the number of correctionsrequired to obtain the correct expression.

Algorithm 1 outlines this process. Our recognizer uses a“context” to refer to any node in the shared parse forest. So, asthe algorithm descends into an expression tree, it can requestalternatives for any particular subexpression and having anyparticular semantics. For example, if an expression intendedto be a + c was instead recognized as a + C , the algorithmwould detect the correct top-level structure and the correctexpression on the left side of the addition sign. On the rightside, it would request alternatives “in context”; that is, usingonly the ink related to the c symbol and being feasible forthe right side of an addition expression, as determined by thegrammar.

Algorithm 1 cc(g, e): Count the number of correctionsrequired to match recognition results to ground-truth.Require: A recognizer R, a ground-truth expression g, and the first

recognition alternative e from R.if e = g then

return 0errorhere← 0 // Indicates whether an error appears in the top levelof the expression treewhile e has different top-level semantics from g or uses a differentpartition of the input do

errorhere← 1e← the next alternative from R in this contextif e is null (R provides no more alternatives) then

return ∞for each pair of subexpressions ei , gi of e, g do

ni ← recurse on R, gi , eiif ni = ∞ then

return ∞return errorhere +∑

ni

The correction count produced by this algorithm is akinto a tree edit distance, except that it is constrained so thatthe edits must be obtainable through the recognizer’s output.They cannot be arbitrary terminal or subtree substitutions.

6.1.2 Methodology

Although the correction count metric provides accuracyscores for both correct and incorrect recognition results, thereare some types of recognition errors that it cannot account for.

123


For example, if an expression is recognized correctly exceptfor one symbol, for which the correct alternative is not avail-able, then the correction count will be ∞, even though theexpression is “nearly correct”.

To reduce the number of these types of results, we testedthe recognizer in two scenarios. In the first, called default, wedivided the 3,672 corpus transcriptions into training and test-ing sets. The training set contained all 1,536 transcriptionshaving fewer than four symbols. The remaining 2,136 tran-scriptions formed the testing set. These transcriptions con-tained between four and 23 symbols each, with an averageof seven. All of the symbols were extracted from the trainingset and used to augment our pre-existing symbol database.The pre-existing database contained samples of each symbolwritten by one to three writers and is unrelated to the evalu-ation data set. It was used to ensure baseline coverage overall symbol types.

The second scenario, called perfect, evaluated the qualityof expression parsing in isolation from symbol recognition.In it, the same testing set of 2,136 transcriptions was used, butthe terminal symbol relation was evaluated so as to exactlymatch ground truth. There were no alternatives for the parserto choose between, and no ambiguities in stroke grouping.This scenario, thus, represents a “best possible world” forthe parser and gives an upper bound on its real-world perfor-mance.

Each transcription used for testing may be placed into oneof the following four categories:

1. Correct: No corrections were required. The top-rankedinterpretation was correct.

2. Attainable: The correct interpretation was obtained fromthe recognizer after one or more corrections.

3. Incorrect: The correct interpretation was not obtainedfrom the recognizer.

4. Infeasible: The symbol recognizer failed to provide thecorrect symbol candidates to the parser, preventing cor-rect recognition.

Note that the “incorrect” and “infeasible” categories bothrefer to incorrectly recognized transcriptions. Using two cat-egories allows us to distinguish between those transcriptionsfor which the correct expression structure was not identifiedat all (“incorrect”) and those for which the correct structuremay have been identified, but correct recognition was impos-sible because one or more symbols did not have the correctcandidate available for the parser to choose (“infeasible”).Our symbol classifier distinguishes between over 100 sym-bols, but is significantly less accurate than the relation clas-sifier. As such, it is useful to divide incorrectly recognizedtranscriptions into two categories, so as to know which of thetwo subsystems caused the failure.

Fig. 9 Categorization of transcriptions during evaluation

Fig. 10 Average correction count during evaluation

6.1.3 Accuracy evaluation

Figure 9 summarizes the categorization of the testing set inboth scenarios. Figure 10 shows the correction count in eachscenario, averaged over correct and attainable transcriptions.The correction count is divided into terminal corrections (i.e.,corrections that only change terminal symbol identity) andstructural corrections (i.e., all others).

Overall, about 19 % of the transcriptions were recognizedcorrectly in the default scenario, compared with about 83 % inthe perfect scenario. 67 % of the transcriptions were attain-able in the default scenario with about 1.2 corrections onaverage, while over 98 % were attainable in the perfect sce-nario, with about 0.2 corrections on average. If we consideronly feasible transcriptions (those for which the correct sym-bol identities were detected, but not necessarily top-ranked,by the symbol classifier), then 27 % of the transcriptions wererecognized correctly, and 98 % were attainable in the defaultscenario. (The figures for the perfect scenario are unchanged,as all transcriptions were feasible in that case.) This attain-ability rate is very close to the one attained in the perfectscenario, but the correct rate is still quite low, reflecting thedifficulty of simultaneously resolving all symbol classifica-tion ambiguity perfectly.

When an expression was attainable, it rarely required morethan a few corrections. The top graph in Fig. 11 shows howmany expressions required various numbers of corrections inthe default scenario. The bottom graph shows the same fre-quency measurement for the perfect scenario. This indicates

123


Fig. 11 Expression frequency for correction counts in the default (top)and perfect (bottom) scenarios

that it would usually be fairly simple and quick for a userto manually correct recognition errors and obtain the correctresults.

Aside from symbol classification errors, there were twomain reasons why transcriptions were classified as incorrect.

1. Violation of the ordering assumption. Although theordering assumption of Sect. 4 was motivated by thegeneral block-based structure of math notation, it wasoccasionally violated by transcriptions in our corpus.In the left expression of Fig. 12, the right parenthesisextends to the left beyond the left edge of the “2” symbol.When ordered by <x , the parenthesis therefore comesfirst. Assuming correct symbol recognition, the parsertherefore sees, from left to right, the subsequence (9.)2,preventing correct recognition of the parenthesized num-ber. Violations of the ordering assumptions were alsosometimes caused by more reasonable-looking expres-sions, as in the right expression of the figure. Here, theC symbol begins to the right of the y symbol. Again,this causes the C to be considered by the parser to comestrictly after the y and prevents the lower range of thesummation from being considered during the parser’srecursive process of input subdivision.

2. Geometric relation failure. When symbol or subexpres-sion bounding boxes are arranged such that the dis-tance or angle between them falls outside of the rangesdescribed in Sect. 5, then the arrangement is assigned amembership grade of zero. Such arrangements are never

Fig. 12 Expressions violating the ordering assumption

Fig. 13 Expressions for which relation classification failed

attainable, which sometimes caused errors. Again, theseerrors were often caused by messy writing and incorrecttranscriptions (e.g., the left expression in Fig. 13 wasintended to contain the subexpressions qY and jX , but thewriting appears to read qy and j x), but were also causedby reasonable transcriptions (e.g., in the right expression,the subexpression

∑� − ∫

Ddy was intended to be asuperscript of b, but was not recognized as such due togeometric relation failure).

6.1.4 Performance evaluation

The worst-case runtime estimates given in Sect. 4 are quitepessimistic. In actual use, the time required to parse an inputand report results depends on many factors that are diffi-cult to quantify generally, including the number of distinctrectangular subsets of the input, the number of subset pairswhich have non-zero membership grade in relations (and inhow many relations they have such scores), the amount ofambiguity in the symbols used and in the placement of thestrokes making them up, and so on. It is therefore interestingto investigate how parse table size varies with input size andto, thus, obtain empirical estimates of the parser’s behaviorin realistic cases.

We first consider the number of rectangular subsets of theinput actually used by the parser. (Recall that, in the worstcase, there are

(n4

) = O (n4

)distinct rectangular subsets.)

The graphs in Fig. 14 show log–log plots of the numberof subsets considered with respect to the number of strokesappearing in the input. The default scenario is shown on thetop, and the perfect scenario on the bottom. Based on thisdata, the parser explored on the order of n2.9 subsets, onaverage, in the default scenario, and n2.2 subsets in the per-fect scenario. This second figure mirrors findings by Lianget al. [22] in their work on rectangular hulls.

The number of parse table cells depends primarily onthe grammar (which is fixed throughout this evaluation), thenumber of rectangular subsets, and the number of pairs ofsubsets that are members of the fuzzy geometric relations.Figure 15 shows log–log plots of the number of parse tablecells with respect to the number of input strokes. As before,

123


Fig. 14 log–log plots of number of rectangular subsets w.r.t. input size

Fig. 15 log–log plots of number of parse table cells w.r.t. input size

the top graph corresponds to the default scenario, while thebottom graph corresponds to the perfect scenario. In thedefault scenario, the parse table contained on the order ofn2.1 cells on average, while it contained n1.7 in the perfectscenario. It is interesting that the exponents are lower whencounting parse cells than when counting subsets, indicating

Fig. 16 log–log plots of number of parse table links w.r.t. input size

that not all of the rectangular subsets explored are involvedin any parses. This is possible because the forest constructionalgorithm from Sect. 4 parses a production A0

r⇒ A1 · · · Ak

on a partition t1 ∪ · · · ∪ tk by first parsing A1 on t1, then A2

on t2, and so on, until it reaches the end of the production, ora subparse fails. So some of the subsets ti may be exploredwithout contributing to any useful parse results. It is possiblethat a heuristic more sophisticated than the rectangular setrestriction would reduce the amount of such wasted effort.At the same time, the small size of the parse table relative tothe number of subsets available for parsing shows that ourparser does a good job of ignoring unproductive subsets sincethey are identified early in the parsing process.

Finally, we consider the number of links between cellsin the parse table (represented by sets of arrows emanatingfrom the AND nodes in Fig. 3). The number of alternativeinterpretations the tree extraction algorithm must consider isdirectly linked to the number of inter-cell parse table links.The top graph in Fig. 16 indicates how link count scales withinput size in the default scenario, and the bottom graph showsthe same measurements for the perfect scenario. As before,we use this data to estimate that, on average, the parse tablecontains on the order of n2.4 links in the default scenarioand n1.7 in the perfect scenario. These figures indicate thatthe parse graph is typically quite sparsely linked. That is, theparser identifies only a small number of ways to parse a givenproduction on a given input subset. Intuitively, this makessense: to parse a production like [EXPR] + [EXPR], onemust have a symbol that looks like a plus sign, surrounded

123


Table 3 Evaluation onCROHME 2011 corpus

Recognizer Stroke reco. Symbol seg. Symbol reco. Expressionreco.

Part 1 of corpus

MathBrush 71.73 84.09 85.99 32.04

Rochester Institute of Technology (USA) 55.91 60.01 87.24 7.18

Sabanci University (TR) 20.90 26.66 81.22 1.66

Universitat Politècnica de València (ES) 78.73 88.07 92.22 29.28

Institute for Language and Speech 48.26 67.75 86.30 0.00

Processing (GR)

IRCCyN-IVC (FR) 78.57 87.56 91.67 40.88

Part 2 of corpus

MathBrush 66.82 80.26 86.11 20.11

Rochester Institute of Technology (USA) 51.58 56.50 91.29 2.59

Sabanci University (TR) 19.36 24.42 84.45 1.15

Universitat Politècnica de València (ES) 78.38 87.82 92.56 19.83

Institute for Language and Speech 52.28 78.77 78.67 0.00

Processing (GR)

IRCCyN-IVC (FR) 70.79 84.23 87.16 22.41

by expressions, all plausibly horizontally adjacent. For mostwritten addition expressions, there will be few reasonableways of breaking up the input to satisfy these constraints.That the parse table links reflect this fact illustrates the effi-cacy of the terminal symbol milestone and spatial relationtest heuristics built into our parser.

6.2 Evaluation on the CROHME 2011 corpus

A recent math recognition competition compared the accu-racy of five recognizers on the CROHME 2011 corpus [31].To compare these five recognizers to our own, we acquiredthe corpus data and associated marking scripts and evaluatedthe MathBrush recognizer on the competition data.

The data are divided into two parts, each of which includestraining and testing data. The first part includes a rela-tively small selection of mathematical notation, while thesecond includes a larger selection. For details, refer to thecompetition paper [31]. According to competition organiz-ers, the recognizers were typically not trained exclusively onthe training data. We, therefore, trained our symbol recog-nizer on the same data as in the previous evaluation, aug-mented by all of the symbols appearing in the appropriatepart of the CROHME training data. For recognition, we usedthe grammars provided in the competition documentation byconverting them into the file format recognized by our parser.

The accuracy of our recognizer was measured using a perlscript provided by the competition organizers. In this eval-uation, only the top-ranked parse was considered. There arefour accuracy measurements. Stroke reco. indicates the per-centage of input strokes that were correctly recognized and

placed in the parse tree. Symbol seg. indicates the percent-age of symbols for which the correct strokes were properlygrouped together. Symbol reco. indicates the percentage ofsymbol recognized correctly, out of those correctly grouped.Finally, expression reco. indicates the percentage of expres-sions for which the top-ranked parse tree was exactly cor-rect (corresponding to a “correct” result in our classificationscheme for the previous evaluation).

The results are shown in Table 3, which also reproduces forcomparison purposes the updated competition results appear-ing on the CROHME website.1 Note that the IRCCyN-IVCrecognizer was developed by one of the competition orga-nizers and did not directly participate in the competition.Our MathBrush parser obtained higher expression recogni-tion rates than the competition participants, though they werestill lower than those of the IRCCyN-IVC recognizer. How-ever, our symbol segmentation and, in particular, symbol rec-ognition rates were not as high as those of some competitionparticipants. These lower-level subsystems would thereforebe promising areas on which to focus in future work, so asto improve our overall recognition rates.

7 Conclusions and future work

We have described a recognition system for handwrittenmathematical notation, which reports all recognizable inter-pretations of users’ writing and allows users to repair incor-rect symbols or subexpressions, while maintaining a runtime

1 http://www.isical.ac.in/~crohme2011/result.html.

123

http://www.isical.ac.in/~crohme2011/result.html


that is appropriate for real-time use. To build this system, weintroduced a new fuzzy r-CFG formalism designed for pars-ing ambiguous, non-linear input. This formalism capturesboth the structures and the ambiguities inherent in mathemat-ical writing in particular, and it explicitly models recognitionprocesses like symbol and relation classification. We believethat this formalism is applicable to any structured domainexhibiting the types of syntactic ambiguities found in naturallanguages.

To facilitate efficient parsing, we introduced the order-ing assumption and demonstrated how it leads naturally tothe rectangular hulls proposed by Liang et al. While thisassumption theoretically allows most mathematical nota-tions to be adequately described, there are practical exam-ples of mathematical writing which, while easily readable byhumans, cannot be parsed by our current algorithm becauseof the ordering assumption. We intend to investigate how thisassumption may be relaxed to more flexibly model expres-sion geometry while still affording efficient parsing algo-rithms.

To report parse results in ranked order of confidence, weextended the idea of syntactic classes to one of relationalclasses. We plan to expand our selection of relational classesfrom representing symbol shapes to subexpression shapes,so that our relation membership functions can account forsuperscript and subscript geometry, for example, rather thantreating all multi-symbol expressions as equivalent. How-ever, parse speed decreases significantly when the parsermust choose between many classes. We, therefore, plan toinvestigate more targeted assumptions on the behavior ofgrammar relations which will allow a greater degree of con-text sensitivity without a significant performance penalty.The rule-based approach used in this paper does not scalewell to large sets of relational classes, for which empiricalapproaches are better-suited. At present, we lack sufficientempirical data to condition the class-specific relation mem-bership functions, but the creation of new handwritten cor-pora should provide useful training data.

The fuzzy membership grade functions defined in Sect. 3,while reasonable, cannot naturally account for some typesof information that might be useful during recognition. (Forexample, subexpression co-occurence frequencies, the distri-bution of subexpressions within larger expressions, etc.) Weplan to investigate the application of Bayesian probabilitytheory and formal statistical methods to the two-dimensionalparsing problem in a way that avoids dependence on writingorder. As well as the examples mentioned above, there areseveral existing components of our system, such as the rela-tion classifier, for which it seems likely that appropriatelyconditioned statistical information would be useful. We arelooking into efficient computational methods for combiningvariegated information (e.g., probabilistic, possibilistic, rule-based, etc.) during parsing.

Finally, our algorithms must be made more scalable.While they do address the complexity issues arising inour parsing scheme, they take a fairly brute-force approachwithin the constraints allowed by the simplifying assump-tions. As such, there is a considerable decrease in parsingspeed as the input becomes large. We plan to investigate waysin which per-strokes efficiency can be improved while keep-ing available all relevant recognition results.

Appendix A: Grammar specification

Below are the productions used in our math recognition gram-mar. Note that, although matrices are included in the gram-mar so that they may be part of mathematical expressions,the [MATRIX-INTERNAL] non-terminal causes the parserto invoke a specialized matrix recognition engine instead ofparsing the corresponding rectangular set in the usual way.This special system is necessary to ensure that the matricesare well-formed, as it is impossible in a context-free languageto specify in a general way that each row contains the samenumber of entries. The matrix analysis engine will be thesubject of a future report.

[EXPR]⇒ [REL-EXPR] | [REL-TERM][REL-EXPR]

→⇒ [REL-TERM][REL-OP][EXPR]

[REL-OP]⇒ =| =|<|>|≤|≥[REL-TERM]⇒ [ADD-EXPR] | [ADD-LEAD-TERM][ADD-EXPR]

→⇒ [ADD-LEAD-TERM][ADD-OP]

[REL-TERM]

[ADD-OP]⇒ + | − | ±[ADD-LEAD-TERM]⇒ [ADD-TERM] | [NEG]

[NEG]→⇒ −[ADD-TERM]

[ADD-TERM]⇒ [MULT] | [NUM] | [MULT-TRAIL][MULT]

→⇒ [MULT-LEAD-TERM][MULT-RHS]

[MULT-LEAD-TERM]⇒ [MULT-TERM] | [NUM][MULT-RHS]⇒ [MULT-REC] | [MULT-TRAIL][MULT-REC]

→⇒ [MULT-TERM][MULT-RHS]

[MULT-TRAIL]⇒ [MULT-TERM] | [INTEGRAL]| [SUM] | [LIMIT]

[MULT-TERM]⇒ [VAR-EXPR] | [FRAC]| [SUP] | [ROOT]

[VAR-EXPR]⇒ [VAR-TERM] | [FUNC]| [FENCED] | [MATRIX]

[VAR-TERM]⇒ [LETTER] | [SUBSCRIPT][SUBSCRIPT]

↘⇒ [LETTER][SUB-BASE]

[SUB-BASE]⇒ [NUM] | [VAR-TERM][FENCED]⇒ [PARENS] | [BRACKETS] | [BRACES][PARENS]

→⇒ ([REL-TERM])

[BRACKETS]→⇒ [[REL-TERM]]

123


[BRACES]→⇒ {[REL-TERM]}

[MATRIX]→⇒ [[MATRIX-INTERNAL]]

[FUNC]→⇒ [FUNC-LHS][PARENS]

| [FUNC-SPECNAME][ADD-TERM][FUNC-SPECNAME]

→⇒ sin | cos | tan | exp | log | ln | erf

[FUNC-LHS]⇒ [FUNC-NAME] | [FUNC-SUP][FUNC-NAME]⇒ [VAR-TERM] | [FUNC-SPECNAME][FUNC-SUP]

↗⇒ [FUNC-NAME][REL-TERM]

[FRAC]↓⇒ [REL-TERM]—[REL-TERM]

[SUP]↗⇒ [SUP-BASE][REL-TERM]

[SUP-BASE]⇒ [VAR-EXPR] | [ROOT][ROOT]

�⇒ √ [REL-TERM]

[SUM]→⇒ [SUM-LIMITS][REL-TERM]

[SUM-LIMITS]⇒∑

[SUM-LIMITS]↓⇒

∑[EXPR]

[SUM-LIMITS]↓⇒

∑[REL-TERM]

∑[EXPR]

[INTEGRAL]→⇒ [INT-LIMITS][REL-TERM]

d[VAR-TERM]

[INT-LIMITS]⇒∫

[INT-LIMITS]↓⇒

∫[REL-TERM]

[INT-LIMITS]↓⇒ [REL-TERM]

∫

[INT-LIMITS]⇒ [INT-UPRGT-LIMITS]

[INT-UPRGT-LIMITS]↗⇒

∫[REL-TERM]

[INT-LIMITS]↘⇒

∫[REL-TERM]

[INT-LIMITS]↓⇒ [REL-TERM]

∫[REL-TERM]

[INT-LIMITS]↘⇒ [INT-UPRGT-LIMITS][REL-TERM]

[LIMIT]→⇒ [LIM-LHS][REL-TERM]

[LIM-LHS]↓⇒ [LIM-TEXT][LIM-APPROACH]

[LIM-TEXT]→⇒ lim

[LIM-APPROACH]→⇒ [VAR-TERM]→ [REL-TERM]

[NUM]⇒ [INT] | [FLOAT] | ∞[INT]⇒ [DIGIT]

[INT]→⇒ [DIGIT][INT]

[DIGIT]⇒ 0 | 1 | · · · | 9[FLOAT]

→⇒ [FLOAT-LEAD][INT]

[FLOAT-LEAD]↘⇒ [INT]·

[LETTER]⇒ a | b | · · · | z | A | B | · · · | Z

| α | β | γ | δ | ε | π | σ | θ | λ | μ | φ| ψ | ρ | τ | ξ | ζ | | � | � | � | �

References

1. Álvaro, F., Sánchez, J.-A., Benedí, J.-M.: Recognition of printedmathematical expressions using two-dimensional stochastic con-text-free grammars. In: Proceedings of the International Confer-ence on Document Analysis and Recognition, pp. 1225–1229(2011)

2. Anderson, R.H.: Syntax-directed Recognition of Hand-printedTwo-dimensional Mathematics, Ph.D. thesis, Harvard University(1968)

3. Awal, A.-M., Mouchère, H., Viard-Gaudin, C.: The problem ofhandwritten mathematical expression recognition evaluation. In:Proceedings of the International Conference on Frontiers in Hand-writing Recognition, pp. 646–651 (2010)

4. Belaid, A., Haton, J.-P.: A syntactic approach for handwritten math-ematical formula recognition. IEEE Trans. Pattern Anal. Mach.Intell. 6(1), 105–111 (1984)

5. Blostein, D., Grbavec, A. : Recognition of mathematical nota-tion. In: Wang, P.S.P., Bunke, H. (eds.) Handbook on OpticalCharacter Recognition in Document Analysis, pp. 557–582. WorldScientific, Singapore (1996)

6. Blostein, D. : Math-literate computers, Calculemus/MKM. In:Carette, J., Dixon, L., Coen, C.S., Watt, S.M. (eds.) Lecture Notesin Computer Science, vol. 5625, pp. 2–13. Springer, Berlin (2009)

7. Caraballo, S.A., Charniak, E.: New figures of merit for best-first probabilistic chart parsing. Comput. Linguist. 24(2), 275–298 (1998)

8. Chan, K.-F., Yeung, D.-Y.: Error detection, error correction andperformance evaluation in on-line mathematical expression recog-nition. In: On-Line Mathematical Expression Recognition, PatternRecognition (1999)

9. Chan, K.-F., Yeung, D.-Y.: Mathematical expression recognition:a survey. Int. J. Doc. Anal. Recogn. 3, 3–15 (1999)

10. Chou, P.A.: Recognition of equations using a two-dimensional sto-chastic context-free grammar. In: Proceedings of the SPIE, VisualCommunication and Image Processing IV, vol. 1199, pp. 852–863(1989)

11. Costagliola, G., De Lucia, A., Orefice, S., Tortora, G. : Posi-tional grammars: a formalism for lr-like parsing of visuallanguages. In: Marriott, K., Meyer, B. (eds.) Visual LanguageTheory, pp. 171–192. Springer, New York (1998)

12. Fitzgerald, J.A., Geiselbrechtinger, F., Kechadi, T.: Mathpad: Afuzzy logic-based recognition system for handwritten mathematics,Document Analysis and Recognition. ICDAR 2007. Ninth Inter-national Conference on, vol. 2, Sept. 2007, pp. 694–698 (2007)

13. Garain, U., Chaudhuri, B.B.: Recognition of online handwrittenmathematical expressions. Syst. Man. Cybern. Part B: Cybern.IEEE Trans. 34(6), 2366–2376 (2004)

14. Garain, U., Chaudhuri, B.: A corpus for ocr research on mathemat-ical expressions. Int. J. Doc. Anal. Recognit. 7(4), 241–259 (2005)

15. Grune, D., Jacobs, C.J.H.: Parsing Techniques: A Practical Guide,2 edn. Springer, Berlin (2008)

16. Hull, J.: Recognition of mathematics using a two-dimensionaltrainable context-free grammar, Master’s thesis, MassachusettsInstitute of Technology (1996)

17. Labahn, G., Lank, E., MacLean, S., Marzouk, M., Tausky, D.:Mathbrush: a system for doing math on pen-based devices. In: TheEighth IAPR Workshop on Document Analysis Systems (DAS),pp. 599–606 (2008)

18. Bernard, L.: Towards a Uniform Formal Framework for Pars-ing, Current Issues in Parsing Technology. pp. 153–171. Kluwer,Dordrecht (1991)

123


19. Laviola, Jr. J.J.: Mathematical Sketching: A New Approach to Cre-ating and Exploring Dynamic Illustrations, Ph.D. thesis, BrownUniversity (2005)

20. Lee, E.T., Zadeh, L.A. : Note on fuzzy languages. In: Klir,G.J., Yuan, B. (eds.) Fuzzy Sets, Fuzzy Logic, and Fuzzy Sys-tems, pp. 69–82. World Scientific, Singapore (1996)

21. Li, C., Miller, T.S., Zeleznik, R.C., LaViola Jr. J.J.: Algosketch:Algorithm sketching and interactive computation. In: Proceedingsof Sketch-Based Interfaces and Modeling (2008)

22. Liang, P., Narasimhan, M., Shilman, M., Viola, P.: Efficient geo-metric algorithms for parsing in two dimensions, ICDAR ’05:Proceedings of the Eighth International Conference on DocumentAnalysis and Recognition (Washington, DC, USA), IEEE Com-puter Society, pp. 1172–1177 (2005)

23. Liu, W., Zhang, L., Tang, L., Dori, D.: Cost evaluation of interac-tively correcting recognized engineering drawings, Lecture Notesin Computer Science, no. 1941, pp. 329–334 (2000)

24. MacLean, S., Labahn, G., Lank, E., Marzouk, M., Tausky,D.: Grammar-based techniques for creating ground-truthed sketchcorpora. Int. J. Doc. Anal. Recogn. 14, 65–74 (2011)

25. MacLean, S.: Tools for the efficient generation of hand-drawn cor-pora based on context-free grammars. In: Third International Work-shop on Pen-Based Mathematics Computing, http://www.orcca.on.ca/conferences/cicm09/workshops/PenMath/programme-hand.html (2009)

26. MacLean, S., Labahn, G.: Elastic matching in linear time and con-stant space. In: Proceedings of Ninth IAPR International Workshopon Document Analysis Systems, (Short paper), pp. 551–554 (2010)

27. MacLean, S., Labahn, G.: Recognizing handwritten mathematicsvia fuzzy parsing, Technical Report CS-2010-13, School of Com-puter Science, University of Waterloo (2010)

28. Manning, C.D., Schuetze, H.: Foundations of Statistical NaturalLanguage Processing. The MIT Press, Cambridge (1999)

29. Marriott, K., Meyer, B., Wittenburg, K.B. : A survey ofvisual language specification and recognition. In: Marriott, K.,Meyer, B. (eds.) Visual Language Theory, pp. 5–85. Springer, NewYork (1998)

30. Miller, E.G., Viola, P.A.: Ambiguity and constraint in mathe-matical expression recognition. In: Proceedings of the FifteenthNational/tenth Conference on Artificial Intelligence/InnovativeApplications of Artificial Intelligence, pp. 784–791 (1998)

31. Mouchère, H., Viard-Gaudin, C., Garain, U., Kim, D.H., Kim, J.H.:Crohme2011: Competition on recognition of online handwrittenmathematical expressions. In: Proceedings of the 11th InternationalConference on Document Analysis and Recognition (2011)

32. Rhee, T.H., Kim, J.H.: Efficient search strategy in structural anal-ysis for handwritten mathematical expression recognition. PatternRecogn. 42(12), 3192–3201 (2009)

33. Rutherford, I.: Structural analysis for pen-based math input sys-tems, Master’s thesis, David R. Cheriton School of Computer Sci-ence, University of Waterloo (2005)

34. Sain, K., Dasgupta, A., Garain, U.: Emers: a tree matching-basedperformance evaluation of mathematical expression recognitionsystems. IJDAR 14(1), 75–85 (2011)

35. Shi, Y., Li, H., Soong, F.K.: A unified framework for symbol seg-mentation and recognition of handwritten mathematical expres-sions. In: Ninth International Conference on Document Analysisand Recognition, pp. 854–858 (2007)

36. Tomita, M. : Parsing 2-dimensional languages. In: Tomita, M.(ed.) Current Issues in Parsing Technology, pp. 277–289. Kluwer,Dordrecht (1991)

37. Toyota, S., Uchida, S., Suzuki, M.: Structural analysis of mathe-matical formulae with verification based on formula descriptiongrammar. Doc. Anal. Syst. VII:153–163 (2006)

38. Unger, S.H.: A global parser for context-free phrase structure gram-mars. Commun. ACM 11, 240–247 (1968)

39. Winkler, H.-J., Fahrner, H., Lang, M.: A soft-decision approachfor structural analysis of handwritten mathematical expressions.In: Proceedings of International Conference on Acoustics, Speechand Signal Processing, pp. 2459–2462 (1995)

40. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)41. Zanibbi, R., Blostein, D., Cordy, J.R.: Recognizing mathematical

expressions using tree transformation. IEEE Trans. Pattern Anal.Mach. Intell. 24(11), 1455–1467 (2002)

42. Zhang, L., Blostein, D., Zanibbi, R.: Using fuzzy logic to ana-lyze superscript and subscript relations in handwritten mathemat-ical expressions. In: Proceedings of Eighth International Confer-ence on Document Analysis and Recognition, vol. 2, pp. 972–976(2005)

123

http://www.orcca.on.ca/conferences/cicm09/workshops/PenMath/programme-hand.html



Documents

A new approach for recognizing handwritten mathematics using relational ...glabahn/Papers/fuzzy.pdf · A new approach for recognizing handwritten mathematics ... A new approach for