
Source: cs229.stanford.edu/proj2017/final-posters/5145725.pdf



Converting Handwritten Mathematical Expressions into LaTeX Norah Borus (nborus), William Bakst (wbakst), Amit Schechter (amitsch)

Motivation

•  Typing up mathematical equations has become quite a necessity when submitting academic papers, writing and solving problem sets, presenting mathematical concepts, and much more.

•  We're looking to automate the process of converting handwritten expressions to LaTeX using modern Machine Learning techniques.

Datasets

•  Our dataset includes the traces of 10,000 handwritten expressions mapped to their corresponding ground-truth LaTeX expressions (obtained from Kaggle).

•  We split the data into 80% train, 10% dev, and 10% test. We use traces for CSeg, normalized pixel arrays for OCR, and traces with segmented characters for SA.

•  Example: $(x-x^3)^2+(y-y^3)^2=r^2$
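The 80/10/10 split described above can be sketched as follows (a minimal illustration; the poster gives no data-loading code, so the function name and shuffling seed are assumptions):

```python
import random

def split_dataset(examples, seed=0):
    """Shuffle and split examples into 80% train, 10% dev, 10% test."""
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(0.8 * n)
    n_dev = int(0.1 * n)
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test

train, dev, test = split_dataset(range(10000))
print(len(train), len(dev), len(test))  # → 8000 1000 1000
```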

Future Work

•  CSeg: implement a combined model using NN beam search and SVM predictions to increase accuracy. Use AdaBoost and multi-scale shape context features.

•  OCR: experiment with advanced OCR techniques such as Random Forests to help improve the accuracy of our CNN and SVM. Use data augmentation techniques to add examples of uncommon symbols.

•  SA: we plan on finding or implementing a LaTeX parser that will enable us to train an SVM to aid our current model in correctly building square roots, exponents, and other more complex equations.

•  E2E: implement a CNN for an end-to-end approach (similar to Google's Tesseract).

Features

•  Segmentation: features come from the data. The SVM uses features including overlap; inverse horizontal, vertical, and centroid distances; and horizontal containment, which are the main indicators for grouping traces.

•  Character Recognition: the SVM and NN use data-based features, including a normalized flattened greyscale pixel array. We derived Histogram of Oriented Gradients (HOG) features from the pixel array, which improves image recognition by using overlapping local contrast normalization.
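The normalized flattened pixel-array feature can be sketched as below (a minimal version assuming 32x32 greyscale character images; the array shapes and function name are assumptions, not the authors' code):

```python
import numpy as np

def pixel_features(image):
    """Normalize a 32x32 greyscale image to [0, 1] and flatten it."""
    img = np.asarray(image, dtype=np.float64)
    rng = img.max() - img.min()
    if rng > 0:
        img = (img - img.min()) / rng
    return img.ravel()  # 1024-dimensional feature vector

# HOG features could then be appended, e.g. with skimage.feature.hog(image,
# orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)), which
# applies the overlapping block (local contrast) normalization noted above.
feat = pixel_features(np.arange(1024).reshape(32, 32))
print(feat.shape, feat.min(), feat.max())  # → (1024,) 0.0 1.0
```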

Methods

Our pipeline has three stages: Character Segmentation (CSeg), Character Recognition (OCR), and Structural Analysis (SA).

Discussion

•  Most character recognition errors come from symbols that do not appear as frequently in the dataset (e.g. \mu).

•  Segmentation: NN beam search and the binary SVM achieve similar accuracy (64%). Beam search tends to group traces that should remain separated, while the binary SVM tends to keep strokes separated.

•  We are working on improving our Structural Analysis. We are struggling with how to accurately test the output and train the models, and we are currently reporting accuracy by hand to preserve correctness.


Character Recognition (OCR)

Input: a normalized pixel array. Output: the character's LaTeX representation.

•  SVM: a multiclass SVM with a linear kernel and L2 regularization, trained with the multiclass SVM loss.

•  NN (multiclass): uses the same internal structure as the NN in CSeg, trained with cross-entropy loss. The output is the classification of the LaTeX symbol.

•  CNN: a multiclass CNN with 5 hidden layers, ReLU activation functions, and cross-entropy loss.
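The loss equations on the original poster did not survive extraction; for reference, the standard forms of the two losses named above (assuming per-class weights $w_j$, true label $y$, and predicted class probabilities $\hat{p}$) are:

```latex
% Multiclass hinge (SVM) loss for one example (x, y):
L_{\mathrm{SVM}} = \sum_{j \neq y} \max\bigl(0,\; w_j^{\top} x - w_y^{\top} x + 1\bigr)

% Cross-entropy loss over predicted class probabilities \hat{p}:
L_{\mathrm{CE}} = -\sum_{j} \mathbb{1}\{j = y\} \log \hat{p}_j = -\log \hat{p}_y
```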

Character Segmentation (CSeg)

Input: a list of all traces for an equation. Output: traces grouped by character.

•  Overlap: merges overlapping strokes. All other strokes are separate.

•  SVM: a binary SVM with a linear kernel, L2 regularization, and C value 50, trained with the binary SVM loss.

•  NN (binary classification): a single hidden layer with softmax and sigmoid activations and cross-entropy loss. The input is a 32x32 flattened pixel array of one or more strokes; the output is whether or not the image is a valid LaTeX symbol.

•  Beam Search: we use beam search to explore possible segmentations. We maximize the segmentation score, which is based on the NN prediction (its confidence that the group is a valid character).
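The beam-search segmentation idea can be sketched as below; `score_group` is a hypothetical stand-in for the NN's confidence that a group of traces forms a valid character, and the beam width and maximum group size are illustrative assumptions:

```python
def beam_search_segment(traces, score_group, beam_width=5, max_group=3):
    """Group consecutive traces into characters, keeping only the
    highest-scoring `beam_width` partial segmentations at each step."""
    # Each beam entry: (cumulative score, groups so far, index of next trace).
    beams = [(0.0, [], 0)]
    n = len(traces)
    while any(i < n for _, _, i in beams):
        candidates = []
        for score, groups, i in beams:
            if i == n:  # already a complete segmentation; carry it forward
                candidates.append((score, groups, i))
                continue
            # Try making the next 1..max_group traces into one character.
            for k in range(1, min(max_group, n - i) + 1):
                group = traces[i:i + k]
                candidates.append(
                    (score + score_group(group), groups + [group], i + k)
                )
        # Prune to the highest-scoring beam_width segmentations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]  # groups of the best complete segmentation

# Toy scorer that prefers two-trace characters (stand-in for the NN).
segs = beam_search_segment(list("abcdef"),
                           lambda g: 1.0 if len(g) == 2 else 0.1)
print(segs)  # → [['a', 'b'], ['c', 'd'], ['e', 'f']]
```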

[Figure: structural analysis example. The recognized symbols Σ, i=1, m, ϕ, (, x, ) are linked by Next, Sup, and Sub relationships to rebuild $\sum_{i=1}^{m}\phi(x)$.]

Experimental Results

SA Analysis Accuracy:

          Baseline   Heuristics
  Test    14%        65%

CSeg and OCR Accuracy:

               Test   Train
  CNN          44%    NA
  Overlap      53%    53%
  SVM          64%    96%
  Binary NN    78%    97%
  Beam search  64%    NA


Structural Analysis (SA)

•  Structural analysis is done according to relative location and size.

•  We use features such as overlap and the squared loss from the superscript/subscript bounding box to determine the relationship between two characters.

•  We then use these relationships to rebuild the ground-truth LaTeX.
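The relative-location-and-size idea above can be sketched with a simple rule-based stand-in for the trained model (the class names, thresholds, and coordinate convention are all illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box of a recognized character (y grows down)."""
    x0: float
    y0: float  # top
    x1: float
    y1: float  # bottom

    @property
    def cy(self):
        return (self.y0 + self.y1) / 2

    @property
    def h(self):
        return self.y1 - self.y0

def relation(a: Box, b: Box) -> str:
    """Classify b relative to a as 'sup', 'sub', or 'next' using
    relative vertical position and relative size."""
    if b.cy < a.y0 + 0.25 * a.h and b.h < 0.8 * a.h:
        return "sup"  # smaller and raised: superscript
    if b.cy > a.y1 - 0.25 * a.h and b.h < 0.8 * a.h:
        return "sub"  # smaller and lowered: subscript
    return "next"     # otherwise: same baseline

x = Box(0, 10, 10, 20)
exp = Box(11, 5, 16, 11)     # smaller box, raised above x
print(relation(x, exp))      # → sup
```

Chaining these pairwise relations over the recognized characters is what lets the ground-truth LaTeX (e.g. `x^3`) be rebuilt.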

[Bar chart: train and test accuracy (0–100%) for the CNN, SVM, and NN models.]