29
Application example: Photo OCR Problem description and pipeline Machine Learning

Application example: Photo OCR Problem description and pipeline Machine Learning

Embed Size (px)

Citation preview

Introduction to Programming

Application example: Photo OCRProblem description and pipeline

Machine Learning1The Photo OCR problem

LULA Bs ANTIQUE MALLLULA BsLULA BsOPENAndrew NgPhoto OCR pipeline1. Text detection2. Character segmentation3. Character classification

NAT

Andrew NgImageText detectionCharacter segmentationCharacter recognitionPhoto OCR pipelineApplication example: Photo OCRSliding windows

Machine Learning5Text detectionPedestrian detection

Andrew Ng6Positive examplesSupervised learning for pedestrian detectionpixels in 82x36 image patches

Negative examples

Andrew Ng

Sliding window detectionAndrew Ng

Sliding window detectionAndrew Ng

Sliding window detectionAndrew Ng

Sliding window detectionAndrew NgText detection

Andrew NgText detectionPositive examples

Negative examples

Andrew NgText detection

[David Wu]Andrew NgExpansionRectangles around operator

141D Sliding window for character segmentation

Positive examples

Negative examples

Andrew NgChange it to segmentation examples instead:15Photo OCR pipeline1. Text detection2. Character segmentation3. Character classification

NAT

Andrew NgApplication example: Photo OCRGetting lots of data: Artificial data synthesis

Machine Learning17Character recognition

NIAQTAAndrew NgSort it, spell antique18Artificial data synthesis for photo OCR

Real dataAbcdefgAbcdefgAbcdefgAbcdefgAbcdefg[Adam Coates and Tao Wang]Andrew NgArtificial data synthesis for photo OCR

Real dataSynthetic data[Adam Coates and Tao Wang]Andrew NgSynthesizing data by introducing distortions[Adam Coates and Tao Wang]

Andrew NgSynthesizing data by introducing distortions: Speech recognitionOriginal audio:

Audio on bad cellphone connectionNoisy background: CrowdNoisy background: Machinery[www.pdsounds.org]

Andrew NgSynthesizing data by introducing distortions Distortion introduced should be representation of the type of noise/distortions in the test set.Audio:Background noise, bad cellphone connectionUsually does not help to add purely random/meaningless noise to your data.intensity (brightness) of pixel random noise

[Adam Coates and Tao Wang]

Andrew Ng2x2, add noise23Discussion on getting more dataMake sure you have a low bias classifier before expending the effort. (Plot learning curves). E.g. keep increasing the number of features/number of hidden units in neural network until you have a low bias classifier.How much work would it be to get 10x as much data as we currently have?Artificial data synthesisCollect/label it yourselfCrowd source (E.g. Amazon Mechanical Turk)Andrew NgDiscussion on getting more dataMake sure you have a low bias classifier before expending the effort. (Plot learning curves). E.g. keep increasing the number of features/number of hidden units in neural network until you have a low bias classifier.How much work would it be to get 10x as much data as we currently have?Artificial data synthesisCollect/label it yourselfCrowd source (E.g. Amazon Mechanical Turk)Andrew NgApplication example: Photo OCRCeiling analysis: What part of the pipeline to work on next

Machine Learning26Estimating the errors due to each component (ceiling analysis)ImageText detectionCharacter segmentationCharacter recognitionWhat part of the pipeline should you spend the most time trying to improve?ComponentAccuracyOverall system72%Text detection89%Character segmentation90%Character recognition100%Andrew NgAnother ceiling analysis exampleFace recognition from images (Artificial example)

Logistic regressionFace detectionCameraimageEyes segmentationNose segmentationMouth segmentationPreprocess(remove background)LabelAndrew NgComponentAccuracyOverall system85%Preprocess (remove background)85.1%Face detection91%Eyes segmentation95%Nose segmentation96%Mouth segmentation 97%Logistic regression100%Another ceiling analysis exampleLogistic regressionFace detectionCameraimageEyes segmentationNose segmentationMouth segmentationPreprocess(remove background)LabelAndrew Ng