
End-to-End Text Recognition with Convolutional Neural Networks

Page 1: End-to-End Text Recognition with Convolutional Neural Networks

End-to-End Text Recognition with Convolutional Neural Networks

Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng

Computer Science Department

Stanford University

* Denotes equal contribution

Page 2: End-to-End Text Recognition with Convolutional Neural Networks

Scene Text Recognition Overview

• Text "in the wild" is hard to recognize

• Wide range of variations in backgrounds, textures, fonts, and lighting conditions

Street View Text Dataset (K. Wang et al., 2011)

ICDAR 2003 Dataset (S. Lucas et al., 2003)

Page 3: End-to-End Text Recognition with Convolutional Neural Networks

Two-Stage Framework

Detection/Classification → High-level Inference → "HOTEL"

Page 4: End-to-End Text Recognition with Convolutional Neural Networks

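Works, with their classification and detection method and their high-level inference method:

• Neumann and Matas, 2012: MSER + SVM with RBF kernel (classification and detection); exhaustive graph search (high-level inference)

• Mishra et al., 2012: HOG + SVM with RBF kernel (classification and detection); CRF + N-gram model (high-level inference)

• K. Wang et al., 2011: HOG + Random Ferns (classification and detection); Pictorial Structure (high-level inference)

• Weinman et al., 2008: Appearance + Geometry (classification and detection); Semi-Markov CRF (high-level inference)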

Page 5: End-to-End Text Recognition with Convolutional Neural Networks

• Our approach: learnt features + 2-layer CNN (classification and detection); simple off-the-shelf heuristics (high-level inference)

• Most other approaches: hand-designed features + off-the-shelf classifier (classification and detection); graph-based inference models (high-level inference)

Page 6: End-to-End Text Recognition with Convolutional Neural Networks

Various Benchmarks

• Detection/Classification: ICDAR 62-way cropped character classification (SOTA)

• ICDAR and SVT cropped word recognition, with lexicon (SOTA on ICDAR)

• End-to-end system after high-level inference: ICDAR and SVT end-to-end text recognition (SOTA)

Page 7: End-to-End Text Recognition with Convolutional Neural Networks

Unsupervised Feature Learning

• Contrast Normalization + ZCA whitening

• K-Means

Coates et al., 2011
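The pipeline named on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the patch size (8x8, flattened to 64 values), the regularization constants `eps`, the number of filters, and the use of plain Euclidean K-means (rather than whatever K-means variant the cited work uses) are all assumptions made for the example.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    # Per-patch brightness and contrast normalization (eps is an assumed constant).
    mean = patches.mean(axis=1, keepdims=True)
    var = patches.var(axis=1, keepdims=True)
    return (patches - mean) / np.sqrt(var + eps)

def zca_whiten(patches, eps=0.1):
    # ZCA whitening: rotate to the eigenbasis, rescale, rotate back.
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitener, mean, whitener

def kmeans_filters(data, k, iters=10, seed=0):
    # Plain Euclidean K-means; the learned centroids serve as first-layer filters.
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(data.shape[0], size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((data ** 2).sum(axis=1, keepdims=True)
                 - 2.0 * data @ centroids.T
                 + (centroids ** 2).sum(axis=1))
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # leave empty clusters unchanged
                centroids[j] = members.mean(axis=0)
    return centroids
```

A typical use would be to sample many small patches from training images, pass them through `contrast_normalize` and `zca_whiten`, and then call `kmeans_filters` to obtain a bank of filters; the stored `mean` and `whitener` are reused at test time so new patches get the same preprocessing.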

Page 8: End-to-End Text Recognition with Convolutional Neural Networks

Convolution → Spatial Pooling → Convolution → Spatial Pooling → L2-SVM Classifier (√ Text / × Non-Text)

1st layer: 96; 2nd layer: 256

Backpropagation

Large representation but not enough data. Overfitting?

• ~10K parameters for detection

• ~50K parameters for classification
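A forward pass through a two-layer convolution and pooling stack with an L2-SVM objective might look roughly like the sketch below. Only the layer widths (96 and 256) come from the slide; the 32x32 input, the 8x8 and 2x2 filter sizes, the pooling sizes, the rectifier nonlinearity, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def conv_layer(x, filters):
    # x: (C, H, W); filters: (n, C, k, k). "Valid" convolution followed by
    # a rectifier (the nonlinearity is an assumption of this sketch).
    n, _, k, _ = filters.shape
    _, h, w = x.shape
    out = np.empty((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[:, i, j] = np.tensordot(filters, x[:, i:i + k, j:j + k], axes=3)
    return np.maximum(out, 0.0)

def average_pool(x, size):
    # Non-overlapping average pooling over size x size blocks.
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).mean(axis=(2, 4))

def cnn_features(image, filters1, filters2):
    # Convolution -> pooling -> convolution -> pooling, flattened to a vector.
    h1 = average_pool(conv_layer(image, filters1), 5)
    h2 = average_pool(conv_layer(h1, filters2), 2)
    return h2.ravel()

def l2_svm_loss(scores, labels):
    # Squared hinge loss for labels in {-1, +1} (text vs. non-text).
    margins = np.maximum(0.0, 1.0 - labels * scores)
    return float(np.mean(margins ** 2))
```

With a 1x32x32 input, 96 first-layer filters of size 8x8 and 256 second-layer filters of size 2x2, `cnn_features` yields a 2x2x256 map, i.e. a 1024-dimensional vector that a linear classifier can score. Because the squared hinge loss is differentiable, its gradient can be backpropagated through both layers to fine-tune the filters.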

Page 9: End-to-End Text Recognition with Convolutional Neural Networks

Synthetic Data

• Java.Font + natural backgrounds

• Color statistics

• Synthetic "hard negatives"

Figure panels: real vs. synthetic examples; unrealistic synthetic data vs. real data

Page 10: End-to-End Text Recognition with Convolutional Neural Networks

Detector Performance

Page 11: End-to-End Text Recognition with Convolutional Neural Networks

Text Line Bounding boxes

Candidate spaces

Page 12: End-to-End Text Recognition with Convolutional Neural Networks

Classifier Performance

62-way classification accuracy on ICDAR cropped characters (accuracy in %, higher is better):

• Yokobayashi et al., 2006: 81.4 (on ICDAR-Sample characters)

• Coates et al., 2011: 81.7

• K. Wang et al., 2011: 64

• Our approach: 83.9

• Human: 89

Page 13: End-to-End Text Recognition with Convolutional Neural Networks

Page 14: End-to-End Text Recognition with Convolutional Neural Networks

Figure axes: Char Class (vertical) vs. Sliding window position (horizontal)

Page 15: End-to-End Text Recognition with Convolutional Neural Networks

Word Recognition

Lexicon: …, MAKE, SERIES, ESTATE, POKER

Scores shown: -5.45, 7.82, -1.74, -9.02

Selected word (max ∑): S E R I E S
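One way to read the "max ∑" step is as an alignment problem: for each lexicon word, place its characters at strictly increasing sliding-window positions so that the summed classifier scores are as large as possible, then keep the best-scoring word. The sketch below implements that reading with a small dynamic program. It is a simplified illustration under that assumption, not the authors' exact procedure; `alignment_score`, `recognize`, and the score-matrix layout (one row per character class, one column per window position) are hypothetical names and conventions.

```python
import numpy as np

def alignment_score(word, scores, alphabet):
    # Best sum of scores when the word's characters are assigned to
    # strictly increasing columns of the (classes x positions) matrix.
    n_pos = scores.shape[1]
    if len(word) == 0 or len(word) > n_pos:
        return float("-inf")
    rows = [alphabet.index(ch) for ch in word]
    best = scores[rows[0]].astype(float).copy()  # first character at column j
    for r in rows[1:]:
        prev = np.full(n_pos, -np.inf)
        # prev[j] = best score of the prefix ending anywhere left of column j
        prev[1:] = np.maximum.accumulate(best)[:-1]
        best = prev + scores[r]
    return float(best.max())

def recognize(scores, lexicon, alphabet):
    # Return the lexicon word with the highest alignment score.
    scored = [(alignment_score(w, scores, alphabet), w) for w in lexicon]
    best_score, best_word = max(scored)
    return best_word, best_score
```

Restricting the search to a lexicon keeps the problem small: each word costs time proportional to its length times the number of window positions, so even a lexicon of thousands of words is cheap to score.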

Page 16: End-to-End Text Recognition with Convolutional Neural Networks

Cropped Word Recognition Accuracy

Accuracy (%) on cropped-word benchmarks, higher is better:

• ICDAR-WD-50: K. Wang et al., 2011: 76; Mishra et al., 2012: 82; Our approach: 90

• ICDAR-WD-FULL: K. Wang et al., 2011: 62; Our approach: 84

• SVT-WD: K. Wang et al., 2011: 57; Mishra et al., 2012: 73; Our approach: 70

Page 17: End-to-End Text Recognition with Convolutional Neural Networks

Candidate spaces generated by detector

$S = \max_j M(\mathrm{Seg}_j)$

Page 18: End-to-End Text Recognition with Convolutional Neural Networks

Page 19: End-to-End Text Recognition with Convolutional Neural Networks

End-to-end text recognition results

F-score on end-to-end benchmarks, higher is better:

• ICDAR-5: K. Wang et al., 2011: 0.72; Our approach: 0.76

• ICDAR-20: K. Wang et al., 2011: 0.70; Our approach: 0.74

• ICDAR-50: K. Wang et al., 2011: 0.68; Our approach: 0.72

• ICDAR-FULL: K. Wang et al., 2011: 0.51; Our approach: 0.67

• SVT: K. Wang et al., 2011: 0.38; Our approach: 0.46
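For reference, the F-score reported here is the harmonic mean of precision and recall. A minimal sketch of the computation from match counts follows; the function name and the count-based interface are illustrative assumptions, and the benchmark's own rules for deciding whether a detected word counts as a match are not reproduced.

```python
def f_score(true_positives, false_positives, false_negatives):
    # Harmonic mean of precision and recall; returns 0.0 when undefined.
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the F-score is pulled toward the lower of precision and recall, so a system cannot score well by over-detecting or by only reporting its most certain words.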

Page 20: End-to-End Text Recognition with Convolutional Neural Networks

Sample Output Images from SVT

Page 21: End-to-End Text Recognition with Convolutional Neural Networks

Sample Output Images from ICDAR-FULL

Page 22: End-to-End Text Recognition with Convolutional Neural Networks

c -- "confidence margin" (equation with terms max( ) and max({ \ }); operands not recoverable from the slide text)

Raw recognized strings: PEOSTEL, PEOST

→ Hunspell →

Suggested words (used as LEXICON): POST, POS, POSE, POST, PEOPLE, PISTOL

Our F-score: 0.38

Neumann and Matas, 2010: 0.40
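The exact formula for the confidence margin did not survive on this slide. A common definition, assumed here, is the gap between the highest classifier score and the highest score among the remaining classes. The sketch below computes that quantity; the function name and the threshold idea in the note are illustrative, not taken from the presentation.

```python
import numpy as np

def confidence_margin(scores):
    # Assumed definition: best score minus the best score among the
    # remaining classes. Requires at least two class scores.
    scores = np.asarray(scores, dtype=float)
    if scores.size < 2:
        raise ValueError("need at least two class scores")
    ordered = np.sort(scores)[::-1]
    return float(ordered[0] - ordered[1])
```

Under this reading, a large margin marks a character the classifier is sure about, while a small margin marks an ambiguous one; a raw string assembled from the predictions can then be passed to a spelling checker to obtain suggested words that act as a small lexicon.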

Page 23: End-to-End Text Recognition with Convolutional Neural Networks

Conclusion

• Learnt features + 2-layer CNN for character detection and classification

• Simple heuristics to build end-to-end scene text recognition system

• State-of-the-art performance on

  - ICDAR cropped character classification

  - ICDAR cropped word recognition

  - Lexicon-based end-to-end recognition on ICDAR and SVT

• Extensible to more general lexicon with off-the-shelf spelling checker

Page 24: End-to-End Text Recognition with Convolutional Neural Networks
