
Creation and Optimization of a Logo Recognition System

Haozhi Qi, Owen Richfield, Xiaohui Zeng, Michael Zhao

Academic Mentor: Dr. Albert Ku
Industrial Mentor: Mr. Sun Lin

August 6, 2015


Problem Description

Problem: What if there was an app that could provide a smartphone user with information about a company just by recognizing that company's logo in an image?

Goal: Create this app.


Outline

- Model Introduction
  - Bag of Features Model
  - Convolutional Neural Network
- Model Testing and Results
- Application Demonstration
- Conclusions and Future Work

Bag of Features Model


Feature Extraction


Feature Extraction and Description: SURF

- Interest point detection
  - Rotation- and scale-invariant features
- Interest point description
  - A good representation of the image around each interest point

SURF: Interest Point Detection

- Use the determinant of the Hessian to detect blob-like structures.
- Use box filters (second-order box filters) to approximate the second-order derivatives of the Gaussian filter.
- Take advantage of the integral image for fast filter evaluation.
- Apply scale-space analysis to choose the appropriate scale for each interest point.

SURF: Interest Point Description

- Calculate the dominant orientation based on Haar wavelet responses.
- Build the descriptor over a 4x4 grid of sub-regions.
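The detection and description steps above correspond to what OpenCV's SURF implementation performs in a single call. The snippet below is only an illustrative sketch, not the project's code; the image path is hypothetical, and SURF lives in the patent-encumbered opencv-contrib module, so it may be disabled in some builds.

    import cv2

    # Hypothetical input; in the pipeline this would be a query or database image.
    img = cv2.imread("query_logo.jpg", cv2.IMREAD_GRAYSCALE)

    # The Hessian threshold controls how many blob-like interest points are kept.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

    # Detect interest points (position, scale, dominant orientation) and build
    # the 64-dimensional descriptors over the 4x4 grid of sub-regions.
    keypoints, descriptors = surf.detectAndCompute(img, None)
    print(len(keypoints), descriptors.shape)  # e.g. (N, 64)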

BOW Training


Feature Vector Clustering


Basics of K-means

- Clustering method in N-dimensional space
- Algorithmic steps (a small sketch follows):
  - Given a set of data, choose k cluster centers
  - Calculate the distance between each data point and each cluster center
  - Assign each point to the cluster with the minimum distance
  - Recalculate the cluster centers:
    v_i = (1/c_i) * Σ_{j=1}^{c_i} x_j
  - v_i = new center of the ith cluster, c_i = number of data points in the ith cluster, x_j = jth data point in the ith cluster
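A minimal NumPy sketch of the k-means loop just described; it is illustrative only (in practice an optimized library implementation would be run on the SURF descriptors).

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        # X: (n, d) array of feature vectors; pick k initial centers at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(iters):
            # Distance from every point to every center, then nearest-center assignment.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recalculate each center as the mean of its points: v_i = (1/c_i) * sum of x_j.
            for i in range(k):
                if np.any(labels == i):
                    centers[i] = X[labels == i].mean(axis=0)
        return centers, labels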

K-means Clustering


Hierarchical K-means


Bag of Words and Hierarchical K-means

Figure: feature vectors are pushed down a hierarchical vocabulary tree of cluster nodes (CL.); each leaf cluster is a visual word, and an image is summarized as a histogram of visual-word matches (in the example, word1-word5 receive 3, 8, 2, 5, and 1 matches).
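A rough sketch of the quantization step the figure illustrates, assuming a vocabulary tree built by running k-means recursively with B branches per node and L levels (so B^L leaf words); the Node layout is hypothetical, not the project's data structure.

    import numpy as np

    class Node:
        def __init__(self, centers, children=None, word_id=None):
            self.centers = centers    # (B, d) cluster centers at this node
            self.children = children  # B child Nodes, or None at a leaf
            self.word_id = word_id    # visual-word index if this node is a leaf

    def quantize(node, desc):
        # Push one descriptor down the tree, picking the nearest center at each level.
        while node.children is not None:
            i = np.linalg.norm(node.centers - desc, axis=1).argmin()
            node = node.children[i]
        return node.word_id

    def bow_histogram(root, descriptors, n_words):
        # An image becomes a histogram of visual-word counts over its descriptors.
        hist = np.zeros(n_words)
        for d in descriptors:
            hist[quantize(root, d)] += 1
        return hist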


Inverted File Index


- word 1: image 1, image 3, image 5, ...
- word 2: image 4, image 9, image 16, ...
- word 3: image 4, image 12, image 13, ...
- word 4: image 1, image 5, image 7, ...
- word 5: image 2, image 3, image 9, ...
- word 6: image 7, image 12, image 17, ...
- ...


Classification: Inverted File Index

- Benefit: retrieval via the inverted file is faster than searching every image (a sketch follows)
- Drawback: lack of spatial accuracy
- Additional verification is needed to re-rank the retrieved images
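A minimal sketch of the inverted file and the lookup that makes retrieval fast: only images sharing at least one visual word with the query are ever touched. This is illustrative; a real index would typically add tf-idf weighting.

    from collections import defaultdict

    def build_inverted_index(image_words):
        # image_words: {image_id: set of visual-word ids occurring in that image}
        index = defaultdict(set)
        for image_id, words in image_words.items():
            for w in words:
                index[w].add(image_id)
        return index

    def search(index, query_words, top_n=15):
        # Vote for every database image that shares a word with the query,
        # then return the most-voted candidates for re-ranking.
        votes = defaultdict(int)
        for w in query_words:
            for image_id in index.get(w, ()):
                votes[image_id] += 1
        return sorted(votes, key=votes.get, reverse=True)[:top_n]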


Re-ranking of Returned Images

- Match descriptors of the query image to descriptors in the images of the returned list.
- Simple algorithm (a sketch follows):
  - Match each descriptor in the query image to its nearest-neighbor descriptor in the list image.
  - Compare the L2 norm of that pair to the norms between the query descriptor and every other descriptor in the list image.
  - If the original norm is significantly smaller, count it as a "match".
  - Sum the number of "matches" for each list image and divide by the total number of features.
- The returned list is then re-ranked based on this "match ratio" and returned to the user.
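A sketch in the spirit of the match-ratio algorithm above, using OpenCV's brute-force matcher and a two-nearest-neighbour ratio test; the 0.7 threshold stands in for "significantly smaller" and is an assumption, not the project's tuned value.

    import cv2

    def match_ratio(query_desc, list_desc, ratio=0.7):
        # For each query descriptor, find its two nearest descriptors in the list image.
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = matcher.knnMatch(query_desc, list_desc, k=2)
        # Count a "match" only when the nearest neighbour is much closer than the runner-up.
        good = sum(1 for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance)
        return good / max(len(query_desc), 1)

    def rerank(query_desc, candidates):
        # candidates: list of (image_id, descriptors) returned by the inverted index.
        return sorted(candidates, key=lambda c: match_ratio(query_desc, c[1]), reverse=True)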

Convolutional Neural Networks (CNNs)


Neural Networks

Figure: Neural network from http://www.texample.net/media/tikz/examples/PNG/neural-network.png


Convolutional Neural Networks

Convolutional neural networks are neural networks with an additional biological inspiration. Each layer is of one of two basic types: convolution or pooling.
- Convolution is the process of convolving an image with a kernel. This idea comes from image processing, where it has been used for things like edge detection; here, we want to learn kernels specific to the data (see the toy sketch after the figure below).
- Pooling refers to the process of providing a statistical summary of the outputs of several nearby "neurons", e.g. by taking an average or max.

Figure: Description of convolution process from http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg.
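A toy NumPy illustration of the two layer types, a valid-mode convolution with one kernel followed by 2x2 max pooling (real layers learn many kernels, add nonlinearities, and, strictly speaking, compute cross-correlations).

    import numpy as np

    def conv2d(img, kernel):
        # Slide the kernel over the image; each output is a dot product with a patch.
        kh, kw = kernel.shape
        out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool(x, size=2):
        # Summarize each size-by-size block of nearby outputs by its maximum.
        h, w = x.shape[0] // size, x.shape[1] // size
        return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

    sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])     # a hand-crafted edge kernel
    features = max_pool(conv2d(np.random.rand(8, 8), sobel_x))   # shape (3, 3)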


Implementation and Architecture

For the implementation of CNNs, we used Caffe. We only had around 16,000 images, so we fine-tuned two pre-trained models:
- AlexNet, the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012.
- GoogLeNet, the winner of the ILSVRC 2014.
Both are provided in Caffe's Model Zoo, with a file that stores the weights of these models after training on ImageNet (a minimal fine-tuning sketch follows).
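A minimal pycaffe-style sketch of the fine-tuning setup described above. The file names are placeholders, and it assumes the network definition's final classification layer has been renamed and resized to the 167 logo classes so that only that layer starts from random weights.

    import caffe

    caffe.set_mode_gpu()

    # Solver prototxt points at the modified AlexNet/GoogLeNet train definition
    # (placeholder file names; typically with a lowered base learning rate for fine-tuning).
    solver = caffe.SGDSolver("logo_solver.prototxt")

    # Copy the ImageNet weights from Caffe's Model Zoo; layers whose names changed
    # (the new 167-way classifier) keep their fresh initialization.
    solver.net.copy_from("bvlc_alexnet.caffemodel")

    solver.solve()  # run the fine-tuning iterations configured in the solver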

AlexNet

Figure: AlexNet architecture. This also illustrates how the original network was split to train on two GPUs.


GoogLeNet

Figure: GoogLeNet architecture. Deeper, with 12x fewer parameters than AlexNet.


Filter/Layer Visualization

Let's do some filter/layer visualization!
- 143.89.75.120/filayer.html


Model Testing


Dataset Construction

We gathered a data set of logo images for 167 brands using the Bing Search API (on average, 100 images per brand), searching for things like "<brand>", "<brand> building", "<brand> <product>". One problem we faced was that we downloaded mislabeled or irrelevant images. We filtered the dataset using two methods:
- compute the proportion of matching SIFT descriptors between the downloaded image and a reference image for that brand, and toss the image if it doesn't meet some threshold (a sketch follows)
- import ManualLabor
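A sketch of the automatic SIFT filter, assuming OpenCV and a standard ratio test; the 0.75 ratio and the 5% keep-threshold are illustrative stand-ins for "some threshold", not the values actually used.

    import cv2

    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    def keep_image(candidate_gray, reference_gray, ratio=0.75, threshold=0.05):
        # Proportion of candidate descriptors that convincingly match the reference logo.
        _, cand = sift.detectAndCompute(candidate_gray, None)
        _, ref = sift.detectAndCompute(reference_gray, None)
        if cand is None or ref is None:
            return False
        pairs = matcher.knnMatch(cand, ref, k=2)
        good = sum(1 for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance)
        return good / len(cand) >= threshold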

Testing the original pipeline

- parameter tuning
- cross validation


Parameter Tuning

- BOW structure: how to choose the vocabulary size
  - words = B^L
  - B: number of branches; L: number of levels
  - Too large: lack of generalization, overfitting
  - Too small: lack of discrimination, mismatches

Parameter Tuning

- vocabulary size
- How to choose the number of images returned by the inverted file index search
  - accuracy
  - the computation time of re-ranking
- How to choose the number of images shown on the client side
  - accuracy
  - mobile application: the size of the screen

Parameter Tuning

- vocabulary size
- the number of images returned by searching
- the number of images shown
- Re-ranking: how to determine the weight factor w in the weighted function (see the sketch below)
  - score = w * I + (1 - w) * F
  - I: number of inliers
  - F: frequency of the brand in the returned images
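The weighted score as a small function; this sketch assumes the inlier counts and brand frequencies have already been normalized to comparable ranges.

    def brand_scores(inliers, frequencies, w):
        # inliers: {brand: inlier score I}; frequencies: {brand: frequency F of the
        # brand among the returned images}; score = w * I + (1 - w) * F.
        brands = set(inliers) | set(frequencies)
        return {b: w * inliers.get(b, 0.0) + (1 - w) * frequencies.get(b, 0.0) for b in brands}

    # Hypothetical example: rank two brands with w = 0.7.
    ranked = sorted(brand_scores({"lenovo": 0.8, "dell": 0.2},
                                 {"lenovo": 0.6, "dell": 0.4}, w=0.7).items(),
                    key=lambda kv: kv[1], reverse=True)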

Parameters for Evaluation

- vocabulary size
  - number of branches
  - number of levels
- the number of images returned by searching
- the number of images shown
- weight factor w in the weighted function
- calculation of the accuracy
  - if at least one returned image is correct, the accuracy for that query is 1

Cross Validation

- Application
  - model selection
  - model assessment
- Procedure

Cross Validation

Randomly divide the data into K equal-sized parts (a sketch follows).
- Leave out part k, fit the model to the other K-1 parts (combined), and then obtain predictions for the left-out kth part.
- This is done in turn for each part k = 1, 2, ..., K, and the results are combined.
- We choose K = 5.
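A plain-Python sketch of the K-fold procedure just described (the `evaluate` call in the comment is hypothetical; a library such as scikit-learn's KFold would normally be used).

    import random

    def k_fold_indices(n, k=5, seed=0):
        # Randomly divide n examples into k roughly equal-sized parts.
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for held_out in range(k):
            train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            yield train, folds[held_out]  # fit on K-1 parts, predict on the left-out part

    # accuracies = [evaluate(train, test) for train, test in k_fold_indices(16000, k=5)]
    # ...then combine the per-fold results (mean and standard deviation).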

Testing Result


- Test on vocabulary size
- Optimal number of words: 500,000 to 800,000
  - number of branches = 14 or 15
  - number of levels = 5

Testing Result

- With other parameters fixed, test on:
  - weight factor
  - number of returned images
  - number of images shown on the client side

Testing Result

- Optimal parameter setting:
  - number of images shown = 6
  - set the number of returned images to 15, saving about 0.3 s

Testing Summary

- Optimal parameter setting:
  - number of words: 500,000 to 800,000
  - number of images returned: 15
  - number of images shown: 6
- The stability of the system was also tested:
  - the standard deviation over 5-fold cross validation ranges from 0.005 to 0.007

Evaluation of Deep Learning framework

Cross-validation for AlexNet (Top-5 Accuracy)

Figure (cross-validation example): top-5 accuracy plotted against training iteration (roughly 1,000 to 196,000), with accuracy ranging between 0.87 and 0.95; the labelled values are 94.63%, 94.02%, 93.80%, 94.02%, 93.90%, 93.59%, 94.11%, 93.44%, 94.54%, and 93.80%.

Evaluation of Deep Learning framework

Cross-validation for AlexNet

Final accuracy (AlexNet):
- Top-1 Accuracy: 93.33%
- Top-5 Accuracy: 96.73%

Evaluation of Deep Learning framework

Cross-validation for GoogLeNet (Top-5 Accuracy)


Evaluation of Deep Learning framework

Cross-validation for AlexNet and GoogLeNet.

Final accuracy (GoogLeNet):
- Top-1 Accuracy: 94.05%
- Top-5 Accuracy: 97.39%

Evaluation of Deep Learning framework

Final Comparison

                             GoogLeNet    AlexNet     Visual Bag of Words
Accuracy (Top-5)             97.39%       96.73%      87.6%
Efficiency:
  Preprocess                 8.47 ms      7.5 ms      6 ms
  Classification             17.7 ms      6.94 ms     --
  SURF feature extraction    --           --          24 ms
Total time (including some
system-level operations)     129 ms       170 ms      281 ms

Demonstration


Future development

There is still more we can do to improve the system:
- Enlarge the data set (currently 167 classes and 16,000 images).
- Test different deep learning frameworks.
- Combine local hand-crafted features and global deep-learned features to achieve better accuracy.

We would like to thank
- Mr. Sun Lin and Lenovo-Hong Kong.
- Professor Shingyu Leung, Dr. Ku Yin Bon, and the Hong Kong University of Science and Technology.
- Professor Susanna Serna and the Institute for Pure and Applied Mathematics.
- The National Science Foundation for program funding (Grant DMS #0931852).
