
Creation and Optimization of a Logo Recognition System

Haozhi Qi, Owen Richfield, Xiaohui Zeng, Michael Zhao

Academic Mentor: Dr. Albert Ku
Industrial Mentor: Mr. Sun Lin

August 6, 2015


Problem Description

Problem: What if there was an app that could provide a smartphone user with information about a company just by recognizing that company's logo in an image?

Goal: Create this app.


Outline

- Model Introduction
  - Bag of Features Model
  - Convolutional Neural Network
- Model Testing and Results
- Application Demonstration
- Conclusions and Future Work

Bag of Features Model


Feature Extraction


Feature Extraction and Description: SURF

- Interest point detection
  - Rotation- and scale-invariant features
- Interest point description
  - A good representation of the image around each interest point

SURF: Interest Point Detection

- Use the determinant of the Hessian to detect blob-like structures.
- Use box filters (second-order box filters) to approximate the second-order derivatives of the Gaussian filter.
- Take advantage of the integral image for fast filter evaluation.
- Apply scale-space analysis to choose the appropriate scale for each interest point.

SURF: Interest Point Description

- Calculate the dominant orientation based on Haar wavelet responses.
- Build the descriptor over a 4x4 grid of sub-regions.
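The detection and description steps above correspond to what OpenCV's SURF implementation performs in a single call. The snippet below is only an illustrative sketch, not the project's code; the image path is hypothetical, and SURF lives in the patent-encumbered opencv-contrib module, so it may be disabled in some builds.

    import cv2

    # Hypothetical input; in the pipeline this would be a query or database image.
    img = cv2.imread("query_logo.jpg", cv2.IMREAD_GRAYSCALE)

    # The Hessian threshold controls how many blob-like interest points are kept.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

    # Detect interest points (position, scale, dominant orientation) and build
    # the 64-dimensional descriptors over the 4x4 grid of sub-regions.
    keypoints, descriptors = surf.detectAndCompute(img, None)
    print(len(keypoints), descriptors.shape)  # e.g. (N, 64)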

BOW Training


Feature Vector Clustering


Basics of K-means

- Clustering method in N-dimensional space
- Algorithmic steps (a small sketch follows):
  - Given a set of data, choose k cluster centers
  - Calculate the distance between each data point and each cluster center
  - Assign each point to the cluster with the minimum distance
  - Recalculate the cluster centers:
    v_i = (1/c_i) * Σ_{j=1}^{c_i} x_j
  - v_i = new center of the ith cluster, c_i = number of data points in the ith cluster, x_j = jth data point in the ith cluster
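A minimal NumPy sketch of the k-means loop just described; it is illustrative only (in practice an optimized library implementation would be run on the SURF descriptors).

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        # X: (n, d) array of feature vectors; pick k initial centers at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(iters):
            # Distance from every point to every center, then nearest-center assignment.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recalculate each center as the mean of its points: v_i = (1/c_i) * sum of x_j.
            for i in range(k):
                if np.any(labels == i):
                    centers[i] = X[labels == i].mean(axis=0)
        return centers, labels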

K-means Clustering


Hierarchical K-means


Bag of Words and Hierarchical K-means

Figure: feature vectors are pushed down a hierarchical vocabulary tree of cluster nodes (CL.); each leaf cluster is a visual word, and an image is summarized as a histogram of visual-word matches (in the example, word1-word5 receive 3, 8, 2, 5, and 1 matches).
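A rough sketch of the quantization step the figure illustrates, assuming a vocabulary tree built by running k-means recursively with B branches per node and L levels (so B^L leaf words); the Node layout is hypothetical, not the project's data structure.

    import numpy as np

    class Node:
        def __init__(self, centers, children=None, word_id=None):
            self.centers = centers    # (B, d) cluster centers at this node
            self.children = children  # B child Nodes, or None at a leaf
            self.word_id = word_id    # visual-word index if this node is a leaf

    def quantize(node, desc):
        # Push one descriptor down the tree, picking the nearest center at each level.
        while node.children is not None:
            i = np.linalg.norm(node.centers - desc, axis=1).argmin()
            node = node.children[i]
        return node.word_id

    def bow_histogram(root, descriptors, n_words):
        # An image becomes a histogram of visual-word counts over its descriptors.
        hist = np.zeros(n_words)
        for d in descriptors:
            hist[quantize(root, d)] += 1
        return hist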


Inverted File Index


- word 1: image 1, image 3, image 5, ...
- word 2: image 4, image 9, image 16, ...
- word 3: image 4, image 12, image 13, ...
- word 4: image 1, image 5, image 7, ...
- word 5: image 2, image 3, image 9, ...
- word 6: image 7, image 12, image 17, ...
- ...


Classification: Inverted File Index

- Benefit: retrieval via the inverted file is faster than searching every image (a sketch follows)
- Drawback: lack of spatial accuracy
- Additional verification is needed to re-rank the retrieved images
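A minimal sketch of the inverted file and the lookup that makes retrieval fast: only images sharing at least one visual word with the query are ever touched. This is illustrative; a real index would typically add tf-idf weighting.

    from collections import defaultdict

    def build_inverted_index(image_words):
        # image_words: {image_id: set of visual-word ids occurring in that image}
        index = defaultdict(set)
        for image_id, words in image_words.items():
            for w in words:
                index[w].add(image_id)
        return index

    def search(index, query_words, top_n=15):
        # Vote for every database image that shares a word with the query,
        # then return the most-voted candidates for re-ranking.
        votes = defaultdict(int)
        for w in query_words:
            for image_id in index.get(w, ()):
                votes[image_id] += 1
        return sorted(votes, key=votes.get, reverse=True)[:top_n]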


Re-ranking of Returned Images

- Match descriptors of the query image to descriptors in the images of the returned list.
- Simple algorithm (a sketch follows):
  - Match each descriptor in the query image to its nearest-neighbor descriptor in the list image.
  - Compare the L2 norm of that pair to the norms between the query descriptor and every other descriptor in the list image.
  - If the original norm is significantly smaller, count it as a "match".
  - Sum the number of "matches" for each list image and divide by the total number of features.
- The returned list is then re-ranked based on this "match ratio" and returned to the user.
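A sketch in the spirit of the match-ratio algorithm above, using OpenCV's brute-force matcher and a two-nearest-neighbour ratio test; the 0.7 threshold stands in for "significantly smaller" and is an assumption, not the project's tuned value.

    import cv2

    def match_ratio(query_desc, list_desc, ratio=0.7):
        # For each query descriptor, find its two nearest descriptors in the list image.
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = matcher.knnMatch(query_desc, list_desc, k=2)
        # Count a "match" only when the nearest neighbour is much closer than the runner-up.
        good = sum(1 for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance)
        return good / max(len(query_desc), 1)

    def rerank(query_desc, candidates):
        # candidates: list of (image_id, descriptors) returned by the inverted index.
        return sorted(candidates, key=lambda c: match_ratio(query_desc, c[1]), reverse=True)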

Convolutional Neural Networks (CNNs)


Neural Networks

Figure: Neural network from http://www.texample.net/media/tikz/examples/PNG/neural-network.png


Convolutional Neural Networks

Convolutional neural networks are neural networks with an additional biological inspiration. Each layer is of one of two basic types: convolution or pooling.
- Convolution is the process of convolving an image with a kernel. This idea comes from image processing, where it has been used for things like edge detection; here, we want to learn kernels specific to the data (see the toy sketch after the figure below).
- Pooling refers to the process of providing a statistical summary of the outputs of several nearby "neurons", e.g. by taking an average or max.

Figure: Description of convolution process from http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg.
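A toy NumPy illustration of the two layer types, a valid-mode convolution with one kernel followed by 2x2 max pooling (real layers learn many kernels, add nonlinearities, and, strictly speaking, compute cross-correlations).

    import numpy as np

    def conv2d(img, kernel):
        # Slide the kernel over the image; each output is a dot product with a patch.
        kh, kw = kernel.shape
        out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool(x, size=2):
        # Summarize each size-by-size block of nearby outputs by its maximum.
        h, w = x.shape[0] // size, x.shape[1] // size
        return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

    sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])     # a hand-crafted edge kernel
    features = max_pool(conv2d(np.random.rand(8, 8), sobel_x))   # shape (3, 3)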


Implementation and Architecture

For the implementation of CNNs, we used Caffe. We only had around 16,000 images, so we fine-tuned two pre-trained models:
- AlexNet, the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012.
- GoogLeNet, the winner of the ILSVRC 2014.
Both are provided in Caffe's Model Zoo, with a file that stores the weights of these models after training on ImageNet (a minimal fine-tuning sketch follows).
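A minimal pycaffe-style sketch of the fine-tuning setup described above. The file names are placeholders, and it assumes the network definition's final classification layer has been renamed and resized to the 167 logo classes so that only that layer starts from random weights.

    import caffe

    caffe.set_mode_gpu()

    # Solver prototxt points at the modified AlexNet/GoogLeNet train definition
    # (placeholder file names; typically with a lowered base learning rate for fine-tuning).
    solver = caffe.SGDSolver("logo_solver.prototxt")

    # Copy the ImageNet weights from Caffe's Model Zoo; layers whose names changed
    # (the new 167-way classifier) keep their fresh initialization.
    solver.net.copy_from("bvlc_alexnet.caffemodel")

    solver.solve()  # run the fine-tuning iterations configured in the solver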

AlexNet

Figure: AlexNet architecture. This also illustrates how the original network was split to train on two GPUs.


GoogLeNet

Figure: GoogLeNet architecture. Deeper, with 12x fewer parameters than AlexNet.


Filter/Layer Visualization

Let's do some filter/layer visualization!
- 143.89.75.120/filayer.html


Model Testing


Dataset Construction

We gathered a data set of logo images for 167 brands using the Bing Search API (on average, 100 images per brand), searching for things like "<brand>", "<brand> building", "<brand> <product>". One problem we faced was that we downloaded mislabeled or irrelevant images. We filtered the dataset using two methods:
- compute the proportion of matching SIFT descriptors between the downloaded image and a reference image for that brand, and toss the image if it doesn't meet some threshold (a sketch follows)
- import ManualLabor
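A sketch of the automatic SIFT filter, assuming OpenCV and a standard ratio test; the 0.75 ratio and the 5% keep-threshold are illustrative stand-ins for "some threshold", not the values actually used.

    import cv2

    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    def keep_image(candidate_gray, reference_gray, ratio=0.75, threshold=0.05):
        # Proportion of candidate descriptors that convincingly match the reference logo.
        _, cand = sift.detectAndCompute(candidate_gray, None)
        _, ref = sift.detectAndCompute(reference_gray, None)
        if cand is None or ref is None:
            return False
        pairs = matcher.knnMatch(cand, ref, k=2)
        good = sum(1 for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance)
        return good / len(cand) >= threshold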

Testing the original pipeline

- parameter tuning
- cross validation


Parameter Tuning

- BOW structure: how to choose the vocabulary size
  - words = B^L
  - B: number of branches; L: number of levels
  - Too large: lack of generalization, overfitting
  - Too small: lack of discrimination, mismatches

Parameter Tuning

- vocabulary size
- How to choose the number of images returned by the inverted file index search
  - accuracy
  - the computation time of re-ranking
- How to choose the number of images shown on the client side
  - accuracy
  - mobile application: the size of the screen

Parameter Tuning

- vocabulary size
- the number of images returned by searching
- the number of images shown
- Re-ranking: how to determine the weight factor w in the weighted function (see the sketch below)
  - score = w * I + (1 - w) * F
  - I: number of inliers
  - F: frequency of the brand in the returned images
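The weighted score as a small function; this sketch assumes the inlier counts and brand frequencies have already been normalized to comparable ranges.

    def brand_scores(inliers, frequencies, w):
        # inliers: {brand: inlier score I}; frequencies: {brand: frequency F of the
        # brand among the returned images}; score = w * I + (1 - w) * F.
        brands = set(inliers) | set(frequencies)
        return {b: w * inliers.get(b, 0.0) + (1 - w) * frequencies.get(b, 0.0) for b in brands}

    # Hypothetical example: rank two brands with w = 0.7.
    ranked = sorted(brand_scores({"lenovo": 0.8, "dell": 0.2},
                                 {"lenovo": 0.6, "dell": 0.4}, w=0.7).items(),
                    key=lambda kv: kv[1], reverse=True)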

Parameters for Evaluation

- vocabulary size
  - number of branches
  - number of levels
- the number of images returned by searching
- the number of images shown
- weight factor w in the weighted function
- calculation of the accuracy
  - if at least one returned image is correct, the accuracy for that query is 1

Cross Validation

- Application
  - model selection
  - model assessment
- Procedure

Cross Validation

Randomly divide the data into K equal-sized parts (a sketch follows).
- Leave out part k, fit the model to the other K-1 parts (combined), and then obtain predictions for the left-out kth part.
- This is done in turn for each part k = 1, 2, ..., K, and the results are combined.
- We choose K = 5.
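A plain-Python sketch of the K-fold procedure just described (the `evaluate` call in the comment is hypothetical; a library such as scikit-learn's KFold would normally be used).

    import random

    def k_fold_indices(n, k=5, seed=0):
        # Randomly divide n examples into k roughly equal-sized parts.
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for held_out in range(k):
            train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            yield train, folds[held_out]  # fit on K-1 parts, predict on the left-out part

    # accuracies = [evaluate(train, test) for train, test in k_fold_indices(16000, k=5)]
    # ...then combine the per-fold results (mean and standard deviation).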

Testing Result


- Test on vocabulary size
- Optimal number of words: 500,000 to 800,000
  - number of branches = 14 or 15
  - number of levels = 5

Testing Result

- With other parameters fixed, test on:
  - weight factor
  - number of returned images
  - number of images shown on the client side

Testing Result

- Optimal parameter setting:
  - number of images shown = 6
  - set the number of returned images to 15, saving about 0.3 s

Testing Summary

- Optimal parameter setting:
  - number of words: 500,000 to 800,000
  - number of images returned: 15
  - number of images shown: 6
- The stability of the system was also tested:
  - the standard deviation over 5-fold cross validation ranges from 0.005 to 0.007

Evaluation of Deep Learning framework

Cross-validation for AlexNet (Top-5 Accuracy)

Figure (cross-validation example): top-5 accuracy plotted against training iteration (roughly 1,000 to 196,000), with accuracy ranging between 0.87 and 0.95; the labelled values are 94.63%, 94.02%, 93.80%, 94.02%, 93.90%, 93.59%, 94.11%, 93.44%, 94.54%, and 93.80%.

Evaluation of Deep Learning framework

Cross-validation for AlexNet

Final accuracy (AlexNet):
- Top-1 Accuracy: 93.33%
- Top-5 Accuracy: 96.73%

Evaluation of Deep Learning framework

Cross-validation for GoogLeNet (Top-5 Accuracy)


Evaluation of Deep Learning framework

Cross-validation for AlexNet and GoogLeNet.

Final accuracy (GoogLeNet):
- Top-1 Accuracy: 94.05%
- Top-5 Accuracy: 97.39%

Evaluation of Deep Learning framework

Final Comparison

                             GoogLeNet    AlexNet     Visual Bag of Words
Accuracy (Top-5)             97.39%       96.73%      87.6%
Efficiency:
  Preprocess                 8.47 ms      7.5 ms      6 ms
  Classification             17.7 ms      6.94 ms     --
  SURF feature extraction    --           --          24 ms
Total time (including some
system-level operations)     129 ms       170 ms      281 ms

Demonstration


Future development

There is still more we can do to improve the system:
- Enlarge the data set (currently 167 classes and 16,000 images).
- Test different deep learning frameworks.
- Combine local hand-crafted features and global deep-learned features to achieve better accuracy.

We would like to thank
- Mr. Sun Lin and Lenovo-Hong Kong.
- Professor Shingyu Leung, Dr. Ku Yin Bon, and the Hong Kong University of Science and Technology.
- Professor Susanna Serna and the Institute for Pure and Applied Mathematics.
- The National Science Foundation for program funding (Grant DMS #0931852).
