
A NOVEL TEXT DETECTION AND LOCALIZATION METHOD BASED ON CORNER RESPONSE*

    Li Sun, Guizhong Liu, Xueming Qian, Danping Guo

School of Electronics and Information Engineering, Xi'an Jiaotong University, 710049, China

    ABSTRACT

Information of text in videos and images plays an important role in semantic analysis. In this paper, we propose an effective method for text detection and localization against noisy backgrounds. The algorithm is based on corner response. Compared to non-text regions, text regions usually contain dense edges and corners, so we obtain relatively strong responses from text regions and low responses from non-text regions. These responses provide useful cues for text detection and localization in images. Using a simple block-based threshold scheme, we then obtain candidate text regions. These regions are further verified by combining other features such as color and the size range of connected components. Finally, text lines are located accurately by projecting the corner response. Experimental results show the effectiveness of our method.

Index Terms— Text detection, text localization, corner response

    1. INTRODUCTION

Text detection in videos and images has attracted researchers' attention for many years. Texts provide intuitive information and are closely related to the content of videos [1], so it is natural and convenient to analyze video semantics based on text information. But all this work can only be done if texts can be accurately and efficiently detected.

The previous work on text detection usually falls into three categories. 1) Connected-component based methods assume that text regions satisfy constraints such as uniform color, certain size ranges, and spatial alignment. Jain and Yu identify texts as connected components in video frames by combining color and size-range features [2, 3]. The main problem with this kind of method is that it is not universal: the color, size, and shape of text can vary greatly from image to image. 2) Edge or texture based methods hold the assumption that backgrounds are much smoother than text regions, so text and non-text regions can be classified according to edge or texture intensity; how to reduce the noise coming from complex backgrounds, however, is still an open problem. Lyu et al. propose a method for detecting multilingual and multi-resolution text [4]: Sobel edge maps are used as features and a special local threshold is adopted to locate candidate text regions. Li et al. propose a method based on central moments of image blocks [1] and show that text of different scales can be detected using this feature. 3) Machine-learning based methods use features extracted from text and non-text regions to train a support vector machine or a neural network, turning text detection into a supervised classification problem. In [5], an SVM-based algorithm using responses from a stroke filter is proposed. Hu et al. propose an adaptive SVM-based paradigm, based on maximum gradient differences and other connected-component features, which achieves a relatively low false rate [6]. The shortcoming of machine-learning based methods is that they need a large number of training samples of different kinds.

* This work is partially supported by China National Natural Science Foundations (NSF No. 60572045), China National 973 Project (No. 2007CB311002), China State-funded Study Abroad Program (CSC No. 2007U06001), MVTec Software GmbH, and Huawei Technologies Co., Ltd.

In this paper, a new text detection and localization method based on corner response (CR) is proposed. CR is the output of a special filter which extracts gray-value corners in an image; the local maxima of CR are the well-known Harris corner points. Although CR does not contain the exact positions of corner points, it reflects the probability of a pixel being a corner point, and we find that it is a suitable feature for text detection. It also works well under different resolutions, so texts of different sizes can be detected. Combined with the features of uniform color and connected-component size, good results have been achieved.

Compared with previous work, our work makes contributions on the following aspects: 1) It is more robust than other edge or texture based methods, because CR is more effective and already reduces noise in the feature extraction stage; even if the background of an image is complex, we can still detect texts in it. 2) It can detect texts in large fonts, since the feature we use works well under both fine and coarse resolutions. 3) Compared with the methods in [5] and [7], our method is efficient, because CR is easy to compute and we do not need to know the exact positions of corner points.

This paper is organized as follows. In Section 2, a corner response based method for text detection and localization is

978-1-4244-4291-1/09/$25.00 ©2009 IEEE, ICME 2009

proposed. Experimental results and discussions are given in Section 3. Conclusions are finally drawn in Section 4.

2. TEXT REGION DETECTION AND LOCALIZATION

This section explains the method for finding text regions in images based on corner response. It consists of three stages: (1) computing the corner response in multi-scale space and thresholding it to get candidate text regions; (2) verifying the candidate regions by combining color and size-range features; (3) locating the text lines using bounding boxes. Fig. 1 shows the overall flowchart of our scheme.

[Fig. 1 blocks: Original Image; Corner Response; Down-Sampled Image; Block-Based Threshold; Color-Based Verification; Projection-Based Verification]

Fig. 1. Flowchart of our proposed method.

2.1. Computing corner response at multiple scales

A corner is a special two-dimensional feature point with high curvature on a region boundary. It can be located by finding local maxima in the corner response (CR). In [7], corner points in video frames are used to generate connected components, but only the number of corner points, not CR itself, is used to classify text and non-text regions. The advantages of using CR instead of the number of corner points lie in two aspects. First, we do not need to know the accurate positions of corner points; we only want to know which parts of the image are likely to contain them, and CR is exactly the feature describing that likelihood. Second, CR gives a continuous value for each pixel, which is easy to handle in the following procedures.

Here we briefly explain the calculation of CR; for more technical details, see [8]. Given an image I(x, y), the basic form of CR is shown in Eq. (1).

CR(x, y) = \sum_{u,v} W(u, v)\,[I(x + u, y + v) - I(x, y)]^2 \qquad (1)

Here W(u, v) is a window function. It can be proved that CR can be approximately computed using the formula below, where k is a weight factor:

CR(x, y) = A(x, y)\,B(x, y) - (C(x, y))^2 - k\,(A(x, y) + B(x, y))^2 \qquad (2)

Here A(x, y), B(x, y) and C(x, y) are computed as follows, where \otimes denotes convolution:

A(x, y) = W(u, v) \otimes (\nabla_x I(x, y))^2 \qquad (3)

B(x, y) = W(u, v) \otimes (\nabla_y I(x, y))^2 \qquad (4)

C(x, y) = W(u, v) \otimes (\nabla_x I(x, y)\,\nabla_y I(x, y)) \qquad (5)

In the formulas above, \nabla_x I(x, y) and \nabla_y I(x, y) are the edge amplitudes along the x and y directions, which we obtain with the Sobel operator. W(u, v) is a Gaussian template for smoothing:

W(u, v) = \exp\!\left(-\frac{u^2 + v^2}{2\sigma^2}\right) \qquad (6)

and we can choose the \sigma value and the size of the template.
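The computation of Eqs. (1)–(6) can be sketched as follows. This is a minimal numpy-only illustration, not the authors' code: the weight k = 0.04 and the Gaussian radius of 3σ are common Harris-detector defaults assumed here, and `convolve2d` is a naive helper written only to keep the sketch self-contained.

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive 'same'-size 2-D convolution with zero padding (numpy only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[kh - 1 - i, kw - 1 - j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def corner_response(img, sigma=1.0, k=0.04):
    """Corner response CR of Eqs. (1)-(6): Sobel gradients, Gaussian-weighted
    A, B, C terms, then CR = A*B - C^2 - k*(A + B)^2."""
    img = img.astype(np.float64)
    # Edge amplitudes along x and y via the Sobel operator.
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    Ix = convolve2d(img, kx)
    Iy = convolve2d(img, kx.T)
    # Gaussian template W(u, v) of Eq. (6); radius 3*sigma is an assumption.
    r = int(3 * sigma)
    u = np.arange(-r, r + 1, dtype=np.float64)
    g = np.exp(-u ** 2 / (2 * sigma ** 2))
    W = np.outer(g, g)
    W /= W.sum()
    A = convolve2d(Ix * Ix, W)   # Eq. (3)
    B = convolve2d(Iy * Iy, W)   # Eq. (4)
    C = convolve2d(Ix * Iy, W)   # Eq. (5)
    return A * B - C * C - k * (A + B) ** 2   # Eq. (2)
```

As the text notes, the local maxima of this map are the Harris corner points, but the method below uses the raw response values rather than extracting those maxima.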

Several images and their corresponding CR are shown in Fig. 2. We can see that the text region stands out from the background in some images, as in Fig. 2(a). A complex background region may also be detected, as in Fig. 2(b), but it can be eliminated in a following step. CR is not strong enough in Fig. 2(c) because of the large font of the texts; but if we down-sample the image, CR becomes strong enough, as shown in Fig. 2(d). For the purpose of detecting text in large fonts, a pyramid of scaled-down images is generated first and CR is then computed at each level of the pyramid. The scale factor and the number of pyramid levels depend on the resolution of the original image and the size of the text we want to detect. In our experiments, we find a two-level pyramid with a scale factor of 0.5 is enough.
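Under this setting (two levels, scale factor 0.5), the multi-scale computation can be sketched as below. The 2×2 block-averaging `downsample` and the crude `compute_cr` stand-in are assumptions made for illustration; in practice the Harris-style CR of Section 2.1 would be plugged in for `compute_cr`.

```python
import numpy as np

def compute_cr(img):
    """Stand-in corner-response measure (product of squared gradients);
    any Harris-style CR implementation from Section 2.1 can replace it."""
    gy, gx = np.gradient(img.astype(np.float64))
    return (gx ** 2) * (gy ** 2)

def downsample(img):
    """Scale down by a factor of 0.5 via 2x2 block averaging."""
    h = img.shape[0] // 2 * 2
    w = img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def cr_pyramid(img, levels=2):
    """Compute CR at each level of a scaled-down image pyramid
    (the paper finds two levels with factor 0.5 sufficient)."""
    responses = []
    for _ in range(levels):
        responses.append(compute_cr(img))
        img = downsample(img)
    return responses
```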

    2.2. Text candidate region generation

Text candidate regions are generated based on CR. First, the CR image is divided into small blocks; we choose a block size of 8 × 8 in our experiments. Then the mean intensity value M_blk of each block in CR is calculated, and a threshold T_blk is set for M_blk. If the following condition is satisfied,

M_{blk} > T_{blk} \qquad (7)

T_{blk} = \frac{1}{H W} \sum_{x=0}^{H} \sum_{y=0}^{W} CR(x, y) \qquad (8)

where H and W are the height and width of the CR image,

the current block is considered one of the blocks of the text candidate region. The threshold used here is relatively low, since many pixels in the CR image are 0. This is reasonable because we do not want to lose blocks which actually contain text, and noisy blocks coming from the background can be reduced by the following step. The resulting text candidate regions are shown in Fig. 3(a) and Fig. 3(b).
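The block-based thresholding of Eqs. (7)–(8) can be sketched as follows, assuming the 8×8 block size used in the paper; the reshape trick for computing per-block means is an implementation choice, not from the paper.

```python
import numpy as np

def candidate_blocks(cr, block=8):
    """Block-based thresholding of Eqs. (7)-(8): a block is a text
    candidate when its mean CR M_blk exceeds the global mean T_blk."""
    h = cr.shape[0] // block * block
    w = cr.shape[1] // block * block
    cr = cr[:h, :w].astype(np.float64)
    t_blk = cr.mean()  # Eq. (8): mean of CR over the whole image
    # Per-block means M_blk via a reshape into (rows, block, cols, block).
    m_blk = cr.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return m_blk > t_blk  # Eq. (7), as a boolean mask of blocks
```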

2.3. Text verification using color constraints

The color of text in an image is often uniform and different from the background. So the gray-value deviation within the text characters is small compared with the background, while the gray value differs greatly between text and background. We take advantage of this feature to eliminate noisy blocks.

    391

Fig. 2. Corner responses of some images: a) CR for English texts; b) CR for Chinese texts in a noisy background; c) CR for large texts; d) CR for large texts after down-sampling.

In each candidate block, we set a threshold T_CR for every pixel in CR and obtain two collections of points, R_t and R_b, in each block:

CR(x, y) \ge T_{CR} \Rightarrow (x, y) \in R_t

CR(x, y) < T_{CR} \Rightarrow (x, y) \in R_b

Then we calculate Dev and Dis as follows, where g(x, y) is the gray value of a pixel:

Dev = \frac{1}{N_t} \sum_{(x,y) \in R_t} (g(x, y) - M_t)^2 \qquad (9)

Dis = |M_t - M_b| \qquad (10)

M_t and M_b are the mean gray values over R_t and R_b respectively, and N_t is the number of points in R_t. Finally, we check whether the following condition is satisfied:

Dis > T_{dis} \;\wedge\; Dev < T_{dev} \qquad (11)
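The per-block color verification of Eqs. (9)–(11) might look as follows; the function name and all threshold values are hypothetical, and only the Dev/Dis logic follows the paper.

```python
import numpy as np

def passes_color_check(cr_block, gray_block, t_cr, t_dis, t_dev):
    """Color-constraint check of Eqs. (9)-(11) for one candidate block.

    Pixels with CR >= t_cr form the text set R_t, the rest the background
    set R_b. The block survives when the two mean gray values are far
    apart (Dis > T_dis) and the text pixels are uniform (Dev < T_dev)."""
    text = cr_block >= t_cr      # R_t
    back = ~text                 # R_b
    if not text.any() or not back.any():
        return False
    m_t = gray_block[text].mean()
    m_b = gray_block[back].mean()
    dev = np.mean((gray_block[text] - m_t) ** 2)  # Eq. (9)
    dis = abs(m_t - m_b)                          # Eq. (10)
    return bool(dis > t_dis and dev < t_dev)      # Eq. (11)
```

A block of uniform bright text on a dark background yields Dis large and Dev near zero, so it passes; a flat block with no gray-level contrast yields Dis near zero and is rejected.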

Fig. 3. Text candidate regions and their verification results: a) text candidate region in a noisy background; b) large English text and its candidate region; c) verification result of (a); d) verification result of (b).

If it is satisfied, we consider the current block a text block. Fig. 3(c) and Fig. 3(d) show the results after text verification. Although some small noisy regions still exist, it is easy to eliminate them using area and aspect-ratio constraints [3].

    2.4. Text line localization

We have already got text regions after verification, but the shape of each region is still irregular and needs to be refined into an aligned rectangle. Since text in video is usually horizontally or vertically aligned, the projection method described in [9, 10] is used to get the accurate position of the text line. The projection is based on CR and is done as follows. Each connected-component region is first extended by four pixels along its border, and a bounding box is used to locate the region. For each row or column in the bounding box, we calculate the summation of the CR intensity and obtain a curve, as shown in Fig. 4(c). Because there is always space between characters, the curve needs to be smoothed; here a Gaussian filter is adopted. Finally, a threshold is used to locate the position of the text line. Fig. 4 shows an example of both horizontal and vertical projection. We have found that if we set the threshold to 30% of the peak value, the final result is good, as shown in Fig. 4(d).
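The projection step can be sketched as below, assuming a Gaussian smoothing width of σ = 2 (the paper does not specify one) and the 30%-of-peak threshold it reports; the function name is hypothetical.

```python
import numpy as np

def locate_spans(cr_box, axis=0, frac=0.3, sigma=2.0):
    """Project CR inside a bounding box, smooth the curve with a 1-D
    Gaussian to bridge inter-character gaps, then keep the index ranges
    above frac * peak (30% of the peak, as in the paper)."""
    proj = cr_box.sum(axis=axis).astype(np.float64)
    # 1-D Gaussian smoothing of the projection curve.
    r = int(3 * sigma)
    u = np.arange(-r, r + 1, dtype=np.float64)
    g = np.exp(-u ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    smooth = np.convolve(proj, g, mode="same")
    mask = smooth > frac * smooth.max()
    # Collect contiguous runs where the curve stays above the threshold.
    spans, start = [], None
    for i, on in enumerate(mask):
        if on and start is None:
            start = i
        elif not on and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(mask)))
    return spans
```

Running it once with `axis=0` (column sums) gives the horizontal extent of a text line; a second pass with `axis=1` on the cropped box gives the vertical extent.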

    3. EXPERIMENTAL RESULTS

The proposed text detection and localization algorithm has been tested on a number of real-life video clips, including TV news and movies. The texts in these videos are either English or Chinese. The resolution of the images is either 320 × 240 or 352 ×


Fig. 4. Example of text line localization: a) bounding box after text verification; b) CR in the bounding box; c) horizontal projection of CR in the bounding box (red lines are the threshold we set); d) text line after horizontal localization; e) CR for the text line after horizontal localization; f) vertical projection of CR.

Table 1. Performance comparison with previous work

Method          Recall    Accuracy   Speed (ms)
Method in [4]   90.69%    90.77%     96.1
Method in [5]   91.32%    92.47%     77.5
Method in [7]   91.84%    94.58%     105.4
Our method      91.63%    95.86%     70.2

288. 500 images are randomly selected from these videos for testing.

Here we adopt three widely used quantitative evaluation measurements: recall, accuracy, and speed. The recall rate evaluates what percentage of all ground-truth text regions is correctly detected. The detection accuracy evaluates what percentage of the detected text regions is correct. A detection is counted as correct only if the intersection between a detected region and a ground-truth region covers at least 90% of both their areas. To evaluate speed, we use the average processing time per image.
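The 90%-of-both-areas overlap criterion can be made concrete as follows; the rectangle format and function name are hypothetical.

```python
def is_correct_detection(det, gt, min_cover=0.9):
    """Counting rule from Section 3: a detection is correct only when
    the intersection covers at least 90% of BOTH rectangles' areas.
    Rectangles are (x0, y0, x1, y1) with x1 > x0 and y1 > y0."""
    ix0, iy0 = max(det[0], gt[0]), max(det[1], gt[1])
    ix1, iy1 = min(det[2], gt[2]), min(det[3], gt[3])
    if ix1 <= ix0 or iy1 <= iy0:
        return False  # no overlap at all
    inter = (ix1 - ix0) * (iy1 - iy0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter >= min_cover * area(det) and inter >= min_cover * area(gt)
```

Note this is stricter than the usual intersection-over-union criterion, since a detection that loosely encloses the ground truth fails the 90% coverage of its own area.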

From the experimental results shown in Table 1, we can see that, compared with the work in [4] and [5], our method achieves higher accuracy while the recall is almost the same. The reason lies in two aspects. First, CR itself contains less noise than other features, which means CR is an effective feature for text detection. Second, we combine other features such as color and size range to eliminate false detections. Compared with the work in [7], our method is fast and efficient: it takes only about 30 ms to compute CR and 40 ms to verify and locate the text regions, while in [7] it takes 75 ms just to get the positions of the Harris corner points and at least another 30 ms to locate the text. All these tests are done on the same computer with the computer vision software HDevelop.

    4. CONCLUSION

This paper proposes a text detection and localization method based on the CR of an image. There are three basic steps in our method. First, text candidates are generated based on blocks of CR. Second, text candidate regions are verified by

combining other features. Third, text lines are accurately located based on CR. Experimental results have demonstrated the effectiveness of the proposed method on the text detection and localization task.

    5. REFERENCES

[1] H. Li, D. Doermann, and O. Kia, "Automatic text detection and tracking in digital video," IEEE Trans. Image Processing, vol. 9, no. 1, pp. 147–156, 2000.

[2] B. Yu and A. Jain, "A generic system for form dropout," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, pp. 1127–1134, 1996.

[3] A.K. Jain and B. Yu, "Automatic text location in images and video frames," Pattern Recognition, vol. 31, no. 12, pp. 2055–2076, 1998.

[4] M.R. Lyu and J.-Q. Song, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 243–255, 2005.

[5] Xiaojun Li, Weiqiang Wang, Shuqiang Jiang, Qingming Huang, and Wen Gao, "Fast and effective text detection," in Proc. of the IEEE International Conference on Image Processing (ICIP), 2008.

[6] Shiyan Hu and Minya Chen, "Adaptive Fréchet kernel based support vector machine for text detection," in Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.

[7] Xian-Sheng Hua, Xiang-Rong Chen, Liu Wenyin, and Hong-Jiang Zhang, "Automatic location of text in video frames," in Proceedings of the ACM Multimedia 2001 Workshops: Multimedia Information Retrieval (MIR 2001), 2001.

[8] C.G. Harris and M.J. Stephens, "A combined corner and edge detector," in Proceedings of the 4th Alvey Vision Conference, 1988, pp. 147–152.

[9] Rainer Lienhart and Axel Wernicke, "Localizing and segmenting text in images and videos," IEEE Trans. Circuits and Systems for Video Technology, vol. 12, pp. 256–267, 2002.

[10] Xueming Qian, Guizhong Liu, Huan Wang, and Rui Su, "Text detection, localization, and tracking in compressed video," Signal Processing: Image Communication, vol. 22, no. 9, pp. 752–768, 2007.
