
Signature Recognition and Language Identification for Document Retrieval

MADDALI MERCY VINUTHA AND K. ASHOK BABU

Sri Indu College of Engineering & Technology, Affiliated to JNTU, Hyderabad, Ibrahimpatnam
E-mail: [email protected]

Abstract: Advances in digital technology are producing new non-human interfaces while at the same time demanding secure functionality. One such requirement arises in handling documents and their authentication. Digital techniques, in particular image processing, have high potential for authenticating documents without ambiguity. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multi-scale approach to jointly detecting and segmenting signatures in document images. Second, we treat the problem of signature retrieval in the unconstrained setting of translation-, scale-, and rotation-invariant nonrigid shape matching. In addition, we identify the language in which the retrieved document is written. At the same time, tasks such as presenting text in one language as an automatic translation into another language are increasingly automated. For multilingual documents, it is essential to identify the language of each text portion before its contents can be analyzed.

Index Terms: Document image analysis and retrieval, signature detection and segmentation, signature matching, deformable shape, Language Identification, Horizontal Lines, Vertical Lines, Feature Extraction

1. INTRODUCTION

As unique and evidentiary entities in a wide range of business and forensic applications, signatures provide an important form of indexing that enables effective exploration of large heterogeneous document image collections. Given an abundance of documents, searching for a specific signature is a highly effective way of retrieving documents authorized or authored by an individual [1]. In this context, handwriting recognition is suboptimal because of its prohibitively low recognition rate and the fact that the character sequence of a signature is often unrelated to the signature itself. More importantly, as a number of studies [2], [3] demonstrated, signatures are highly stylistic in nature and are best described by their graphic style.

Searching for relevant documents in large, complex document image repositories is a central problem in document image analysis and retrieval. One approach is to recognize text in the image using an optical character recognition (OCR) system and apply text indexing and querying. This solution is primarily restricted to machine-printed text because state-of-the-art handwriting recognition is error prone and limited to applications with a small vocabulary, such as postal address recognition and bank check reading [4]. In broader, unconstrained domains, including searching historic manuscripts [5] and processing languages where character recognition is difficult [6], image retrieval has demonstrated much better results. Detecting, segmenting, and matching deformable objects such as signatures are important and challenging problems in computer vision. In the following sections, we briefly discuss detection, segmentation, and matching in the context of signature-based document image retrieval and present an overview of our approach.

IJCSSEIT, Vol. 5, No. 1, June 2012, pp. 13-27

Identification of the language in a document image is of primary importance for selecting a specific OCR system when processing multilingual documents [7]. Language identification may seem an elementary and simple issue for humans, but it is difficult for a machine, primarily because different scripts (a script may be a common medium for different languages) are made up of differently shaped patterns that produce different character sets [8]. OCR is of special significance for a multilingual country like India, where the text portion of a document usually contains information in more than one language.

In an automated environment, document processing systems relying on OCR would need human intervention to select the appropriate OCR package, which is inefficient, undesirable and impractical [8]. A pre-OCR language identification system would enable the correct OCR system to be selected in order to achieve the best character interpretation of the document [9]. This area has not been widely researched to date, despite its growing importance to the document image processing community and the progression towards the “paperless office” [9]. Keeping this gap in mind, this paper attempts to solve the more fundamental problem of identifying the language of a text in a multilingual document before its contents are automatically read.

Language identification is a computer vision problem. Humans generally identify the language of a document using visible characteristic features such as texture, horizontal lines and vertical lines, which are visually perceivable. This human visual perception capability motivated the proposed system. In this context, this paper attempts to simulate the human visual system to identify the language based on visual clues, without reading the contents of the document.

2. PREVIOUS WORK

A literature survey reveals that some work has been carried out on signature recognition and script/language identification. A wide range of methods for signature verification has been reported. Different approaches can be categorized based on the model used for verification. Most of the approaches perform a fair amount of preprocessing before extracting features from the signature.

Detecting salient structures in images is an important task in many segmentation and recognition problems. The saliency function defined in [10] monotonically increases with the length and monotonically decreases with the total squared curvature of the evaluated curve. To reduce the exponential search space, the saliency network assumes that an optimal solution has a recurrent structure, which its authors call extensibility, so that searching the exponential space of possible curves by dynamic programming takes polynomial time. Rigid shape matching has been approached in a number of ways with the intent of obtaining a discriminative global shape description. Matching nonrigid shapes needs to consider unknown transformations that are both linear (e.g., translation, rotation, scaling, and shear) and nonlinear. One comprehensive framework for shape matching in this general setting is to iteratively estimate the correspondence and the transformation. The iterative closest point (ICP) algorithm introduced by Besl and McKay [11] and its extensions [12], [13], [14] provide a simple heuristic approach. Srihari et al. [15] developed a signature matching and retrieval approach by computing the correlation of gradient, structural, and concavity features extracted from fixed-size image patches. It achieved 76.3 percent precision on a collection of 447 manually cropped signature images from the Tobacco-800 database [16], [17]. The approach is not translation, scale, or rotation invariant.
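The ICP idea of alternating between correspondence estimation and transformation estimation can be sketched as follows. This is a minimal rigid 2D variant written only for illustration (brute-force nearest neighbours, closed-form least-squares rotation and translation); it is not the nonrigid matcher used in this paper, and the function names are hypothetical.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate the cosine and sine components of the centred cross-covariance.
    cos_term = sin_term = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        cos_term += ax * bx + ay * by
        sin_term += ax * by - ay * bx
    theta = math.atan2(sin_term, cos_term)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp(src, dst, iterations=20):
    """Alternate nearest-neighbour correspondence and rigid re-estimation."""
    current = list(src)
    for _ in range(iterations):
        # Correspondence step: pair each moving point with its closest target point.
        matched = [min(dst, key=lambda q: math.dist(p, q)) for p in current]
        # Transformation step: best rigid motion for the current pairing.
        theta, tx, ty = best_rigid_2d(current, matched)
        c, s = math.cos(theta), math.sin(theta)
        current = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in current]
    return current
```

ICP only reaches the correct alignment when the initial pose is close enough for the first nearest-neighbour pairing to be mostly right, which is why it is described above as a heuristic.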

Peake and Tan [9] proposed a method for automatic script and language identification from document images using multiple-channel (Gabor) filters and gray-level co-occurrence matrices for seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian. Pal and Choudhuri [18] proposed an automatic technique for separating the text lines of 12 Indian scripts (English, Devanagari, Bangla, Gujarati, Kannada, Kashmiri, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu) using ten triplets formed by grouping English and Devanagari with each of the other scripts. Basavaraj Patil and Subbareddy [19] developed a character script class identification system for machine-printed bilingual documents in English and Kannada scripts using a probabilistic neural network. A detailed investigation [20] studied the applicability of horizontal and vertical projections and segmentation methods for identifying the language of a document, considering specifically three languages: Kannada, Hindi and English. That system identifies the three languages in four stages: Hindi is identified in the first stage, Kannada in the second, and English in the third; in the fourth and last stage, languages other than Kannada, Hindi and English are grouped into a fourth category, OTHERS, without identifying the specific language, since the main aim is to focus only on Kannada, Hindi and English.


3. PROPOSED METHODS AND PROCESS

(A) Signature Detection and Segmentation

First, for object detection, using saliency measures that fit high-level knowledge of the object gives globally more consistent results than jointly optimizing a fixed set of low-level vision constraints. Once the salient structures are identified, the problem of grouping becomes simpler, based on constraints such as proximity and good continuation. Second, we can effectively formulate structural saliency across image scales, as opposed to single-scale approaches such as the saliency network. Multiscale detection is important for nonrigid objects like signatures, whose contours can be severely broken due to poor ink condition and image degradation. Finally, multiscale saliency computation generates detection hypotheses at the natural scale, where grouping among a set of connected components becomes structurally obvious. Together, these provide a unified framework for object detection and segmentation that produces a meaningful representation for object recognition and retrieval.

Signatures exhibit large variations compared to other forms of handwriting because they functionally serve as highly stylized entities for representing individuals [2]. This greatly undercuts the possibility of posing signature detection or recognition as a straightforward classification problem. As shown in Fig. 1, local features, including size, aspect ratio, and spatial density, are not discriminative enough to separate signatures (red pluses) from nonsignature objects (blue dots) on a ground-truthed collection of 1,290 real-world documents.

Figure 1: Nonsignature Objects (blue dots) and Signatures (red pluses) are Nonseparable by Local Features, Including Size, Aspect Ratio, and Spatial Density, as a Significant Portion of them Overlaps in the Feature Space

In this section, we describe a structural saliency approach to signature detection that searches over a range of scales $S = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$. We select the initial scale $\sigma_1$ based on the resolution of the input image. We define the multiscale structural saliency for a curve Γ as

$$ \Phi(\Gamma) = \max_{\sigma_i \in S} f\big(\Phi(\Gamma_{\sigma_i}), \sigma_i\big) \tag{1} $$

where $f: \mathbb{R}^2 \to \mathbb{R}^+$ is a function that normalizes the saliency over its scale and $\Gamma_{\sigma_i}$ is the corresponding curve computed at scale $\sigma_i$. Using multiple scales for detection relaxes the requirement that the curve Γ be well connected at a specific scale.

First, this effectively computes a coarse representation of the original image in which small gaps in the curve are bridged by smoothing followed by resampling (see Fig. 3). Second, we form connected components on the edge image at scale $\sigma_i$ and compute the saliency of each component using the measure presented in the following sections, which characterizes its dynamic curvature. We define the saliency of a connected component $\Gamma^k_{\sigma_i}$ as the sum of saliency values computed from all pairs of edges on it. Third, we identify the most salient curves and use a grouping strategy based on proximity and curvilinear constraints to obtain the rest of the signature parts within their neighborhood. Our joint detection and segmentation approach thus identifies the most cursive structure and groups it with neighboring elements in two steps. By locating the most salient signature component, we effectively focus our attention on its neighborhood. Subsequently, a complete signature is segmented out from the background by grouping salient neighboring structures.
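The two-step detect-then-group idea can be illustrated with a simplified sketch: pick the component with the highest saliency, then repeatedly absorb components whose bounding boxes lie within a proximity radius. Only the proximity constraint is modelled here; the curvilinear constraint is omitted, and the bounding-box representation, the `radius` threshold and the function names are hypothetical choices, not values from the paper.

```python
import math

def box_gap(a, b):
    """Euclidean gap between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)

def group_signature(boxes, saliency, radius):
    """Grow a signature region outward from the most salient component.

    boxes: component id -> bounding box; saliency: component id -> score.
    """
    seed = max(saliency, key=saliency.get)
    group, frontier = {seed}, [seed]
    while frontier:
        current = frontier.pop()
        for cid, box in boxes.items():
            # Absorb any ungrouped component close to a grouped one.
            if cid not in group and box_gap(boxes[current], box) <= radius:
                group.add(cid)
                frontier.append(cid)
    return group
```

Because grouping starts from the single most salient component, a weak but nearby stroke is still collected, while isolated text far from the seed is not.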

(B) Measure of Saliency for Signatures

In this section, we consider the problem of recognizing the structural saliency of a 2D offline signature segment using a signature production model. As shown in Fig. 2, among the infinite number of geometric curves that pass through two given end points E1 and E2 on a signature, very few are realistic. This is because the wrist is highly constrained in its degrees of freedom when producing a signature. Furthermore, a signature segment rarely fits a high-order polynomial locally, as shown by the dotted curve in Fig. 2.

Figure 2: Among the Large Number of Geometric Curves Passing through the Two End Points E1 and E2 on a Signature, a Few are Realistic (solid curves). An Example of an Unrealistic Curve is shown as the Dotted Line

We propose a signature production model that incorporates two degrees of freedom in Cartesian coordinates [21]. We assume that the pen moves in a cycloidal fashion with reference to a sequence of shifting baselines when signing. The local baseline changes as the wrist adjusts its position with respect to the document. Within a short curve segment, we assume that the baseline remains unchanged. In addition, the locus of the pen maintains a proportional distance from the local center point (focus) to the local baseline (directrix), which imposes an additional constraint that limits the family of second-order curves to ellipses. A signature fragment can thus be equivalently viewed as a concatenation of small elliptic segments. In fact, when the signature baseline is aligned to the x axis, a general form of the velocities in the x and y directions is

$$ v_x = a\sin(\omega_x t + \phi_x) + c, \qquad v_y = b\sin(\omega_y t + \phi_y) \tag{2} $$

where a and b are the horizontal and vertical velocity amplitudes, ωx and ωy are the horizontal and vertical frequencies, φx and φy are the horizontal and vertical phases, t is time, and the constant c is the average speed of horizontal movement. Without loss of generality, consider the case a = b, ωx = ωy and φx − φy = π/2. We can show that, for different values of a and c, the resulting curves are a curtate cycloid, a cycloid, and a prolate cycloid, as shown in Fig. 4.
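Under the stated assumptions (a = b, ωx = ωy = ω, φx − φy = π/2), integrating (2) gives, up to integration constants, x(t) = (a/ω) sin(ωt + φy) + ct and y(t) = −(a/ω) cos(ωt + φy): a point at distance a/ω from a centre that moves with speed c. The horizontal speed a cos(ωt + φy) + c has minimum c − a, so the curve is curtate when c > a, an ordinary cycloid (with cusps) when c = a, and prolate (with loops) when c < a. The sketch below evaluates this closed form; the function names and sampling parameters are illustrative only.

```python
import math

def pen_trajectory(a, c, omega=1.0, phi_y=0.0, t_max=4 * math.pi, steps=400):
    """Closed-form positions for v_x = a*sin(omega*t + phi_y + pi/2) + c and
    v_y = a*sin(omega*t + phi_y), with integration constants set to zero."""
    points = []
    for k in range(steps + 1):
        t = t_max * k / steps
        theta = omega * t + phi_y
        points.append(((a / omega) * math.sin(theta) + c * t,
                       -(a / omega) * math.cos(theta)))
    return points

def cycloid_type(a, c, tol=1e-9):
    """Classify the curve by the minimum horizontal speed c - a."""
    if c - a > tol:
        return "curtate cycloid"   # pen never stops moving forward
    if c - a < -tol:
        return "prolate cycloid"   # pen moves backward for part of each cycle (loops)
    return "cycloid"               # pen momentarily stops (cusps)
```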

Among the second-order curves, the hyperbola degenerates into the two intersecting straight lines t1(x, y) and t2(x, y). Following [21], λ0 is given by

$$ \lambda_0 = \frac{4\,\big[p_1(x_2 - x_1) + q_1(y_2 - y_1)\big]\,\big[p_2(x_1 - x_2) + q_2(y_1 - y_2)\big]}{(p_1 q_2 - p_2 q_1)^2} \tag{3} $$

Equation (3) provides a theoretical second-order upper bound of the dynamic curvature, given the parameter set (x1, y1), (x2, y2), (p1, q1), (p2, q2) that fits the signature production model.
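Equation (3) is straightforward to evaluate. The sketch below is a direct transcription; the section does not say what (p1, q1) and (p2, q2) represent beyond being parameters of the production model, so they are treated as plain numbers, and returning None for a vanishing denominator is our own convention.

```python
def lambda0(x1, y1, x2, y2, p1, q1, p2, q2):
    """Second-order bound of Eq. (3) for end points E1 = (x1, y1) and E2 = (x2, y2)."""
    denominator = (p1 * q2 - p2 * q1) ** 2
    if denominator == 0:
        return None  # (p1, q1) and (p2, q2) are parallel, so Eq. (3) is undefined
    numerator = 4.0 * (
        (p1 * (x2 - x1) + q1 * (y2 - y1)) * (p2 * (x1 - x2) + q2 * (y1 - y2))
    )
    return numerator / denominator
```

For example, with E1 = (0, 0), E2 = (1, 0), (p1, q1) = (1, 1) and (p2, q2) = (−1, 1), both bracketed factors equal 1 and the denominator is 2² = 4, so λ0 = 4/4 = 1.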

Figure 3: The Prominence of a Signature is Perceived Across Scales, whereas Background Text Merges into Blobs at Coarser Scales. (a) Detected Signature Region without Segmentation. (b) and (c) Detected and Segmented Signatures by our Approach at the Two Scales


The saliency of a curve $\Gamma^k_{\sigma_i}$ at scale $\sigma_i$ is defined as the sum of saliency values computed from all pairs of points on it:

$$ \Phi\big(\Gamma^k_{\sigma_i}\big) = \sum_{E_i, E_j \in \Gamma^k_{\sigma_i}} \Lambda_{\sigma_i}(E_i, E_j) \tag{4} $$

It intuitively measures the likelihood of an elliptic segment fitting a given set of 2D points, and the computation does not require knowing the temporal order among the points. The analysis so far applies only to the continuous case. To account for discretization effects, we impose two additional conditions. First, the absolute values of the two functions on the left-hand side of (3a and 3b) must be strictly larger than ε. Second, the denominator term in (3) must be strictly larger than ε. In our experiments, we use ε = 0.1. For robustness, we weight the saliency contribution by the gradient magnitude of the weaker edge. Separating saliency detection from grouping significantly reduces the level of complexity. Let the total number of edge points be $N$ and the average length of connected components be $L_c$. Saliency computation for each component in (4) requires $O(L_c^2)$ time on average. Therefore, the overall computation is of order $(N/L_c) \times L_c^2 = N L_c$. Since $L_c$ is effectively upper bounded by a prior estimate of signature dimensions and the number of searched scales n is limited, both can be considered constants, so the complexity of saliency computation is linear in $N$. Gaussian smoothing and connected component analysis both require $O(N)$ time. The total complexity of the signature detection algorithm is therefore $O(N)$.
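Equations (4) and (1) combine into a simple two-level loop: sum a pairwise term over all point pairs of a component (the $O(L_c^2)$ part of the complexity argument), weight each contribution by the weaker gradient magnitude, then take the maximum of the normalized saliency over scales. The pairwise term Λ and the normalization f are not fully specified in this section, so both are passed in as functions; the pairwise function is assumed to apply the ε tests itself and return None for rejected pairs. All names are hypothetical.

```python
from itertools import combinations

def component_saliency(points, gradients, pair_term):
    """Eq. (4): sum of pairwise saliency over all pairs of points in one component.

    pair_term(p, q) returns a float, or None if the pair fails the epsilon tests.
    Each contribution is weighted by the gradient magnitude of the weaker edge.
    """
    total = 0.0
    for i, j in combinations(range(len(points)), 2):  # L_c * (L_c - 1) / 2 pairs
        value = pair_term(points[i], points[j])
        if value is None:
            continue
        total += value * min(gradients[i], gradients[j])
    return total

def multiscale_saliency(saliency_by_scale, normalize):
    """Eq. (1): maximum over scales sigma of normalize(Phi, sigma)."""
    return max(normalize(phi, sigma) for sigma, phi in saliency_by_scale.items())
```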

Figure 4: (a) Curtate Cycloid, (b) Cycloid, and (c) Prolate Cycloid Curves Generated when the Speeds of Writing in the Horizontal Baseline and the Vertical Direction both Vary Sinusoidally

(C) Measures of Shape Dissimilarity

Several measures of shape dissimilarity have demonstrated success in object recognition and retrieval. One is the thin-plate spline bending energy $D_{be}$, and another is the shape context distance $D_{sc}$. As a conventional tool for interpolating coordinate mappings from $\mathbb{R}^2$ to $\mathbb{R}^2$ based on point constraints, the thin-plate spline (TPS) is commonly used as a generic representation of nonrigid transformation [24]. The TPS interpolant f(x, y) minimizes the bending energy

$$ I_f = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 + \left(\frac{\partial^2 f}{\partial y^2}\right)^2 \right] dx\,dy \tag{5} $$

over the class of functions that satisfy the given point constraints. Equation (5) imposes smoothness constraints to discourage nonrigidities that are too arbitrary. The shape context distance $D_{sc}$ between a template shape T composed of m points and a deformed shape D of n points is defined in [26] as

$$ D_{sc}(T, D) = \frac{1}{m} \sum_{t \in T} \min_{d \in D} C\big(T(t), d\big) + \frac{1}{n} \sum_{d \in D} \min_{t \in T} C\big(T(t), d\big) \tag{6} $$

where T(·) denotes the estimated TPS transformation and C(·,·) is the cost function for assigning correspondence between any two points. Following [21], we measure the amount of anisotropic scaling between two shapes by estimating the ratio of the two scaling factors $S_x$ and $S_y$ in the x and y directions, respectively. A TPS transformation can be decomposed into a linear part corresponding to a global affine alignment, together with the superposition of independent, affine-free deformations (or principal warps) of progressively smaller scales [24]. We ignore the nonaffine terms in the TPS interpolant when estimating $S_x$ and $S_y$. The 2D affine transformation is represented as a 2 × 2 linear transformation matrix A and a 2 × 1 translation vector T:

$$ \begin{pmatrix} u \\ v \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} + T \tag{7} $$

where $S_x$ and $S_y$ can be computed by singular value decomposition of the matrix A. The anisotropic scaling distance $D_{as}$ is taken from (XX) as

$$ D_{as} = \log \frac{\max(S_x, S_y)}{\min(S_x, S_y)} \tag{8} $$
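The scaling term (8) and the correspondence term (6) can each be sketched in a few lines. For a 2 × 2 matrix the singular values have a closed form (their squares sum to the squared Frobenius norm and multiply to det(A)²), so no linear-algebra library is needed. For (6), real shape context matching uses a χ² cost between log-polar histograms; the sketch substitutes plain Euclidean distance between points assumed to be already transformed by T(·), so it shows only the structure of the symmetric nearest-match average. Function names are hypothetical.

```python
import math

def singular_values_2x2(A):
    """Singular values of [[a, b], [c, d]] from the trace and determinant of A^T A."""
    (a, b), (c, d) = A
    fro2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    root = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    return math.sqrt((fro2 + root) / 2.0), math.sqrt(max((fro2 - root) / 2.0, 0.0))

def anisotropic_scaling_distance(A):
    """Eq. (8): log of the ratio of the larger to the smaller scaling factor."""
    s_x, s_y = singular_values_2x2(A)
    return math.log(max(s_x, s_y) / min(s_x, s_y))  # undefined for a singular A

def symmetric_match_distance(template, deformed, cost=math.dist):
    """Structure of Eq. (6) with a Euclidean stand-in for the shape context cost."""
    forward = sum(min(cost(t, d) for d in deformed) for t in template) / len(template)
    backward = sum(min(cost(t, d) for t in template) for d in deformed) / len(deformed)
    return forward + backward
```

Because singular values are unaffected by rotation, $D_{as}$ is zero for any rotation combined with uniform scaling and grows only when the affine part stretches one direction more than the other.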

We use the two most commonly cited measures, average precision and R-precision, to evaluate the performance of each ranked retrieval. After matching, we compute the overall shape distance for retrieval as the weighted sum of the individual distances given by all the measures: shape context distance, TPS bending energy, anisotropic scaling, and the residual error of the optimized registration.
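Both evaluation measures and the final weighted combination are simple to state in code. The sketch uses the standard definitions: average precision is the sum of the precision values at the ranks where relevant documents appear, divided by the number R of relevant documents, and R-precision is the precision among the top R results. The weights of the combination are not given in the text, so they are left as a parameter; the function names are hypothetical.

```python
def average_precision(ranked_relevant, num_relevant):
    """Mean of precision@k over the ranks k holding a relevant document."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / num_relevant if num_relevant else 0.0

def r_precision(ranked_relevant, num_relevant):
    """Precision among the top R results, R = number of relevant documents."""
    if num_relevant == 0:
        return 0.0
    return sum(1 for r in ranked_relevant[:num_relevant] if r) / num_relevant

def combined_distance(distances, weights):
    """Weighted sum of individual dissimilarity measures (weights are hypothetical)."""
    return sum(weights[name] * value for name, value in distances.items())
```

For a ranking whose first and third results are relevant out of R = 2 relevant documents, average precision is (1/1 + 2/3)/2 ≈ 0.83, while R-precision is 1/2, since only one of the top two results is relevant.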

(D) Language Visual Feature Discrimination and Text Line Zonalization

In this section, we describe some discriminating features in the characters of a few Indian languages, namely Kannada, Hindi and English text words, and the models proposed for identifying these three languages. Extraction of visual discriminating features of Kannada, Hindi and English text words is an integral part of any recognition system. The aim of feature extraction is to describe the pattern by means of the minimum number of features or attributes that are effective in discriminating pattern classes [27]. The algorithms [28] presented in this paper are inspired by a simple observation that every script/language defines a finite set of text patterns, each having a distinct visual appearance [29]. The character shape descriptors take into account any feature that appears to be distinct for the language [29], and hence every language can be identified based on its visual discriminating features. The presence and absence of the four discriminating features of Kannada, Hindi and English text words are given in Table 1. Fig. 5 shows some typical Hindi and Kannada words.

Table 1
Presence and Absence of Discriminating Features of Kannada, Hindi and English Text Words
(Yes means presence and No means absence of that feature. F1: Horizontal lines; F2: Vertical lines; F3: Variable-sized blocks; F4: Blocks with more than one component)

Text words   F1    F2    F3    F4
Kannada      Yes   No    Yes   Yes
Hindi        Yes   Yes   Yes   No
English      No    Yes   No    No
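Table 1 can be read as a lookup from a feature-presence pattern to a language. The sketch below assumes the simplest possible decision rule, an exact match on the four Yes/No values with anything else reported as OTHERS. The text bases its actual decision-making on the percentages stored in Table 2, so this only illustrates how the presence pattern alone separates the three languages; the names are hypothetical.

```python
# Presence pattern (F1 horizontal lines, F2 vertical lines,
# F3 variable-sized blocks, F4 blocks with more than one component), from Table 1.
TABLE_1 = {
    "Kannada": (True, False, True, True),
    "Hindi": (True, True, True, False),
    "English": (False, True, False, False),
}

def identify_by_presence(features):
    """Return the language whose Table 1 pattern matches exactly, else OTHERS."""
    for language, pattern in TABLE_1.items():
        if tuple(features) == pattern:
            return language
    return "OTHERS"
```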

Figure 5: Typical Hindi, Kannada words

Pal and Choudhuri [30] proposed that text lines of some Indian languages can be partitioned into three zones, which is useful for feature extraction in this method. Figure 6, adopted from [28], shows sample text lines in English, Hindi and Kannada partitioned (zonalized) into three zones.
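A common way to locate such zones is the horizontal projection profile, the ink count of each pixel row. The sketch below takes the middle zone to be the rows whose ink count reaches a fixed fraction of the densest row, with the remaining inked rows above and below forming the upper and lower zones. The `frac` threshold and this rule are assumptions for illustration; the zoning procedure of [30] and [28] may differ.

```python
def horizontal_profile(image):
    """Number of ink pixels (1s) in each row of a binary text-line image."""
    return [sum(row) for row in image]

def three_zones(image, frac=0.5):
    """Split a text line into (upper, middle, lower) row ranges [start, end)."""
    profile = horizontal_profile(image)
    inked = [r for r, count in enumerate(profile) if count > 0]
    if not inked:
        raise ValueError("text line contains no ink")
    top, bottom = inked[0], inked[-1] + 1
    peak = max(profile)
    # Middle zone: span of rows whose ink count is at least frac * peak.
    dense = [r for r in range(top, bottom) if profile[r] >= frac * peak]
    mid_start, mid_end = dense[0], dense[-1] + 1
    return (top, mid_start), (mid_start, mid_end), (mid_end, bottom)
```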


The model proposed in [28] is based on four visual features: (i) horizontal lines, (ii) vertical lines, (iii) variable-sized blocks, and (iv) the number of components present in each block. Two assumptions are made in our proposed model: 1. The input document is machine printed, with a standard font for Kannada, Hindi and English text lines. 2. Every text line has at least four words; different text lines may have different font sizes, but the font and font size within a text line are the same.
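Features F1 and F2 amount to finding long straight runs of ink. A minimal sketch, assuming a binary text-line image and a hypothetical minimum run length `min_len` (for example, a fraction of the text-line height or width), looks for the longest horizontal run in any row and the longest vertical run in any column. The extraction technique actually used is the one described in [28]; this only approximates the idea.

```python
def longest_run(pixels):
    """Length of the longest run of consecutive ink pixels (1s)."""
    best = current = 0
    for p in pixels:
        current = current + 1 if p else 0
        best = max(best, current)
    return best

def has_horizontal_line(image, min_len):
    """F1: some row contains at least min_len consecutive ink pixels."""
    return any(longest_run(row) >= min_len for row in image)

def has_vertical_line(image, min_len):
    """F2: some column contains at least min_len consecutive ink pixels."""
    return any(longest_run(column) >= min_len for column in zip(*image))
```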

In the present method, the percentage presence of each of the four features in text of each of the three languages, Kannada, Hindi and English, is computed empirically on a sufficiently large data set. Based on these studies, a supporting knowledge base is constructed to store the percentage presence of the four visual features. The technique adopted for obtaining the four visual features from the input image is explained in [28]. The percentages of spatial occurrence of all four visual features for each of the three languages, computed through extensive experimentation, are stored in the knowledge base as given in Table 2 for later use during decision-making.

Table 2
Percentage of the Presence of Discriminating Features of Kannada, Hindi and English Text Words
(F1: Horizontal lines; F2: Vertical lines; F3: Variable-sized blocks; F4: Blocks with more than one component)

Text words   F1    F2    F3    F4
Kannada      65%   3%    60%   30%
Hindi        90%   80%   40%   5%
English      2%    80%   5%    5%
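The text states that Table 2 is stored as a knowledge base for later decision-making but does not spell out the decision rule. One plausible sketch, assumed here and not taken from the paper, is a nearest-profile classifier: measure the four percentages for an input text line, pick the language whose stored profile is closest in Euclidean distance, and fall back to OTHERS when every profile is farther than a hypothetical `max_distance`.

```python
import math

# Percentages of F1..F4 for each language, from Table 2.
KNOWLEDGE_BASE = {
    "Kannada": (65, 3, 60, 30),
    "Hindi": (90, 80, 40, 5),
    "English": (2, 80, 5, 5),
}

def identify_language(observed, max_distance=40.0):
    """Nearest stored profile to the observed F1..F4 percentages, or OTHERS."""
    best_language, best_distance = "OTHERS", float("inf")
    for language, profile in KNOWLEDGE_BASE.items():
        distance = math.dist(observed, profile)
        if distance < best_distance:
            best_language, best_distance = language, distance
    return best_language if best_distance <= max_distance else "OTHERS"
```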

Figure 6: Partitioned Text Lines of English, Hindi and Kannada


4. RESULTS AND DISCUSSION

To quantitatively evaluate detection and matching in signature-based document image retrieval, we used two large collections of real-world documents. A database was prepared with different signatures and different languages, and a set of test documents was generated from different combinations of signatures and languages. We developed code for the proposed method of signature recognition and language identification. The sample documents were tested by cropping the image of the signature and the text of the document, and the output was verified after running the program. The program successfully recognized the defined signatures and the language in which each document was written. Some of the results are shown in Fig. 10.

Figure 10: (a) to (f) Sequential Steps after Running the Matlab Code; (g) to (i) Output Results

5. CONCLUSION

In this paper, we proposed a novel approach to signature recognition and language identification for document image retrieval. For signatures, we used a detection and segmentation approach based on the view that object detection can be a process that aims to capture the characteristic structural saliency of the object across image scales. For languages, we used line-wise detection models based on four visual discriminating features, which serve as useful visual clues for language identification. These methods help to accurately identify and separate the Kannada, Hindi and English portions of a document for language identification. We quantitatively evaluated these techniques in challenging retrieval tests, each composed of different signature and language instances. Combined, these methods can support simple authentication and identification applications for online and offline documents that need no human intervention and give reliable results.

References

[1] G. Zhu, Y. Zheng, D. Doermann, and S. Jaeger, “Multi-Scale Structural Saliency for Signature Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, 2007.

[2] K. Guo, D. Doermann, and A. Rosenfeld, “Forgery Detection by Local Correspondence,” Int’l J. Pattern Recognition and Artificial Intelligence, vol. 15, no. 4, pp. 579-641, 2001.

[3] B. Fang, C. Leung, Y. Tang, K. Tse, P. Kwok, and Y. Wong, “Off-Line Signature Verification by the Tracking of Feature and Stroke Positions,” Pattern Recognition, vol. 36, no. 1, pp. 91-101, 2003.

[4] R. Plamondon and S. Srihari, “On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84, Jan. 2000.

[5] T. Rath and R. Manmatha, “Word Image Matching Using Dynamic Time Warping,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 521-527, 2003.


[6] J. Chan, C. Ziftci, and D. Forsyth, “Searching Off-Line Arabic Documents,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1455-1462, 2006.

[7] S. Choudhury, G. Harit, S. Madnani, and R. B. Shet, “Identification of Scripts of Indian Languages by Combining Trainable Classifiers,” Proc. ICVGIP 2000, Bangalore, India, Dec. 20-22, 2000.

[8] M. C. Padma and P. Nagabhushan, “Horizontal and Vertical Linear Edge Features as Useful Clues in the Discrimination of Multilingual (Kannada, Hindi and English) Machine Printed Documents,” Proc. National Workshop on Computer Vision, Graphics and Image Processing (WVGIP), Madurai, pp. 204-209, 2002.

[9] G. S. Peake and T. N. Tan, “Script and Language Identification from Document Images,” Proc. Eighth British Machine Vision Conf., vol. 2, pp. 230-233, 1997.

[10] A. Sha’ashua and S. Ullman, “Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network,” Proc. Int’l Conf. Computer Vision, pp. 321-327, 1988.

[11] P. Besl and N. McKay, “A Method for Registration of 3D Shapes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, Feb. 1992.

[12] Z. Zhang, “Iterative Point Matching for Registration of Free-Form Curves and Surfaces,” Int’l J. Computer Vision, vol. 13, no. 2, pp. 119-152, 1994.

[13] J. Feldmar and N. Ayache, “Rigid, Affine and Locally Affine Registration of Free-Form Surfaces,” Int’l J. Computer Vision, vol. 18, no. 2, pp. 99-119, 1996.

[14] T. Wakahara and K. Odaka, “Adaptive Normalization of Handwritten Characters Using Global/Local Affine Transform,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1332-1341, Dec. 1998.

[15] S. Srihari, S. Shetty, S. Chen, H. Srinivasan, C. Huang, G. Agam, and O. Frieder, “Document Image Retrieval Using Signatures as Queries,” Proc. Int’l Conf. Document Image Analysis for Libraries, pp. 198-203, 2006.

[16] G. Agam, S. Argamon, O. Frieder, D. Grossman, and D. Lewis, The Complex Document Image Processing (CDIP) Test Collection, Illinois Inst. of Technology, http://ir.iit.edu/projects/CDIP.html, 2006.

[17] D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, and J. Heard, “Building a Test Collection for Complex Document Information Processing,” Proc. ACM SIGIR Conf., pp. 665-666, 2006.

[18] U. Pal and B. B. Chaudhuri, “Script Line Separation from Indian Multi-Script Documents,” Proc. 5th Int’l Conf. Document Analysis and Recognition, IEEE Computer Society Press, pp. 406-409, 1999.

[19] S. Basvaraj Patil and N. V. Subba Reddy, “Character Script Class Identification System Using Probabilistic Neural Network for Multi-Script Multi-Lingual Document Processing,” Proc. National Conf. Document Analysis and Recognition, Mandya, Karnataka, pp. 1-8, 2001.

[20] M. C. Padma and P. Nagabhushan, “Study of the Applicability of Horizontal and Vertical Projections and Segmentation in Language Identification of Kannada, Hindi and English Documents,” Proc. National Conf. NCCIT, Kilakarai, Tamilnadu, pp. 93-102, 2001.

[21] G. Zhu, Y. Zheng, D. Doermann, and S. Jaeger, “Signature Detection and Matching for Document Image Retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 11, Nov. 2009.

[22] R. Sabourin, G. Genest, and F. Preteux, “Off-Line Signature Verification by Local Granulometric Size Distributions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 976-988, Sept. 1997.


[23] M. Munich and P. Perona, “Continuous Dynamic Time Warping for Translation-Invariant Curve Alignment with Applications to Signature Verification,” Proc. Int’l Conf. Computer Vision, pp. 108-115, 1999.

[24] F. Bookstein, “Principal Warps: Thin-Plate Splines and the Decomposition of Deformations,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 567-585, June 1989.

[25] M. Kalera, S. Srihari, and A. Xu, “Off-Line Signature Verification and Identification Using Distance Statistics,” Int’l J. Pattern Recognition and Artificial Intelligence, vol. 18, no. 7, pp. 1339-1360, 2004.

[26] S. Belongie, J. Malik, and J. Puzicha, “Shape Matching and Object Recognition Using Shape Contexts,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002.

[27] P. Nagabhushan, S. A. Angadi, and B. S. Anami, “A Fuzzy Statistical Approach to Kannada Vowel Recognition Based on Invariant Moments,” Proc. 2nd National Conf. Document Analysis and Recognition (NCDAR), Mandya, pp. 275-285, 2003.

[28] M. C. Padma and P. A. Vijaya, “Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features,” Int’l J. Computational Intelligence Systems, vol. 1, no. 2, pp. 116-126, 2008.

[29] P. Nagabhushan and R. M. Pai, “Modified Region Decomposition Method and Optimal Depth Decision Tree in the Recognition of Non-Uniform Sized Characters: An Experimentation with Kannada Characters,” Pattern Recognition Letters, vol. 20, pp. 1467-1475, 1999.

[30] U. Pal and B. B. Chaudhuri, “Script Line Separation from Indian Multi-Script Documents,” Proc. 5th Int’l Conf. Document Analysis and Recognition, IEEE Computer Society Press, pp. 406-409, 1999.