Pattern Recognition Review


  • 8/6/2019 Pattern Recognition Review



    Statistical Pattern Recognition: A Review
    Anil K. Jain, Fellow, IEEE, Robert P.W. Duin, and Jianchang Mao, Senior Member, IEEE

    Abstract-The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

    Index Terms-Statistical pattern recognition, classification, clustering, feature extraction, feature selection, error estimation, classifier combination, neural networks.


    By the time they are five years old, most children can recognize digits and letters. Small characters, large characters, handwritten, machine printed, or rotated: all are easily recognized by the young. The characters may be written on a cluttered background, on crumpled paper or may even be partially occluded. We take this ability for granted until we face the task of teaching a machine how to do the same. Pattern recognition is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. In spite of almost 50 years of research, design of a general purpose machine pattern recognizer remains an elusive goal.

    The best pattern recognizers in most instances are humans, yet we do not understand how humans recognize patterns. Ross [140] emphasizes the work of Nobel Laureate Herbert Simon whose central finding was that pattern recognition is critical in most human decision making tasks: "The more relevant patterns at your disposal, the better your decisions will be. This is hopeful news to proponents of artificial intelligence, since computers can surely be taught to recognize patterns. Indeed, successful computer programs that help banks score credit applicants, help doctors diagnose disease and help pilots land airplanes

    A.K. Jain is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824. E-mail: jain@cse.msu.edu.

    R.P.W. Duin is with the Department of Applied Physics, Delft University of Technology, 2600 GA Delft, the Netherlands.

    J. Mao is with the IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120. E-mail: mao@almaden.ibm.com.

    Manuscript received 23 July 1999; accepted 12 Oct. 1999. Recommended for acceptance by K. Bowyer. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 110296.


    depend in some way on pattern recognition ... We need to pay much more explicit attention to teaching pattern recognition." Our goal here is to introduce pattern recognition as the best possible way of utilizing available sensors, processors, and domain knowledge to make decisions automatically.

    1.1 What is Pattern Recognition?

    Automatic (machine) recognition, description, classification, and grouping of patterns are important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. But what is a pattern? Watanabe [163] defines a pattern "as opposite of a chaos; it is an entity, vaguely defined, that could be given a name." For example, a pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech signal. Given a pattern, its recognition/classification may consist of one of the following two tasks [163]: 1) supervised classification (e.g., discriminant analysis) in which the input pattern is identified as a member of a predefined class, 2) unsupervised classification (e.g., clustering) in which the pattern is assigned to a hitherto unknown class. Note that the recognition problem here is being posed as a classification or categorization task, where the classes are either defined by the system designer (in supervised classification) or are learned based on the similarity of patterns (in unsupervised classification).
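    The distinction between the two tasks can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the toy 2-D points, the class names "A" and "B", and the choice of a nearest-mean rule (supervised) and a two-cluster k-means procedure (unsupervised) are ours, not methods prescribed at this point in the text.

```python
# Toy illustration: supervised vs. unsupervised classification.
# The data, names, and algorithm choices are illustrative assumptions.
from math import dist  # Euclidean distance, Python 3.8+


def class_means(points, labels):
    """Supervised: estimate one mean vector per predefined class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in s) for y, s in sums.items()}


def nearest_mean(x, means):
    """Assign x to the predefined class whose mean is closest."""
    return min(means, key=lambda y: dist(x, means[y]))


def two_means(points, iters=10):
    """Unsupervised: group unlabeled points into two clusters (k-means, k=2)."""
    centers = [points[0], points[-1]]  # naive deterministic initialization
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            k = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            groups[k].append(p)
        # Recompute each center as the mean of its group (keep old center if empty).
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[k]
            for k, g in enumerate(groups)
        ]
    return [0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1 for p in points]


points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["A", "A", "A", "B", "B", "B"]  # known only in the supervised case

means = class_means(points, labels)
predicted = nearest_mean((3.5, 3.7), means)  # class chosen from predefined labels
clusters = two_means(points)                 # groups discovered without labels
```

    In the supervised call the class names come from the designer-supplied labels, whereas the cluster indices returned by two_means carry no predefined meaning and reflect only the similarity of the points.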

    Interest in the area of pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding (see Table 1). These applications include data mining (identifying a "pattern," e.g., correlation, or an outlier in millions of multidimensional patterns), document classification (efficiently searching text documents), financial forecasting, organization and retrieval of multimedia databases, and biometrics (personal identification based on

    0162-8828/00/$10.00 © 2000 IEEE



    TABLE 1
    Examples of Pattern Recognition Applications

    Problem Domain                | Application                                             | Input Pattern                    | Pattern Classes
    Bioinformatics                | Sequence analysis                                       | DNA/Protein sequence             | Known types of genes/patterns
    Data mining                   | Searching for meaningful patterns                       | Points in multidimensional space | Compact and well-separated clusters
    Document classification       | Internet search                                         | Text document                    | Semantic categories (e.g., business, sports, etc.)
    Document image analysis       | Reading machine for the blind                           | Document image                   | Alphanumeric characters, words
    Industrial automation         | Printed circuit board inspection                        | Intensity or range image         | Defective/non-defective nature of product
    Multimedia database retrieval | Internet search                                         | Video clip                       | Video genres (e.g., action, dialogue, etc.)
    Biometric recognition         | Personal identification                                 | Face, iris, fingerprint          | Authorized users for access control
    Remote sensing                | Forecasting crop yield                                  | Multispectral image              | Land use categories, growth pattern of crops
    Speech recognition            | Telephone directory enquiry without operator assistance | Speech waveform                  | Spoken words

    various physical attributes such as face and fingerprints). Picard [125] has identified a novel application of pattern recognition, called affective computing, which will give a computer the ability to recognize and express emotions, to respond intelligently to human emotion, and to employ mechanisms of emotion that contribute to rational decision making. A common characteristic of a number of these applications is that the available features (typically, in the thousands) are not usually suggested by domain experts, but must be extracted and optimized by data-driven procedures.

    The rapidly growing and available computing power, while enabling faster processing of huge data sets, has also facilitated the use of elaborate and diverse methods for data analysis and classification. At the same time, demands on automatic pattern recognition systems are rising enormously due to the availability of large databases and stringent performance requirements (speed, accuracy, and cost). In many of the emerging applications, it is clear that no single approach for classification is "optimal" and that multiple methods and approaches have to be used. Consequently, combining several sensing modalities and classifiers is now a commonly used practice in pattern recognition.
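    One simple way to combine classifiers is to let each one vote and take the most frequent label. The sketch below assumes this majority-vote rule purely for illustration; the passage above does not name a specific combination scheme, and the three threshold classifiers, the feature values, and the labels "A" and "B" are hypothetical.

```python
# Minimal sketch of classifier combination by majority vote.
# The combination rule and the toy classifiers are illustrative assumptions.
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers.

    If several labels tie, the one that received a vote first is returned.
    """
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical classifiers, each looking at the input differently.
classifiers = [
    lambda x: "B" if x[0] > 2.0 else "A",         # uses the first feature only
    lambda x: "B" if x[1] > 2.5 else "A",         # uses the second feature only
    lambda x: "B" if x[0] + x[1] > 6.0 else "A",  # uses both features
]

label_1 = majority_vote(classifiers, (3.0, 2.0))  # votes: B, A, A
label_2 = majority_vote(classifiers, (3.5, 3.0))  # votes: B, B, B
```

    With an odd number of classifiers and two classes no tie can occur; other settings would need an explicit tie-breaking policy.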

    The design of a pattern recognition system essentially involves the following three aspects: 1) data acquisition and preprocessing, 2) data representation, and 3) decision making. The problem domain dictates the choice of sensor(s), preprocessing technique, representation scheme, and the decision making model. It is generally agreed that a well-defined and sufficiently constrained recognition problem (small intraclass variations and large interclass variations) will lead to a compact pattern representation and a simple decision making strategy. Learning from a set of examples (training s