Decision fusion for postal address recognition using belief functions

Expert Systems with Applications 36 (2009) 5643–5653

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier .com/locate /eswa

Decision fusion for postal address recognition using belief functions

David Mercier a,b,*, Geneviève Cron b, Thierry Den�ux a, Marie-Hélène Masson a

a UMR CNRS 6599 Heudiasyc Université de Technologie de Compiègne, BP 20529, F-60205 Compiègne cedex, Franceb SOLYSTIC, 14 Avenue Raspail, F-94257 Gentilly Cedex, France

a r t i c l e i n f o a b s t r a c t

Keywords:Mailing address recognition
Information fusionDempster–Shafer theoryEvidence theoryTransferable belief model
0957-4174/$ - see front matter � 2008 Elsevier Ltd. Adoi:10.1016/j.eswa.2008.06.058

* Corresponding author. Address: UMR CNRS 65Technologie de Compiègne, BP 20529, F-60205 Comp

E-mail address: [email protected]

Combining the outputs from several postal address readers (PARs) is a promising approach for improvingthe performances of mailing address recognition systems. In this paper, this problem is solved using theTransferable Belief Model, an uncertain reasoning framework based on Dempster–Shafer belief functions.Applying this framework to postal address recognition implies defining the frame of discernment (or setof possible answers to the problem under study), converting PAR outputs into belief functions (takinginto account additional information such as confidence scores when available), combining the resultingbelief functions, and making decisions. All these steps are detailed in this paper. Experimental resultsdemonstrate the effectiveness of this approach as compared to simple combination rules.

� 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Postal address recognition is a complex task that involves differ-ent processes including image scanning, address block and writingfields location, character and word recognition, database queryingand association of a group of words with a delivery address. Each ofthese stages requires different algorithms and complex decisionprocedures (Palumbo & Srihari, 1996; Srihari, 2000). Systems incharge of this task are referred to as postal address readers (PARs).

A generic representation of a PAR is shown in Fig. 1. A PAR islinked with a database containing the whole set of postal addressesof the concerned country. It assigns a handwritten or machineprinted mail piece image to a delivery address. Additional piecesof information can also be output by a PAR such as alternative ad-dresses, confidence scores, or other intermediate results.

In the past years, a lot of work has been devoted to improvingthe performances of PARs through refining various subsystemssuch as, e.g., block location (Yu, Jain, & Mohiuddin, 1997) or hand-writing recognition (Lee & Leedham, 2004; Plamondon & Srihari,2000). A different way of improvement, which has comparativelyreceived less attention, consists in combining different PARs.Indeed, a decision based on a great number of varied pieces ofinformation is generally more robust than any decision made indi-vidually from a single piece of information (Bloch, 2003; Kittler,Hatef, Duin, & Matas, 1998; Kuncheva, 2004).

As explained by Xu, Krzyzak, and Suen (1992), the combinationof multiple classifiers includes several problems: selecting the clas-sifiers to combine, choosing an architecture for the combination,

ll rights reserved.

99 Heudiasyc Université deiègne cedex, France.(D. Mercier).

and combining the classifier outputs in order to achieve better per-formances than each classifier individually. In this article, we focuson the problem of combining the outputs supplied by differentPARs, following the fusion scheme illustrated in Fig. 2.

Various approaches to classifier fusion have been proposed inthe literature (see, e.g., Bloch, 2003; Kuncheva, 2004). These meth-ods are based on various frameworks such as probabilistic theory,fuzzy sets, possibility theory, Dempster–Shafer theory, voting, etc.As remarked in Bloch (2002), no general solution exists: the suc-cess of a fusion method strongly depends on the fusion problem it-self, and the way a given formalism is applied.

In the postal domain, the fusion problem is very specific. Ad-dresses for the purpose of physical mail delivery have a hierarchi-cally structure: depending on the particular application, they arecharacterized by a country, state, town, post office, street, streetnumber or post office (PO) box, and secondary number such asapartment number. Moreover, PARs have the possibility to outputa complete or a partial address. For example, a PAR can provide asolution for the town, but not for the street, if this latter has notbeen recognized. Moreover, as previously mentioned, some PARscan also supply additional information such as confidence scores.Until now, approaches to PAR fusion have been very limited. Theyare essentially based on common sense combination rules (De Leo,Vicenzi, & Franzone, 2005; Fisher & Siemens, 2005). For example,the main idea in Fisher and Siemens (2005) is to choose the postaladdress provided by a majority of PARs, if it exists; otherwise, thepostal address output by the most reliable PAR is chosen. Althoughsuch simple rules can be sufficient for some applications, higherrecognition rates can be expected to result from finer modellingof the fusion problem.

In this paper, it is proposed to model this fusion problem usingthe transferable belief model (TBM) (Smets & Kennes, 1994; Smets,

mailto:[email protected]

http://www.sciencedirect.com/science/journal/09574174

http://www.elsevier.com/locate/eswa

Fig. 1. Generic representation of a postal address reader.

Fig. 2. The PAR fusion problem.

5644 D. Mercier et al. / Expert Systems with Applications 36 (2009) 5643–5653

1998), a subjectivist and non-probabilistic interpretation of theDempster–Shafer theory of belief function (Shafer, 1976; Smets,1994). This framework is known to offer great flexibility for themanipulation of partial knowledge and information representedat various granularity levels (see, e.g., Benouhiba & Nigro, 2006;Malpica, Alonso, & Sanz, 2007; Sikder & Gangopadhyay, 2007), thusproviding a range of tools well suited to the application at hand.

This article is organized as follows. The postal problem is firstpresented in Section 2. Background material on belief functions isrecalled in Section 3. A generic model for combining postal ad-dresses is then introduced in Section 4, and experimental resultsare reported in Section 5. Finally, Section 6 concludes the paper.

2. Problem description

2.1. Hierarchical structure of postal addresses

A postal address is a string of characters indicating where mailsent to a person or an organization should be delivered. Each coun-try has structured its addresses in a specific manner. However, allthese arrangements have a hierarchical organization. From top to

Fig. 3. Hierarchy of p

bottom, each field specifies the destination of the mail in a finerway.

In the simple model considered in this paper for illustrative pur-poses, a postal address will be assumed to consist of only twoparts:

� the town part indicating the city where the mail has to bedelivered;

� the distribution part indicating the precise physical locationwithin a particular city. This place can be a street or any coun-try-specific notion like a PO box for example.

Denoting a town by Ti, a street by Sj and a PO box or any coun-try-specific distribution point by Bk, where i, j and k are indices, acomplete address can take any of the two following forms:

� TiSj for street Sj in town Ti;� TiBk for PO box Bk in town Ti.

Complete addresses form the distribution level (bottom level) ofthe hierarchy shown in Fig. 3. When only the town is specified, oneobtains a partial address of the form Ti. Such partial addresses forthe second level of the hierarchy. Finally, a third level can be de-fined as the ‘‘empty” address noted ‘‘_”, which corresponds to thecase where the address is completely unspecified.

The output of a PAR for a given mail piece is usually a completeor partial address at any level in this hierarchy.

2.2. Performance rates and objectives

Using a test set of labelled letters (letters whose actual addressis known), the performances of PARs can be assessed at the townand distribution levels, using the following three performancerates: the correct recognition rate, the error rate, and the rejectionrate.

At the distribution level, an output is correct if and only if thetown and the distribution are correct. At the town level, an outputis correct if and only if the town is correct.

Example 1. When the truth is ‘‘TiSj”:

� at the distribution level, only the output ‘‘TiSj” is considered tobe correct; outputs ‘‘TiSj0” and ‘‘Ti0”, with j0–j and i0–i, are consid-ered as errors, and outputs ‘‘Ti” and ‘‘_” are considered asrejections;

� at the town level, outputs ‘‘TiSj”,‘‘TiSj0”, and ‘‘Ti” are considered tobe correct, while an output ‘‘Ti0” with i0–i is an error, and an out-put ‘‘_” is a rejection.

At each level and for each PAR, the sum of these three perfor-mance rates is equal to one.

ostal addresses.

Fig. 4. The objective of the fusion process: achieving the greatest possible correctrecognition rate while keeping the error rate below that of the best individual PAR.

D. Mercier et al. / Expert Systems with Applications 36 (2009) 5643–5653 5645

As different PARs are available, our main objective is to improvethe postal address recognition process by combining the outputsfrom different PARs. More precisely, one seeks a fusion schemeleading to the greatest possible correct recognition rate, while hav-ing an acceptable error rate at both town and distribution levels. Inthis work, the maximal tolerated error rate is set to the best errorrate among individual PARs, as shown in Fig. 4.

A fusion scheme designed to meet these objectives, based onthe transferable belief model, is presented in this paper.

3. The transferable belief model

The transferable belief model (TBM) is a model of uncertain rea-soning and decision making based on two levels (Dubois, Prade, &Smets, 1996; Smets & Kennes, 1994):

� the credal level, where available pieces of information are repre-sented by belief functions;

� the pignistic or decision level, where belief functions are trans-formed into probability measures, and the expected utility ismaximized.

3.1. Basic concepts

Let X ¼ fx1; . . . ;xKg, called the frame of discernment, be a finiteset comprising all possible answers to a given question Q. The be-liefs held by a rational agent Ag regarding the answer to question Qcan be quantified by a basic belief assignment (bba) mX

Ag, defined as afunction from 2X to ½0;1� verifying:XA # X

mXAgðAÞ ¼ 1:

Fig. 5. A refining q from a frame H

The quantity mXAgðAÞ represents the part of the unit mass allocated

to the hypothesis that the answer to question Q is in the subset Aof X. When there is no ambiguity on the agent or the frame of dis-cernment, the notation mX

Ag will be simplified to mX or m.A subset A of X such that mðAÞ > 0 is called a focal set of m. A bba

m with only one focal set A is called a categorical bba and is denotedmA; we then have mAðAÞ ¼ 1. Total ignorance is represented by thebba mX such that mXðXÞ ¼ 1, called the vacuous bba. A normal bbam satisfies the condition mð;Þ ¼ 0. A bba whose focal sets arenested is said to be consonant.

The belief and plausibility functions associated with a bba m aredefined, respectively, as:

belðAÞ ¼X;–B # A

mðBÞ;

and

plðAÞ ¼X

B\A–;mðBÞ;

for all A # X. Functions m, bel and pl are in one-to-one correspon-dence, and thus constitute different forms of the same information.

The basic operation for combining bbas induced by distinctsources of information is the conjunctive rule of combination(CRC), also referred to as the unnormalized Dempster’s rule of com-bination, defined as

m1 � \m2ðAÞ ¼X

B\C¼A

m1ðBÞm2ðCÞ; 8A # X: ð1Þ

3.2. Refinements and coarsenings

When applying the TBM to a real-world application, the deter-mination of the frame of discernment X, which defines the set ofstates on which beliefs will be expressed, is a crucial step. As no-ticed by Shafer (1976, Chap. 6), the degree of granularity of X is al-ways, to some extent, a matter of convention, as any element of Xrepresenting a given state can always be split into several alterna-tives. Hence, it is fundamental to examine how a belief functiondefined on a frame may be expressed in a finer or, conversely, ina coarser frame. The concepts of refinement and coarsening can bedefined as follows.

Let H and X denote two frames of discernment. A mappingq : 2H ! 2X is called a refining of H (see Fig. 5) if it verifies the fol-lowing properties (Shafer, 1976):

1. The set fqðfhgÞ; h 2 Hg # 2X is a partition of X.2. For all A # H:

qðAÞ ¼[h2A

qðfhgÞ: ð2Þ

H is then called a coarsening of X, and X is called a refinement of H.

to a refinement X (Example 2).


A bba mH on H may be transformed without any loss of infor-mation into a bba mX on a refinement X of H by transferring eachmass mHðAÞ for A # H to B ¼ qðAÞ: mHðAÞ ¼ mXðqðAÞÞ. This opera-tion is called the vacuous extension of mH to X.

Conversely, a bba mX on X may be transformed into a bba mH

on a coarsening H of X by transferring each mass mXðAÞ forA # X to the smallest subset B # H such that A # qðBÞ. Note that,in this case, some information may be lost in the process.

Thanks to these transformations, a piece of information, initiallydefined on a frame X, may be expressed on any frame obtained byrefining and/or coarsening X.

Example 2. Let us consider a bba mH defined on H ¼ fh1; h2; h3g bymHðfh1gÞ ¼ 0:7 and mHðfh2; h3gÞ ¼ 0:3. Let X ¼ fx1;x2; . . . ;x7gbe a refinement of H as shown in Fig. 5. The piece of evidence mH

can be expressed on the finer frame X by mXðfx3;x4;x5gÞ ¼ 0:7and mXðfx1;x2;x6;x7gÞ ¼ 0:3.

Conversely, let us consider a bba mX defined by mXðfx3gÞ ¼ 0:6and mXðfx1;x6gÞ ¼ 0:4. This piece of information can be expressedon the coarser frame H by mHðfh1gÞ ¼ 0:6 and mHðfh2; h3gÞ ¼ 0:4.

Fig. 6. Fusion scheme.

3.3. BBA correction mechanisms

When receiving a piece of information represented by a BBA m,agent Ag may have some doubt regarding the reliability of thesource that has provided this BBA. Such metaknowledge can be ta-ken into account using the discounting operation introduced byShafer (1976, p. 252), and defined by:am ¼ ð1� aÞmþ amX; ð3Þ

where a 2 ½0;1� is a coefficient referred to as the discount rate.A discount rate a equal to 1 means that the source is not reliable

and the piece of information it provides cannot be taken into ac-count, so Ag’s knowledge remains vacuous: mX

Ag ¼ 1m ¼ mX. Onthe contrary, a null discount rate indicates that the source is fullyreliable and the piece of information it provides is entirely ac-cepted: mX

Ag ¼ 0m ¼ m. In practice, an agent rarely knows for surewhether the source is reliable or not, but it has some degree of be-lief in the thee source reliability, equal to 1� a (Smets, 1993).

Using metaknowledge on the reliability of the source, the BBA mcan thus be corrected into am using the discounting operation.However, this discounting operation only allows agent Ag to weak-en a source of information, whereas it could sometimes be usefulto strengthen it, e.g., when this source is too cautious. In Mercier,Den�ux, and Masson (2006), the following mechanism for weak-ening or reinforcing a BBA m was introduced and justified:

mm ¼ m1mX þ m2mþ m3trm; ð4Þ

with mi 2 ½0;1� 8i 2 f1;2;3g and m1 þ m2 þ m3 ¼ 1. The BBA trm corre-sponds to the BBA m totally reinforced, and is defined by:

trmðAÞ ¼mðAÞ

1�mðXÞ if A � X;

0 if A ¼ X;

(ð5Þ

if m is not vacuous, and trm ¼ m otherwise. If m is not vacuous, thetotal reinforcement trm of m consists in transferring uniformly thewhole mass allocated to X to the focal sets of m different from X.As shown in Den�ux and Smets (2006), this process is the dual ofthe normalization process that consists in uniformly redistributingto non-empty subsets the mass initially allocated to the empty set.

Example 3. Let us consider a bba mX defined on X ¼ fx1;x2;x3gby mXðfx1gÞ ¼ 0:4, mXðfx2gÞ ¼ 0:1, and mXðXÞ ¼ 0:5. The beliefsheld by an agent Ag knowing that mX should be totally reinforced,is given by trmXðfx1gÞ ¼ 0:4=ð1� 0:5Þ ¼ 0:8, and trmXðfx2gÞ ¼ 0:1=ð1� 0:8Þ ¼ 0:2.

In the correction mechanism (4), rates m1, m2 and m3 are equal,respectively, to the degree of belief that the source is unreliable(having to be discarded), fully reliable (having to be accepted with-out modifying it), and too cautious (having to be reinforced). Anapplication of this correction mechanism will be presented in Sec-tion 4.2.2.

3.4. Decision making

In Bayesian decision theory, modelling a decision process im-plies defining:

1. A set D of decisions that can be made;2. A set C of considered states of nature;3. A cost function c : D� C! R, such that cðd; cÞ represents the

cost of making decision d 2 D when c 2 C is the true state ofnature.

Rationality principles (DeGroot, 1970; Savage, 1954) then jus-tify the choice of the decision d� 2 D corresponding to the mini-mum expected cost or risk according to some probabilitymeasure PC on C:

d� ¼ arg mindqðdÞ;

with

qðdÞ ¼Xc2C

cðd; cÞPCðfcgÞ; 8d 2 D: ð6Þ

In the TBM, this decision-theoretic framework is accepted (Smets,2002, 2005). The set C, called the betting frame, is often taken equalto X. However, it may be any refinement or coarsening of X, or itmay obtained from X by a succession of refinings and coarsenings.The probability measure PC is obtained by the pignistic transforma-tion (Smets, 2002, 2005), which consists in firstly expressing thepiece of evidence mX, initially defined on the frame of discernment,on the betting frame C, and then computing the pignistic probabil-ity BetPC ¼ PC as:

BetPCðfcgÞ ¼X

fA # C;c2Ag

mCðAÞjAjð1�mCð;ÞÞ ; ð7Þ

where j A j denotes the cardinality of A.

4. Application to postal address recognition

The different stages of the fusion process based on the TBM areillustrated in Fig. 6. First, the frame of discernment X on which


beliefs are expressed has to be chosen (step 1). Outputs oi, whichare partial or complete postal addresses provided by each availablePAR i are then converted into bbas on X (step 2), and combinedinto one bba mX synthesizing the whole available information(step 3). The pignistic transformation is then applied to reach a fi-nal decision (step 4). These four steps are described in more detailbelow.

Fig. 8. Frame of discernment associated with the small database of Example 4.

4.1. Frame of discernment

The question Q of interest is to determine the destination ad-dress indicated on a given mail piece. The set of possible answersto this question should naturally contain all delivery addresses,which are usually stored in a database. However, there are addi-tional possibilities, as the address written on the envelope maycorrespond to no valid delivery address. For instance, a part of anaddress may be non-existent or unreadable, as shown in Fig. 7.Such addresses are said to be invalid. An address in which the towncannot be identified is said to be totally invalid. Addresses depictedin Fig. 7a–c are totally invalid. An address for which the town canbe identified but not the distribution part, as illustrated in Fig. 7d,is said to be partially invalid.

The frame of discernment X we associate to question Q is thuscomposed of all valid addresses contained in the database, the par-tially invalid addresses associated with the towns included in thedatabase, and an additional element representing totally invalidaddresses. Such of frame typically contains several millionselements.

Example 4. (Small database) Let us consider a toy problem inwhich the database contains only two towns T1 and T2. Town T1

contains two streets S1 and S2. Town T2 contains one street S1 (S1 inT1, and S1 in T2 are physically different streets, but they may havethe same name) and two P.O. Boxes B1 and B2.

The frame X is shown in Fig. 8. It is composed of:

Fig. 7. Some examples o

� valid addresses: T1S1, T1S2, T2S1, T2B1, T2B2;� partially invalid addresses associated with towns T1 and T2,

denoted by T1inv and T2inv, respectively;� an additional element, denoted finvg, representing totally inva-

lid addresses.

The subset fT1S1; T1S2; T1invg contains all addresses in town T1

and is denoted T1. Likewise, T2 ¼ fT2S1; T2B1; T2B2; T2invg containsall addresses associated with town T2. Finally, X is equal toT1 [ T2 [ finvg. The corresponding hierarchy is shown in Fig. 9.

Each output of a PAR is thus associated with a unique subset ofX. Outputs corresponding to a complete address are singletons ofX; for example, an address ‘‘TiSj” is associated with the singleton{TiSj}. Outputs corresponding to a partial address are subsets of X:

f invalid addresses.

Fig. 9. Address hierarchy of Example 4.


� an output ‘‘Ti” is associated with the subset Ti of X;� the output ‘‘_” representing total ignorance is associated with X.

4.2. Construction of basic belief assignments

Once the frame of discernment has been defined, the next step isto define a method for converting PAR outputs into belief functions.Here, it is proposed to base this conversion on the observed perfor-mances of the PAR on a learning set of manually labelled mail pieces.More precisely, we will use a confusion matrix counting the occur-rences of different types of errors. This method is in the spirit ofthe work described in Xu et al. (1992) in the context of character rec-ognition. However, the problem considered here is much more com-plex because of the hierarchical structure of the address space, andthe existence of different address categories (streets, P.O. boxes,. . .). The method described here generalizes that introduced by theauthors in Mercier, Cron, Den�ux, and Masson (2005).

4.2.1. Basic approachIn this section, only postal addresses output by PARs are consid-

ered as input to the fusion process. The use of confidence scoreswill be addressed in the Section 4.2.2.

Each PAR provides an output o composed of a complete or par-tial address, i.e., a subset of X. Depending on the true value of theaddress, an output o, if not correct, may be partially correct. For in-stance, in Example 4, an output o ¼ fT1S1g, when the truth isfT1S2g, is not correct. However, the town is correct: fT1S1g andfT1S2g are in the same set T1 of level 2. We then say that the outputo of level 1 is correct at level 2.

Formally, an output o of level p is said to be correct at level q P p,where q is the level of the first element in the hierarchy containing oand the truth. In the worst case, q is equal to P, where P is the num-ber of levels in the hierarchy. An output of level p correct at level p issaid to be correct. Let us introduce the following notations:

� np;qc denotes the number of mail pieces in the learning set classi-

fied in category c of level p that are correct at level q, withq 2 ½p; P�;

Table 1Confusion matrix associated with a PAR on Example 4

Level Cat Output Truth

inv T1S1 T1S2

1 s T1S1 0 95 2T1S2 0 1 88T2S1 0 0 1

pob T2B1 0 0 0T2B2 0 0 0

2 t T1 0 24 22T2 0 0 0

3 X 4 2 3

Notations: cat, category; pob, P.O. Boxes; s, streets; t, town.

� H is the address hierarchy;� u is the function that maps an element of H different from X to

its parent element in H. For instance, in Example 4,uðfT1S1gÞ ¼ T1 and uðT1Þ ¼ X.

The total number of mail pieces classified at level p and in cat-egory c is then equal to np

c ¼PP

q¼pnp;qc .

When a PAR provides an output o of level p and category c, asimple approach could be to define the corresponding bba m asfollows:

mðuq�pðoÞÞ ¼ np;qc

npc; 8q 2 ½p; P�; ð8Þ

where

uq�p ¼ u � . . . � u � u|fflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflffl}q�p times;

and u0 is the identity function. In other words, each parent of o atlevel q 2 ½p; P� receives a mass equal to the number of mail piecesof the same level p and category c as o that were correctly classifiedat level q, divided by the total number of mail pieces classified bythe PAR at level p and category c. We note that the bba m computedusing (8) is consonant.

Although Eq. (8) seems reasonable, it can be refined to allow theallocation of belief masses to invalid addresses, based on past deci-sions made in such cases. An extension of the above scheme thataddresses this issue may be defined as follows.

Let us call each element Tiinv an invalid address of level 1 andinv the invalid address of level 2. Let us denote by np;invðqÞ

c the num-ber of time the truth is an invalid address of level q when the out-put is provided at level p and is of category c. When a PAR providesan output o of level p and category c, a bba m may be defined as:

mðuq�pðoÞÞ ¼ np;qc � np;invðq�1Þ

c

npc

; 8q 2 ½p; P�; ð9Þ

mð½uq�pðoÞ�invÞ ¼ np;invðq�1Þc

npc

; 8q 2 ½p; P�; ð10Þ

where, by convention, np;invð0Þc ¼ 0, Xinv ¼ inv, and ½uq�pðoÞ�inv de-

notes the invalid address associated to uq�pðoÞ.Applying assignment method (9) and (10) allows us to distin-

guish between two different hypotheses in case of rejection: eitherthe true complete address exists but the PAR has not recognized it,or the true address is invalid.

Example 4 (continued). Let us consider a PAR with the confu-sion matrix shown in Table 1. This matrix describes the PAR perfor-mances on a learning set. For instance, this PAR proposed theaddress fT1S1g at level 1 for 95 mail pieces whose actual addresswas fT1S1g, and for 2 mail pieces whose actual address wasfT1S2g. Likewise, it proposed the address T1 at level 2, 24 timesfor mail pieces whose actual address was fT1S1g, and 22 timesfor mail pieces whose actual address was fT1S2g.

T1inv T2S1 T2B1 T2B2 T2inv

0 1 0 0 02 0 0 0 00 98 2 0 00 1 55 1 00 0 0 46 03 1 0 0 01 20 13 16 02 2 0 1 2

Fig. 10. Coefficients m1 and m3 as functions of the score. The three coefficients m1, m2

and m3 are linked by the equation m1 þ m2 þ m3 ¼ 1.


Let us suppose that this PAR provides an output o ¼ fT1S1g for anew mail piece. This output belongs to level 1 and category street,denoted s. The number of correct outputs of level 1 and category sis equal to the sum of the numbers in bold in Table 1:

n1;1s ¼ 95þ 88þ 98 ¼ 281: ð11Þ

Similarly, the number of outputs of level 1 and category s that arecorrect at level 2 (respectively, at level 3) is equal to the sum ofthe numbers in italics (respectively, underlined) in Table 1:

n1;2s ¼ 2þ 1þ 2þ 2 ¼ 7 ð12Þ

n1;3s ¼ 1þ 1 ¼ 2: ð13Þ

Finally, the number of outputs of level 1 and category s which wereinvalid at level 1 is n1;invð1Þ

s ¼ 2. The bba m representing the piece ofinformation provided by the PAR is thus given by:

mðfT1S1gÞ ¼n1;1

s

n1s¼ 95þ 88þ 98

290¼ 0:969;

mðfT1invgÞ ¼ n1;invð1Þs

n1s¼ 2

290¼ 0:007;

mðT1Þ ¼n1;2

s � n1;invð1Þs

n1s

¼ 2þ 1þ 2þ 2� 2290

¼ 0:017;

mðfinvgÞ ¼ n1;invð2Þs

n1s¼ 0

290¼ 0:000;

mðXÞ ¼ n1;3s � n1;invð2Þ

s

n1s

¼ 1þ 1� 0290

¼ 0:007:

Likewise, assume that the same PAR provides an output o ¼ T1,which belongs to level 2 and category town denoted t. We have

n2;2t ¼ 24þ 22þ 3þ 20þ 13þ 16 ¼ 98

n2;3t ¼ 1þ 1 ¼ 2

n2;invð1Þt ¼ 3

n2;invð2Þt ¼ 0:

Hence,

mðT1Þ ¼n2;2

t � n2;invð1Þ

n2t

¼ 98� 3100

¼ 0:95;

mðT1invÞ ¼ n2;invð1Þt

n2t¼ 3

100¼ 0:03;

mðXÞ ¼ n2;3t � n2;invð2Þ

t

n2t

¼ 2100

¼ 0:02;

mðfinvgÞ ¼ n2;invð2Þt

n2t¼ 0

100¼ 0:

Finally, if o ¼ X, we have:

mðfinvgÞ ¼ n3;invð2Þ

n3 ¼ 416¼ 0:25;

mðXÞ ¼ n3;3 � n3;invð2Þ

n3 ¼ 4þ 2þ 3þ 2þ 2þ 0þ 1þ 2� 416

¼ 0:75:

Fig. 11. An extended model using a discounting/reinforcement correction mech-anism based on the scores provided by PAR1 and PAR2.

4.2.2. Using confidence scoresIn some cases, a PAR provides not only a partial or complete ad-

dress, but also a score indicating the degree of confidence in theoutput. Such a score may typically be provided by character recog-nition algorithms. When available, this additional information mayeasily be incorporated in our framework, using the bba correctionmechanisms introduced in Section 3.3.

A simple approach to use confidence scores is to reinforce thepiece of information provided by a PAR when the score is high,and, conversely, to discount it when the score is low. For that

purpose, we may define four thresholds Ti, i ¼ 1;2;3;4 such thatinformation provided by the PAR is:

� totally discounted if the score is lower than T1;� discounted proportionally to the score if the score belongs to½T1; T2�;

� left unchanged if the score belongs to ½T2; T3�;� reinforced proportionally to the score if the score belongs to½T3; T4�;

� totally reinforced if the score is greater than T4.

This may be achieved by defining three coefficient m1, m2 and m3

as functions of the score as shown in Fig. 10, and use them to cor-rect the output bba m using (4), (5) as explained in Section 3.3.

Fig. 11 shows a typical fusion scheme involving three PARs, twoof which provide confidence scores that are used to correct theircorresponding output bbas.

4.3. Combination

Once each PAR output has been converted into a bba, the differ-ent bbas can be combined using the conjunctive rule of combina-tion (1).

Example 5. (Combination) Let us consider three PARs associatedwith outputs o1 ¼ T2, o2 ¼ T1, and o3 ¼ fT2S1g, respectively. Let ussuppose that assignment methods (9) and (10) have led to:

m1ðT2Þ ¼ 0:95; m1ðfT2invgÞ ¼ 0:03; m1ðXÞ ¼ 0:02;m2ðT1Þ ¼ 0:9; m2ðXÞ ¼ 0:1;m3ðfT2S1gÞ ¼ 0:93; m3ðT2Þ ¼ 0:05; m3ðXÞ ¼ 0:02:


Then, m ¼ m1 � \m2 � \m3 is given by:

mðfT2S1gÞ ¼ 0:0902; mðT1Þ ¼ 0:0004;mðfT2invgÞ ¼ 0:0002; mðXÞ ¼ 0:0000;mðT2Þ ¼ 0:0068; mð;Þ ¼ 0:9024:

ð14Þ

The large mass allocated to the empty set reflects the high conflictbetween the outputs from the three PARs.

4.4. Decision making

As explained in Section 3.4, decision making in the TBM re-quires the definition of a set D of decisions, a betting frame C,and a cost function.

In this application, the definition of D is based on classifier out-puts. Recall that each classifier output is an element for the hierar-chy H of addresses, i.e., a subset of X. The set D of decisions issimply defined as the subset of 2X composed of PAR outputs, theirparents in the hierarchy, and X. For instance, in Example 5, thePARs outputs are o1 ¼ T2, o2 ¼ T1, and o3 ¼ fT2S1g. The correspond-ing possible decisions are d1 ¼ X, d2 ¼ T1, d3 ¼ T2, and d4 ¼ fT2S1g.

The other ingredients of the decision model presented in Sec-tion‘ 3.4, namely, the betting frame and the cost function, will bedescribed in the following subsections.

4.4.1. Betting frameThe result of the combination is a bba mX on X. As explained in

Section 3.4, decision making in the TBM is based on the pignistictransformation applied after a betting frame has been defined. Inthis application, X cannot be chosen as the betting frame becauseit is much too big. Additionally, the cardinalities of various subsetsof X are unknown: we do not know exactly how many addressesare contained in a given town, for instance. Computing this kindof information from the database would be too time-consuming.For this reason, it was decided to define the betting frame C as acoarsening (i.e., a partition) of X. In order to avoid loosing informa-tion, C was chosen as the coarsest partition of X such that all non-empty focal sets of the combined bba mX as well as their associatedinvalid addresses can be obtained as elements or unions of ele-ments of C. The formal construction of C from X is detailed in Algo-rithm 1.

Algorithm 1 (Construction of the Betting Frame C.).

Require: mX a bba; H a hierarchy;
FðmÞ all the focal sets of m;
Fig. 12. Betting frame C for the bba m of Example 5.

FðmÞ FðmÞ [ fXg;for each element F of FðmÞdo

add to FðmÞ the invalid address associated with F, if it isnot already contained in FðmÞ;

end forC ;;repeat

F a highest level element of FðmÞ;c F n

SfA 2FðmÞs:t: F

is the smallest element of FðmÞ containing Ag;add c to C;remove F from FðmÞ;

until FðmÞ ¼ ;

Example 5 (continued). The non-empty focal sets of m given by(14) are fT2S1g, T1, fT2invg, X and T2. The invalid addresses associ-ated to T1 and X are T1inv and inv, respectively. The correspondingbetting frame C is represented in Fig. 12. It is composed of sevenelements:

c1 ¼ X n ðT2 [ T1 [ finvgÞ; c2 ¼ finvg;c3 ¼ T1 n fT1invg; c4 ¼ T2 n ðfT2S1g [ fT2invgÞ;c5 ¼ fT1invg; c6 ¼ fT2S1g;c7 ¼ fT2invg:

ð15Þ

It is easy to see that the focal sets of m may be recovered from theci: we have

fT2S1g ¼ c6; T1 ¼ c3 [ c5; fT2invg ¼ c7;

T2 ¼ c4 [ c6 [ c7; X ¼[7i¼1

ci:

As explained in Section 3.2, the bba m on X given by Eq. (14) can beexpressed on the coarsening C as:

mCðfc6gÞ ¼ 0:0902; mCðfc3; c5gÞ ¼ 0:0004;mCðfc7gÞ ¼ 0:0002; mCðCÞ ¼ 0:0000;mCðfc4; c6; c7gÞ ¼ 0:0068; mCð;Þ ¼ 0:9024:

ð16Þ

The pignistic probability computed from mC using Eq. (7) is:

BetPðfc1gÞ ¼ 0:000; BetPðfc2gÞ ¼ 0:000BetPðfc3gÞ ¼ 0:002; BetPðfc4gÞ ¼ 0:023BetPðfc5gÞ ¼ 0:002; BetPðfc6gÞ ¼ 0:948BetPðfc7gÞ ¼ 0:025:

4.4.2. Cost functionAs explained in Section 3.4, the final decision d is the one that

minimizes the risk defined as the expected cost with respect tothe pignistic probability distribution. To compute this risk, we haveto define a cost function c : D� C! R.

Let x0 denote the true address, c0 the element of C containingx0, and d0 the smallest element of D containing c0. The costs aredefined as follows. If d ¼ d0, then the decision is correct and thecost is equal to zero. If d–d0, four situations are considered:

Table 2Costs of making a decision di when the truth is an element cj of the betting frame inExample 5

Decisions Truth

c1 c2 c3 c4 c5 c6 c7

d1 ¼ X 0 0 CtownR Ctown

R CtownR Ctown

R CtownR

d2 ¼ T1 CtownE Ctown

E 0 CtownE 0 Ctown

E CtownE

d3 ¼ T2 CtownE Ctown

E CtownE 0 Ctown

E CdistR 0

d4 ¼ fT2S1g CtownE Ctown

E CtownE Cdist

E CtownE 0 Cdist

E


1. if d ¼ X (the mail piece was rejected), then cðd; cÞ ¼ CtownR ;

parameter CtownR is called the rejection cost at town level;

2. if d ¼ Ti for some town Ti (only the town but no street or PO boxwas given) and d0 6 # d (the true address is not in town Ti), thencðd; cÞ ¼ Ctown

E ; parameter CtownE is called the error cost at town

level;3. if d ¼ Ti for some town Ti and d0 � d (the true address is in town

Ti), then cðd; cÞ ¼ CdistR ; parameter Cdist

R is called the rejection costat distribution level;

4. if d ¼ fxg for some x 2 X, x–x0, then cðd; cÞ ¼ CdistE ; parame-

ter CdistE is called the error cost at distribution level;

Note that cases 2 and 4 correspond to errors (on the town or onthe address within a town), whereas cases 1 and 3 correspond tosituations where the decision is too imprecise, but not erroneous.Table 2 shows the costs associated with the decisions and bettingframe elements of Example 5.

The cost function is thus defined using four parameters CtownR ,

CtownE , Cdist

R and CdistE . It is natural to assume the following ordering

relation between these parameters:

0 6 CdistR 6 Ctown

R 6 CdistE 6 Ctown

E : ð17Þ

Fig. 13. Hierarchy of addresse

Fig. 14. Confidence scores and addresses provided by PAR1 regarding the images of the leis associated with an address whose town is correct.

For instance, an error in the distribution part of the address isless prejudicial than an error in the town part. Likewise, it is pref-erable to make a rejection than an error. Let us note, however, thatthis order may depend on postal agencies. For example, the erroron the distribution when the town is correct may incur a smallercost than the work implied by outright rejection. The rejection costat the town level may be thus higher than the error cost at the dis-tribution level.

Example 5 (concluded). With I ¼ f1; . . . ;7g, the expected risksof each decision are the following:

qbetðd1Þ ¼Xc2C

cðd1; cÞBetPðcÞ ¼X

i2Inf1;2gCtown

R BetPðfcigÞ

CtownR ;

qbetðd2Þ ¼Xc2C

cðd2; cÞBetPðcÞ ¼X

i2Inf3;5gCtown

E BetPðfcigÞ

0:996CtownE ;

qbetðd3Þ ¼X

i2f1;2;3;5gCtown

E BetPðfcigÞ þ CdistR BetPðfc6gÞ

0:004CtownE þ 0:948Cdist

R ;

qbetðd4Þ ¼X

i2f1;2;3;5gCtown

E BetPðfcigÞ þX

i2f4;7gCdist

E BetPðfcigÞ

0:004CtownE þ 0:048Cdist

E :

s in the real application.

arning set. A black dot corresponds to an address whose town is incorrect. A grey dot


Then, according to the cost vector c ¼ ðCdistR ;Ctown

R ;CdistE ;Ctown

E Þ,decision d1, d3 or d4 can be made. As Ctown

E P CdistR (17), qbetðd2ÞP

qbetðd3Þ;8c.For instance, if c ¼ ð1;2;3;4Þ (respectively, ð1;2;20;40Þ and

ð1;2;100;300Þ), the final decision is d4 ¼ fT2S1g (respectively,d3 ¼ T2 and d1 ¼ X). Ideally, the cost vector, reflecting real financialcosts, should be provided by postal agencies. Unfortunately, suchinput is rarely available. However, as explained in Section 2.2,the goal of the postal fusion is to achieve the highest possible cor-rect recognition rate associated with a satisfactory error rate. Thecosts can thus be determined using a set of labelled letters in orderto obtain acceptable performances.

Fig. 15. Performances of individual PARs and variou

Fig. 16. Performances of individual PARs and various fus

5. Experimental results

The fusion scheme described in this paper was applied to thecombination of three PARs denoted PAR1, PAR2 and PAR3. A setof 56,000 mixed handwritten and machine printed mail pieceswas divided into a learning set and a test set in equal proportions.The confusion matrix of each PAR was computed using the learningset, which was also used to tune the costs so as to achieve accept-able performances on the learning set. For different cost settings,error and correct recognition rates were then computed usingthe test set. Note that, in this real application concerning France,the address hierarchy H is more complex than the simple one

s fusion schemes on the test set at town level.

ion schemes on the test set at the distribution level.


considered above to explain the principles of the approach. Thisreal hierarchy is depicted in Fig. 13.

In addition to proposed addresses, PARs 1 and 2 also provideconfidence scores in the case of handwritten mail, as consideredin Section 4.2.2. Fig. 14, which displays the confidence scores pro-vided by PAR 1 for all handwritten mail pieces in the database,shows that errors at the town level tend to be associated with low-er scores, which confirms that the score carries useful information.Fig. 14 also shows the definition of the four thresholds Ti used forthe correction mechanism detailed in Section 4.2.2.

Figs. 15 and 16 show the performances of each of the individualPARs as well as various fusions schemes at the town and distribu-tion levels. To preserve confidentiality on the performance levelachieved, the origins of these two plots are not disclosed. Recogni-tion rates, represented on the x-axis, are expressed relatively to areference recognition rate, denoted R at the distribution level,and R0 at the town level. Error rates, represented on the y-axis,are expressed relatively to a reference error rate, denoted E at bothdistribution and town levels. Rates R and R0 are, respectively, great-er than 50% and 80%, while E is smaller than 0.1%.

In addition to the three individual PARs, the performances oftwo reference fusion rules, noted Maj and CPAR1, are plotted in Figs.15 and 16 for comparison. The former correspond to a simplemajority voting scheme, whereas the latter is a variant given moreimportance to PAR1 in the decision. The special role of PAR1 in thiscombination rule was motivated by its good overall performancesat the distribution and town levels.

The four combined PARs C1–C4 were obtained using differentcost settings, without using confidence scores. At the distributionlevel (Fig. 16), each selected combination point has an acceptableerror rate (i.e., an error rate lower than the minimum over individ-ual PARs). At the town level (Fig. 15), however, only combinationpoints C1 and C2 remain under the maximal tolerated error rate.Overall, the cost setting corresponding to point C2 meets therequirements mentioned in Section 2: it provides a combinationscheme yielding more correct recognitions and fewer errors thaneach of the individual PARs, both at the town and distributionlevels.

Finally, combined PARs with different cost settings and correc-tions based on confidence scores are also represented as crosses inFigs. 15 and 16. We can see that the use of scores further improvesthe performances of the combination. This fusion scheme allows usto obtain a combination point noted Cþ, which is associated withan acceptable error rate and a higher correct recognition rate thanprevious combination point C2, both at the town and distributionlevels. This confirms the interest of using confidence scores inthe fusion process.

6. Conclusion

A fusion scheme for postal address recognition based on thetransferable belief model has been presented. One of the key as-pects of this scheme concerns the conversion of PAR outputs intobelief functions, based on confusion matrices and confidencescores when available. Belief functions associated to the differentPARS are then combined using Dempster’s rule, and a final decisionis made by minimizing the expected cost relative to a pignisticprobability distribution defined on a suitable betting frame. Thismethod has been shown experimentally to meet the requirementsset by postal agencies.

In this work, PARs have been considered as black boxes, whichallows great flexibility and modularity in the maintenance of thefusion system. For instance, PARs can be added or removed withoutmodification of the whole system: only new confusion matrices

having to be estimated. The fusion module is built on top of exist-ing PARs and constitutes an independent component. Furtherimprovements might be gained by using additional informationprovided by specific recognition algorithms inside each PAR, atthe expense of some loss of generality. Situations where PARs pro-vide more complex outputs such as ordered lists of addresses couldalso be studied. This is left for future research.

References

Benouhiba, T., & Nigro, J.-M. (2006). An evidential cooperative multi-agent system.Expert Systems with Applications, 30(2), 255–264.

Bloch, I. (2002). Fusion of information under imprecision and uncertainty,numerical methods, and image information fusion. In A. K. Hyder et al. (Eds.),Multisensor data fusion (pp. 267–294). Kluwer.

Bloch, I. (2003). Fusion d’informations en traitement du signal et des images. Paris,France: Hermès.

De Leo, G., Vicenzi,M., & Franzone, C. ELSAG SPA. European patent: Mail recognitionmethod. Number: EP 1 594 077 A2, November 2005.

DeGroot, M. H. (1970). Optimal statistical decisions. New York: McGraw-Hill.Den�ux, T., & Smets, Ph. (2006). Classification using belief functions: The

relationship between the case-based and model-based approaches. IEEETransactions on Systems, Man and Cybernetics B, 36(6), 1395–1406.

Dubois, D., Prade, H., & Smets, Ph. (1996). Representing partial ignorance. IEEETransactions on Systems, Man and Cybernetics, 26(3), 361–377.

Fisher, M., & Siemens, A. G. International patent: System and method for smartpolling. Number: WO 2005/050545 A1, June 2005.

Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEETransactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.

Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. Wiley-Interscience.

Lee, C. K., & Leedham, C. G. (2004). A new hybrid approach to handwritten addressverification. International Journal of Computer Vision, 57(2), 107–120.

Malpica, J. A., Alonso, M. C., & Sanz, M. A. (2007). Dempster–Shafer theory ingeographic information systems: A survey. Expert Systems with Applications,32(1), 47–55.

Mercier, D., Cron, G., Den�ux, T., & Masson, M. (2005). Fusion of multi-level decisionsystems using the transferable belief model. In: Proceedings of the 8thinternational conference on information fusion, FUSION’2005, Philadelphia,USA. Paper C8-2, July 25–29.

Mercier, D., Den�ux, T., & Masson, M. (2006). General correction mechanisms forweakening or reinforcing belief functions. In: Proceedings of the 9thinternational conference on information fusion, FUSION’2006, Florence, Italy.Paper 146, July 10–14.

Palumbo, P. W., & Srihari, S. N. (1996). Postal address reading in real time.International Journal of Imaging Systems and Technology, 7(4), 370–378.

Plamondon, R., & Srihari, S. N. (2000). On-line and off-line handwriting recognition:A comprehensive survey. IEEE Transactions on Pattern Analysis and MachineIntelligence, 22(1), 63–84.

Savage, L. J. (1954). The foundations of statistics. New York: Wiley.Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton

University Press.Sikder, I. U., & Gangopadhyay, A. (2007). Managing uncertainty in location services

using rough set and evidence theory. Expert Systems with Applications, 32(2),386–396.

Smets, Ph., & Kennes, R. (1994). The transferable belief model. Artificial Intelligence,66, 191–243.

Smets, Ph. (1993). Belief functions: The disjunctive rule of combination and thegeneralized bayesian theorem. International Journal of Approximate Reasoning, 9,1–35.

Smets, Ph. (1994). What is Dempster–Shafer’s model? In R. R. Yager, J. Kacprzyk, &M. Fedrizzi (Eds.), Advances in the Dempster–Shafer theory of evidence (pp. 5–34).New York: Wiley.

Smets, Ph. (1998). The transferable belief model for quantified belief representation.In D. M. Gabbay & Ph. Smets (Eds.). Handbook of defeasible reasoning anduncertainty management systems (Vol. 1, pp. 267–301). Dordrecht, TheNetherlands: Kluwer Academic Publishers.

Smets, Ph. (2002). Decision making in a context where uncertainty is representedby belief functions. In R. P. Srivastava & T. J. Mock (Eds.), Belief functions inbusiness decisions (pp. 17–61). Heidelberg: Physica-Verlag.

Smets, Ph. (2005). Decision making in the TBM: The necessity of the pignistictransformation. International Journal of Approximate Reasoning, 38(2), 133–147.

Srihari, S. N. (2000). Handwritten address interpretation: A task of many patternrecognition problems. International Journal of Pattern Recognition and ArtificialIntelligence, 14(5), 663–674.

Xu, L., Krzyzak, A., & Suen, C. Y. (1992). Methods of combining multiple classifiersand their applications to handwriting recognition. IEEE Transactions on Systems,Man and Cybernetics, 22(3), 418–435.

Yu, B., Jain, A. K., & Mohiuddin, M. (1997). Address block location on complex mailpieces. In: Proceedings of the fourth international conference on documentanalysis and recognition (pp. 897–901).

Documents

Decision fusion for postal address recognition using belief functions