
Adding human expertise to the quantitative analysis of fingerprints Busey and Chen

PROGRAM NARRATIVE

A. Research Question

Machine learning algorithms take a number of approaches to the quantitative analysis of

fingerprints. These include identifying and matching minutiae (refs), matching patterns of local

orientation based on dynamic masks (refs), and neural network approaches that attempt to learn

the structure of fingerprints (refs). While these techniques provide good results in biometric

applications and serve a screening role in forensic cases, they are less useful when applied to

severely degraded fingerprints, which must be matched by human experts. Indeed, statistical

approaches and human experts have different strengths. Despite the enormous computational

power available today for use by computer analysis systems, the human visual system remains

unequaled in its flexibility and pattern recognition abilities. Three possible reasons for this

success come from the experts' knowledge of where the most important regions are located on a

particular set of prints, the ability to tune their visual systems to specific features, and the

integration of information across different features. In the present project, we propose to

integrate the knowledge of experts into the quantitative analysis of fingerprints to a degree not

achieved by other approaches. There is much that fingerprint examiners can add to machine

learning algorithms and, as we describe below, many ways in which statistical learning

algorithms can assist human experts. Thus the central research question of this proposal is: How

can the integration of information derived from experts improve the quantitative analysis of

fingerprints?

B. Research goals and objectives

The goal of the present proposal is to integrate data from human experts with statistical

learning algorithms to improve the quantitative analysis of inked and latent prints. We introduce

a novel procedure developed by one investigator (Tom Busey) and use it to guide the input to

statistical learning algorithms developed and extended by our other investigator (Chen Yu). The

fundamental idea behind our approach is that the quantitative evaluation of the information


contained in latent and inked prints can be vastly improved by using elements of human

expertise to assist the statistical modeling, as well as to introduce a new dimension of time that is

not contained in the static latent print analysis. The main benefit, as we discuss in sections C.x.x,

is that the format of the data extracted from experts allows the application of novel quantitative

models that are adapted from related areas. To apply this knowledge derived from experts, we

will use our backgrounds in vision, perception, machine learning and behavioral testing to design

experiments that extract relevant information from experts and use this to improve the

quantitative analysis techniques applied to fingerprints by integrating the two sources of

information.

Our research interests differ somewhat from the existing approaches and reflect the

adaptations that are necessary to incorporate human expert knowledge. Existing statistical

algorithms developed to match fingerprints rely on several different classes of techniques. Some

extract minutiae and other robust sources of information such as the number of ridges between

minutiae (refs). Others rely on the computation of local curvature of the ridges, and then partition

these into different classes (MASK refs). Virtually all approaches make reasoned and reasonable

guesses as to what the important sources of information might be, such as minutiae, local ridge

orientation or local ridge width (dgs paper). The present approach takes a more agnostic

approach to what might be the important sources of information in fingerprints, and we will

develop statistical models that take advantage of the data derived from experts. However, a

major goal of the grant is to demonstrate how expert knowledge can be applied to any extant

model, and to suggest how this might be accomplished. Thus we will spend substantial time

documenting our application of expert knowledge for our statistical models. In addition, we will

make all of our expert data available for other researchers and practitioners. It is likely that the

data will have implications for training, although this is not the focus of the present proposal.

C. Research design and methods

At the heart of our approach is the idea that human expertise, properly represented, can improve

the quantitative analyses of fingerprints. In a later section we describe how we apply human


expert knowledge to various statistical analyses, but first we need to answer the question of

whether human experts can add something to the quantitative analyses of prints.

The answer to this question can be broken down into two parts. First, do human visual

systems in general possess attributes not captured by current statistical approaches, and second,

do human experts have additional capacities not shared by novices, capacities that could further

inform statistical approaches? Below we briefly summarize what the visual science literature tells

us about how humans recognize patterns, and then describe our own work that has addressed the

differences between experts and novices. As we will show, human experts have much to add to

quantitative approaches.

We should stress that while we will gather data from human experts to improve our

quantitative analyses of fingerprints, the goal of this grant is not to study human experts in order

to determine whether or how they differ from novices, nor are we interested in questions about

the reliability or accuracy of human experts. Instead, we will generalize our previous results that

demonstrate strong differences in the visual processing of fingerprints in experts, and apply this

expertise to our own statistical analyses. As a result, we will only gather data from human

experts (latent print examiners with at least 5 years of post-apprentice work in the field) under

the assumption that this will provide maximum improvement to our statistical methods. We can

demonstrate the effectiveness of this knowledge by simply re-running the statistical analyses

without the benefit of knowledge from experts. There are various metrics attached to each

analysis technique that demonstrate the superiority of expert-enhanced analyses, such as the

correct recognition/false recognition tradeoff graphs, or the dimensionality

reduction/reconstruction successes of data reduction techniques.

We will also apply novel approaches adapted from the related domain of language analyses.

It might seem odd to apply techniques developed for linguistic analyses to a visual domain such

as pattern recognition, but the principles that underlie both domains are very similar. Both

involve large numbers of features that have complex statistical relations. In the case of language,

the features are often words, phonemes or other acoustical signals. Fingerprints are defined by a


complex but very regular dictionary of features that also share a complex and meaningful

correlational structure. One of us (Chen) is a highly-published expert in the field of machine

learning algorithms as applied to multimodal data, and several papers included as appendices

detail this expertise. His work on multimodal applications between visual and auditory domains

makes him well-suited to address the relation between human data and machine learning

algorithms. Both linguistic and visual information contain highly-structured data that consist of

regularities that are extracted by perceivers, and this is not unlike the temporal sequence that

experts go through when they perform a latent print examination, as we describe in a later

section. First, however, we address how we might document the principles of human expertise.

Can we use elements of the human visual system to improve our statistical analyses?

The answer to this question is straightforward, in part because of the overwhelming evidence

that human-based recognition systems contain processes that are not captured by current

statistical approaches. One of us (Busey) has published many articles addressing different

aspects of human sensation, perception and cognition, and thus is well-suited to manage the

acquisition and application of human expertise to statistical approaches. Below we briefly

summarize the properties of the human visual system and in a later section we describe how we

plan to extract fundamental principles from this design in order to improve our statistical

analyses of fingerprints.

An analysis of the human visual system by vision scientists demonstrates that the recognition

process proceeds via a hierarchical series of stages, each with important non-linearities (nature

ref), that produce areas that respond to objects of greater and greater complexity. This process

also provides increasing spatial independence, allowing brain areas to integrate over larger and

larger regions. This will become important for holistic or configural processing, as discussed in a

later section. (also talk about feature-based attention)

A second benefit of this hierarchical approach is that objects achieve limited scale and

contrast invariance. Statistical approaches often deal with this through local contrast or


brightness normalization, but this is a separate process. Scale invariance is often achieved by

explicitly measuring the width of ridges (grayscale ref), again a separate process.

A third strength of the human visual system is that it appears to have the ability to form new

feature templates through an analysis of the statistical information contained in the fingerprints.

This process, called unitization, will tend to improve feature detection in noisy environments as

is often found with latent prints.

Do forensic scientists have visual capabilities not shared by novices?

The prior summary of the elements of the human visual system suggests that current

statistical approaches can be improved by adapting some of the principles underlying the human

visual system. There are, however, other processes that are specifically developed by latent print

examiners that may also be profitably applied to statistical models. Below we summarize the

results of two empirical studies that have recently been published in the highly respected journal

Vision Research (Busey & Vanderkolk, 2005). The results demonstrate not only that experts are

better than novices, but suggest the nature of the processes that produce this superior

performance.

Visual expertise takes many forms. It could be different for different parts of the

identification process, and may not even be verbalizable by the expert since many elements of

perceptual expertise remain cognitively impenetrable (refs). A major focus of our research is to

capture elements of this expertise and use this as a training signal for our statistical learning

algorithms. What is novel to our approach is our ability to capture the expertise at a very deep

and rich level. In the next section we describe our prior work documenting the nature of the

processes that enable experts to perform at levels much superior to novices, and then in Section

C.2 we describe how we capture this expertise in a way that we can use it to improve our

statistical learning algorithms.


C.1. Documenting expertise in human latent print examiners

Initially, experts tend to focus on the entire print, which leads to benefits that we have

previously identified as configural processing (Busey & Vanderkolk, 2005). Configural

processing takes several forms, but the basic idea behind this process is that instead of focusing

on individual features or minutiae, the observer instead integrates information over a large

region, to identify important relations such as relative locations of features or curvature of ridge

flow. Fingerprint examiners often talk about 'viewing the image in its totality', which is different

language for the same process.

While configural processing reveals the overall structure of an image and selects important

regions for further inspection, the real work comes in comparing small regions in one print to

regions in the other. These regions may be selected on the basis of minutiae identified in the

print, or high-quality Level 3 detail. We know from related work on perceptual learning in the

visual system that one of the processes by which expertise develops is through the development

of new feature detectors. Experts spend a great deal of time viewing prints, and this has the

potential to result in profound changes in how their visual systems process fingerprints. (config

processing refs)

One process by which experts could improve how they extract latent print information from

noisy prints is termed unitization, in which novel feature detectors are created through experience

(unitization refs). Fingerprints contain remarkable regularities and the human visual system

C.1.a. Do experts have information valuable to training networks or documenting the

quantitative nature of fingerprints?

Fingerprint examiners have received almost no attention in the perceptual learning or

expertise literatures, and thus the PI began a series of studies in consultation with John

Vanderkolk, of the Indiana State Police Forensic Sciences Laboratory in Fort Wayne, Indiana.

Our first study addressed the nature of the expertise effects in a behavioral experiment, and then

we followed up evidence for configural processing with an electrophysiological study. The


discussion below describes the

experiments in some detail, in part

because extensions of this work are

proposed in Section D, and a complete

description here illustrates the technical

rigor and converging methods of our

approach.

C.1.b. Behavioral evidence for

configural processing

In our first experiment, we abstracted

what we felt were the essential elements

of the fingerprint examination process

into an X-AB task that could be

accomplished in relatively short order.

This work is described in Busey and

Vanderkolk (2005), but we briefly describe the methods here since they illustrate how our

approach seeks to find a paradigm that is less time-consuming than fully realistic forensic

examinations (which can take hours to days to complete) yet still maintains enough ecological

validity to tap the expertise of the examiners. Figure 1 shows the stimuli used in the experiment

as well as a timeline of one trial. We cropped out fingerprint fragments from inked prints,

grouped them into pairs, and briefly presented one of the two for 1 second. This was followed by

a mask for either 200 or 5200 ms, and then the expert or novice subject made a forced-choice

response indicating which of the two test prints they believed was shown at study. We introduced

orientation and brightness jitter at study, and the construction of the pairs was done to reduce the

reliance on idiosyncratic features such as lint or blotches.

At test, we introduced two manipulations that we thought captured aspects of latent prints, as

shown in Figure 2. First, latent prints are often embedded in visual noise from the texture of the


Figure 1. Sequence of events in a behavioral experiment with fingerprint experts and novices: study image (1 second), mask (200 or 5200 milliseconds), then test images until response. Note that the study image has a different orientation and is slightly brighter to reduce reliance on low-level cues.


surface, dust, and other sources. One

expert, in describing how he approached

latent prints, stated that his job was to

'see through the noise.' To simulate at

least elements of this noise, we

embedded half of our test prints in white

visual noise. While this may have a

spatial distribution that differs from the

noise typically encountered by experts,

we hoped that it would tap whatever facilities experts may have developed to deal with noise.

The second manipulation was motivated by the observation that latent prints are rarely

complete copies of their inked counterparts. They often appear patchy if made on an irregular

surface, and sections may be partially masked out. To simulate this, we created partially-masked

fingerprint fragments as shown in the upper-right panel of Figure 2. Note that the partially-

masked print and its complement each contain exactly half of the information of the full print

and the full print can be recovered by

summing the two partial prints pixel-by-

pixel. We use this property to test for

configural effects as described in a later

section.

All three manipulations (delay

between study and test, added noise and

partial masking) were fully crossed to

create 8 conditions. The data are shown in

Figure 3, which shows main effects for all

three factors for novices. Somewhat

surprising is the finding that while


Figure 2. Four types of test trials: clear fragments, partially-masked fragments, fragments presented in noise, and partially-masked fragments presented in noise.

Figure 3. Behavioral Experiment Data. Four panels (Experts vs. Novices, Short vs. Long Delay) plot percent correct (0.5 to 1.0) against image type (Full Image vs. Partial Image), with separate curves for No Noise and Noise Added. Error bars represent one standard error of the mean (SEM).


experts show effects of added noise and partial masking, they show no effect of delay, which

suggests that they are able to re-code their visual information into a more durable store resistant

to decay, or have better visual memories. Experts also show an interaction between added noise

and partial masking, but novices do not. This interaction seen with the experts may result from

very strong performance for full images embedded in noise, and may arise from configural

processes. To test this in a scale-invariant manner, we developed a multinomial model which

makes a prediction for full-image performance given partial-image performance using principles

similar to probability summation. The complete results are found in Busey & Vanderkolk (2005),

but to summarize, when partial image performance is around 65%, the model predicts full image

performance to be about 75%, whereas observed performance is almost 90%, significantly above the probability

summation prediction. Thus it appears that when both halves of an image are present (as in the

full image) experts are much more efficient at extracting information from each half.
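The independence prediction above can be sketched as follows. This is an illustrative reconstruction, not the exact multinomial model (which appears in Busey & Vanderkolk, 2005): it applies a standard guessing-corrected probability-summation rule for a two-alternative forced-choice task, and the function name is ours.

```python
def probability_summation_prediction(p_partial, guess=0.5):
    """Predict full-image 2AFC accuracy from partial-image accuracy,
    assuming the two image halves are processed independently."""
    # Correct for guessing: observed p = guess + (1 - guess) * u,
    # where u is the probability of truly extracting the answer.
    u = (p_partial - guess) / (1.0 - guess)
    # Probability that at least one of the two halves yields the answer.
    u_full = 1.0 - (1.0 - u) ** 2
    return guess + (1.0 - guess) * u_full

# Partial-image accuracy near 65% predicts full-image accuracy near 75%;
# the observed ~90% for experts exceeds this independence benchmark.
full = probability_summation_prediction(0.65)  # ≈ 0.755
```

Performance above this prediction is the signature of configural processing: the two halves together yield more than the sum of their independent contributions.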

The results of this experiment lay the groundwork for a more complete investigation of

perceptual expertise in fingerprint examiners. From this work we have evidence that:

1) Experts perform much better than novices overall, despite the fact that the testing

conditions were time-limited and somewhat different than those found in a traditional latent print

examination.

2) Experts appear immune to longer delays between study and test images, suggesting better

information re-coding strategies and/or better visual memories.

3) Experts may have adopted configural processing abilities over the course of their training

and practice. All observers have similar facilities for faces as a consequence of the ecological

importance of faces and our quotidian exposure as a result of social interactions. Experts may

have extended this ability to the domain of fingerprints, since configural processing is seen as

one mechanism underlying expertise (e.g. Gauthier & Tarr, 1997).

C.1.c. Electrophysiological evidence for configural processing

To provide converging evidence that fingerprint experts process full fingerprints

configurally, we turned to an electrophysiological paradigm based on work from the face


recognition literature. This experiment is described more fully in Busey and Vanderkolk (2005),

which is included as an appendix. However, these results support the prior conclusions described

above, and demonstrate that the configural processing observed with fingerprint examiners is a

result of profound and qualitative changes that occur in the very earliest stages of their

perceptual processing of fingerprints.

C.2. Elements of human expertise that could improve quantitative analyses

The two studies described above are important because they illustrate that configural

information is one process that could be adapted for use in the quantitative analyses of

fingerprints. Existing quantitative models of fingerprints incorporate some elements of the

expertise seen above, but many elements could be added that would improve the recognition

accuracy of existing programs. The two major approaches to fingerprint matching rely on local

features such as minutiae detection (refs), and more global approaches such as dynamic masks

applied to orientation computed at many locations on a grid overlaying the print (refs). Of these

two approaches, the dynamic mask approach comes closer to the idea of configural processing,

although it does not compute minutiae directly.

Neither approach takes advantage of the temporal information that expresses elements of

expertise in the human matching process. Quantitative information such as fingerprint data, when

represented in pixel form, has a high-dimensional structure. The two techniques described

above reduce this dimensionality by either extracting salient points such as minutiae, or

computing orientation only at discrete locations. Both of these approaches throw out a great deal

of information that could otherwise be used to train a statistical model on the elemental features

that allow for matches. Part of the reason this is necessary is that the high-dimensional space is

difficult to work in: all prints are more or less equally similar without this dimensionality

reduction, and by reducing the dimensionality computations such as similarity become tractable.

The key, then, is to reduce the dimensionality while preserving the essential features that allow

for discrimination among prints. One technique that has been explored in language acquisition is

the concept of "starting small" (Elman ref). In this procedure, machine learning approaches such


as neural network analyses are given very coarse information at first, which helps the network

find an appropriate starting point. Gradually, more and more detailed information is added, which

allows the network to make finer and finer discriminations.
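As a rough sketch of this "starting small" idea (the function name, scale factors, and block-averaging blur are our illustrative assumptions, not the actual training pipeline), a curriculum can be generated by presenting each print at progressively finer scales:

```python
import numpy as np

def coarse_to_fine_schedule(image, factors=(8, 4, 2, 1)):
    """Yield progressively finer versions of `image`: block-average at a
    coarse scale, then upsample back to the original size, so a learner
    sees coarse structure before fine detail. Assumes the image
    dimensions are divisible by each factor."""
    h, w = image.shape
    for f in factors:
        coarse = image.reshape(h // f, f, w // f, f).mean(axis=(1, 3))
        # nearest-neighbour upsample back to the original size
        yield np.repeat(np.repeat(coarse, f, axis=0), f, axis=1)

# Each stage restores more detail; the final stage is the original image.
img = np.random.default_rng(0).random((64, 64))
stages = list(coarse_to_fine_schedule(img))
```

A network trained on `stages[0]` first sees only global ridge flow; later stages gradually reintroduce minutiae-scale detail, mirroring the coarse-to-fine order in which experts appear to acquire information.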

We discuss these ideas more fully in section X.Xx, but we mention it here to motivate the

empirical methods described next. Experts likely select which information they choose to

initially examine based on the need to organize their search processes. Thus they likely acquire

information that may not immediately lead to a definitive conclusion of confirmation or

rejection, but guides the later acquisition process. In the scene perception literature, this process

is known as 'gist acquisition' (refs), and suggests that the order in which a system (machine or

human) learns information matters. In the section below we describe how we acquire both spatial

and temporal information from experts, and then describe how this knowledge can be

incorporated into quantitative models.

C.3. Capturing the information acquisition process: The moving window paradigm

To identify the nature of the information used by experts, and the order in which it is

gathered, we have begun to use a technique called a moving window procedure. In the sections

below we describe this procedure and how it can be extended to address the role of configural or

gist information in human experts.

C.3.a. The moving window paradigm

The moving window paradigm is a software tool that simulates the relative acuity of the fovea

and peripheral visual systems. As we look around the world, there is a region of high acuity at

the location our eyes are currently pointing. Regions outside the foveal viewing cone are

represented less well. In the moving window paradigm we represent this state by slightly

blurring the image and reducing the contrast.

http://cognitrn.psych.indiana.edu/busey/FingerprintExample/
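A minimal sketch of the display logic follows. The actual tool is the web program linked above; here the blur is a crude block-average stand-in for the true peripheral degradation, and the function name and parameters are illustrative.

```python
import numpy as np

def moving_window_frame(image, cx, cy, radius=20, blur=4, dim=0.7):
    """Composite one frame of the moving-window display: a blurred,
    contrast-reduced copy of `image` everywhere except a clear circle of
    `radius` pixels centred on the mouse position (cx, cy). Assumes
    image dimensions divisible by `blur`."""
    h, w = image.shape
    # crude blur: block-average then nearest-neighbour upsample
    coarse = image.reshape(h // blur, blur, w // blur, blur).mean(axis=(1, 3))
    degraded = np.repeat(np.repeat(coarse, blur, axis=0), blur, axis=1) * dim
    # clear circle follows the mouse; everything else stays degraded
    yy, xx = np.mgrid[0:h, 0:w]
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    return np.where(inside, image, degraded)

img = np.random.default_rng(1).random((64, 64))
frame = moving_window_frame(img, cx=32, cy=32, radius=8)
```

Logging (cx, cy) with a timestamp each time the mouse moves yields the complete examination record described in the text.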

Figure 4 shows several frames of the moving window program, captured at different points in

time. The two images have been degraded by a blurring operation that somewhat mimics the


reduced representation of peripheral vision. The exception is a clear circle that responds in real

time to the movement of the mouse. This dynamic display forces the user to move the clear

window to regions of the display that warrant special interest. The blurred portions provide some

context for where to move the window. By recording the position of the mouse each time it is

moved, we can reconstruct a complete record of the manner in which the user examined the

prints. This method has some drawbacks in that the eyes move faster than the mouse. However,


Figure 4. The moving window paradigm allows the user to move the circle of interest around to different locations on the two prints. This circle provides high-quality information, and allows the expert the opportunity to demonstrate, in a procedure that is very similar to an actual latent print examination, which sections of the prints they believe are most informative. This procedure also records the order in which different sites are visited.


we find that with practice the experts report very few limitations with this procedure, and it has

the benefit of precise spatial localization. A major benefit of this procedure is that it can be done

over the web, reaching dozens of experts and producing a massive dataset. Many related

information theoretic approaches such as latent semantic analysis find that a large corpus of data

is necessary in order to reveal the underlying structure of the representation of information, and a

web-based approach provides sufficient data.

The data produced by this paradigm is vast: x/y coordinates for the clear window at each

millisecond. We have begun to analyze this data using several different techniques. The first

analysis we designed creates a mask that is black for regions the observer never visited and clear

for areas visited most often. Figure 5 shows an example of this kind of analysis. Areas visited

less often are somewhat darkened. The left panels of Figure 5 show two masked images, which

show not only where the experts visited, but also how long they spent inspecting each location. Thus

it represents a window into the regions the experts believed informative.

The right panels give a slightly different view, where unvisited areas are represented in red.

This illustrates that experts actually spend most of their time in relatively small regions of the

prints.
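This mask construction can be sketched as follows; the sampling format (one (x, y) pair per logged mouse position) and the window radius are our illustrative assumptions about the recorded trace.

```python
import numpy as np

def dwell_mask(samples, shape, radius=15):
    """Build a visitation mask from logged (x, y) mouse samples: 0 where
    the clear window never visited, rising toward 1 where the expert
    dwelt longest. Multiplying an image by the mask darkens regions the
    expert ignored."""
    h, w = shape
    counts = np.zeros(shape)
    yy, xx = np.mgrid[0:h, 0:w]
    for x, y in samples:
        # every pixel inside the clear window accumulates dwell time
        counts[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] += 1
    return counts / counts.max() if counts.max() > 0 else counts

# hypothetical trace: the window lingers near (30, 30), passes (70, 70) once
trace = [(30, 30)] * 50 + [(70, 70)]
mask = dwell_mask(trace, (100, 100))
```

Summing such masks across experts and print pairs highlights regions that examiners consistently deem informative.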

As a first pass, the images in Figure 5 reveal where the experts believe the task-relevant

information resides. However, lost in such a representation is the order in which these sites were

visited. In addition, this information is very specific to a particular set of prints. Ultimately we

will produce a more general representation that characterizes both the fundamental set of features

(often described as the basis set) that experts rely on, as well as how they process these features.

We have begun to explore an information-theoretic approach to this problem that seeks to find a

set of visual features that is common to a number of experts and fingerprint pairs. This approach

is related to many of the dimensionality reduction techniques that have been applied to natural

images (e.g. Olshausen & Field, 1996). Later projects will extend this approach to incorporate

elements of configural processing or context-specific models. In the present proposal we discuss

several different ways we plan to analyze what is a very rich dataset.
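One concrete starting point is sketched below, using PCA on expert-visited patches as a simpler stand-in for the sparse coding of Olshausen & Field (1996); the patch size, component count, and function name are illustrative assumptions.

```python
import numpy as np

def patch_basis(images, patch=8, n_components=16, n_samples=2000, seed=0):
    """Learn a linear basis for small image patches by PCA (a simpler
    stand-in for sparse coding). Each row of the result is one basis
    patch, flattened."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_samples):
        # sample a random patch from a random image
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch)
        c = rng.integers(img.shape[1] - patch)
        patches.append(img[r:r + patch, c:c + patch].ravel())
    X = np.array(patches)
    X -= X.mean(axis=0)
    # principal directions = right singular vectors of the patch matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components]

imgs = [np.random.default_rng(i).random((64, 64)) for i in range(4)]
basis = patch_basis(imgs)
```

Restricting the sampled patches to expert-visited regions (via the dwell masks described above) would bias the learned basis toward the features experts actually use.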


Our experts report relatively little hindrance when using the mouse to move the window. The

latent and inked prints have their own window (only one is visible at any one time) and users

press a key to flip back and forth between the two prints. This flip is actually faster than an

eyemovement and automatically serves as a landmark pointer for each print, making this

procedure almost as easy to use as free viewing of the two prints (which are often done under a

loupe with its own movement complexities). In addition, we also give users brief views of the

entire image to allow configural processes to work to establish the basic layout.

C.3.b. Measuring the role of configural processing in latent print examinations

behavioral experiment- blurred vs. very low contrast- qualitative changes across experts?

complete this section

C.3.c. Verification with eyemovement recording

complete this section


Figure 5. Examples of masked images revealing where experts choose to acquire information in order to make an identification. The black versions show only regions where the expert spent any time, and the mask is clearer for regions in which the expert spent more time. The right-hand images show the same information, but allow some of the uninspected information to show through. These images reveal that experts pay relatively little attention to much of the image and focus only on regions they deem relevant for the identification. We suggest that this element of expertise, learning to attend to relevant locations, is something that could benefit quantitative analyses of fingerprints.


C.4. Extracting the fundamental features used when matching prints

Because latent and inked prints are rarely direct copies of each other, an expert must extract

invariants from each image that survive the degradations due to noise, smearing, and other

transformations. Once these invariants are extracted, the possibility of a match can be assessed.

This is similar in principle to the type of categorical perception observed in speech recognition,

in which the invariants of parts of speech are extracted from the voices of different talkers. This

suggests that there exists a set of fundamental building blocks, or basis functions, that experts

use to represent and even clean up degraded prints. The nature and existence of these features are

quite relevant for visual expertise, since in some sense these are the direct outcomes of any

perceptual system that tunes itself to the visual diet it experiences.

We propose to perform data reduction techniques on the output of the moving window

paradigm. These techniques have successfully been applied to derive the statistics of natural

images (Hyvarinen & Hoyer, 2000). The results provide individual features that are localized in

space and resemble the response profiles of simple cells in primary visual cortex. Many of these

studies are performed on random sampling of images and visual sequences, but the moving

window application provides an opportunity to use these techniques to recover the dimensions of

only the inspected regions, and to compare the recovered dimensions from experts and

representations based on random window locations.

The specifics of this technique are straightforward. For each position of the moving window,

we extract (say) a 12 x 12 patch of pixels. This is repeated at each location inspected by

the subject, with each patch weighted by the amount of time spent at that location. The moving

window experiment produces tens of thousands of patches of pixels, which are submitted to a data

reduction technique (independent component analysis, or ICA), which is similar to principal

components analysis, with the exception that the components are independent, not just

uncorrelated. The linear decomposition generated by ICA has the property of sparseness, which

has been shown to be important for representational systems (Field, 1994; Olshausen & Field,

1996) and implies that a random variable (the basis function) is active only very rarely. In


practice, this sparse representation creates basis functions that are more localized in space than

those captured by PCA and are more representative of the receptive fields found in the early

areas of the visual system.
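The patch-harvesting and ICA steps above can be sketched as follows, assuming scikit-learn's FastICA as the ICA implementation; the image and the (x, y, dwell-time) window log below are synthetic stand-ins for real moving-window data:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Synthetic stand-ins for a fingerprint image and the moving-window log.
image = rng.random((256, 256))
visits = [(int(rng.integers(6, 250)), int(rng.integers(6, 250)),
           int(rng.integers(1, 5))) for _ in range(500)]

patches = []
for x, y, dwell in visits:
    patch = image[y - 6:y + 6, x - 6:x + 6].ravel()  # 12 x 12 patch -> 144-d vector
    patches.extend([patch] * dwell)                  # weight by inspection time
X = np.asarray(patches)
X = X - X.mean(axis=0)                               # center before ICA

ica = FastICA(n_components=16, random_state=0, max_iter=1000)
ica.fit(X)
basis = ica.mixing_.T    # one 144-d basis function per row
print(basis.shape)       # (16, 144); reshape each row to 12 x 12 to visualize
```

On real data, each row of `basis` would be reshaped to 12 x 12 and inspected for structures such as ridge endings and islands.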

Huge corpora of samples are required to extract invariants from noisy images, and at present

we have only pilot data from several experts. However, the results of this preliminary analysis

can be found in Figure 6. This figure shows features discovered using the ICA algorithm (Hurri

& Hyvarinen, 2003; Hyvarinen, Hoyer & Hurri, 2003). Each image represents a basis function

that when linearly combined will reproduce the windows examined by experts. Inspection of

Figure 6 reveals that features such as ridge endings, Y-branchings, and islands are beginning to

become represented. This analysis takes on greater value when applied to the entire database we

will gather, since it will combine across individual features to derive the invariant stimulus

features that provide the basis for fingerprint examinations done by human experts.

The ICA analysis is very sensitive to spatial location, and while cells in V1 are likely also

highly position sensitive, the measured basis functions are properties of the entire visual stream,

not just the early stages. More recent advances in ICA techniques have addressed this issue in a

similar way to how the visual system has solved the problem. In addition to performing data

reduction to extract the fundamental basis sets, these extended ICA algorithms group

the recovered components based on their energy (squared outputs). This grouping has been shown to

produce classes of basis functions that are position invariant by virtue of the fact that they


Figure 6. ICA components from expert data.


include many different positions for each fundamental feature type. The examples shown in

Figure 7 were generated by this technique, which reduces the reliance on spatial location. This

groups the recovered features by class and accounts for the fact that rectangles have similar

properties to nearby rectangles. Note that the features in Figure 7 are less localized than those

typically found with ICA decompositions, which may be due to the large correlational structure

inherent in fingerprints, although this remains an open question addressed by this proposal.

The development of ICA approaches is an ongoing field, and we anticipate that the results of

the proposed research will help extend these models as we develop our own extensions based on

the applications to fingerprint experts. There are several ways in which the recovered

components can be used to evaluate the choice of positions by experts (which ultimately

determine, along with the image, the basis functions). First, one can visually inspect the sets of

basis functions recovered from datasets produced by experts, and compare this with one

generated from random window locations.

A second technique can be used to demonstrate that experts do indeed possess a feature set

that differs from a random set. The data from random windows and experts can be combined to

produce a common set of components (basis functions). ICA is a linear technique, and thus the

original data for both experts and random windows can be recovered through weighted sums of

the components, with some error if only some of the components are saved. If experts share a


Figure 7. ICA components from expert data, grouped by energy. This analysis allows the basis functions to have partial spatial independence, at a slight cost to image quality. This latter issue is less relevant for larger corpora, where many similar features are combined by individual basis function groups.


common set of features that is estimated by ICA, then their data should be recovered with less

error than that of the random windows. This would demonstrate that an important component of

expertise is the ability to take a highly dimensional dataset (as produced by noisy images) and

reduce it down to fundamental features. From this perspective, visual expertise is data reduction.
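The expert-versus-random comparison can be sketched numerically, with PCA standing in for the linear decomposition; the data here are synthetic, with "expert" patches built to share a low-dimensional structure that "random" patches lack:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Toy stand-ins: 'expert' patches share a 5-dimensional structure,
# 'random' patches are unstructured noise.
basis = rng.standard_normal((5, 64))
expert = rng.standard_normal((200, 5)) @ basis + 0.1 * rng.standard_normal((200, 64))
random_w = rng.standard_normal((200, 64))

# Fit a common set of components on the pooled data, as in the text.
pca = PCA(n_components=5).fit(np.vstack([expert, random_w]))

def recon_error(X):
    # Mean squared error after projecting onto the shared components.
    Xr = pca.inverse_transform(pca.transform(X))
    return float(np.mean((X - Xr) ** 2))

# Expert patches should be reconstructed with less error than random windows.
print(recon_error(expert) < recon_error(random_w))
```

The asymmetry in reconstruction error is the proposed signature of a shared feature set: the structured data are captured by few components, the random windows are not.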

These kinds of data reduction techniques serve a separate purpose. Many of the experiments

described in other sections of this proposal depend on specifying particular features. While initial

estimates of the relevant features can be made on the basis of discussions with fingerprint

experts, we anticipate that the results of the ICA analysis will help refine our view of what

constitutes an important feature within the context of fingerprint matching.

The moving window procedure has the disadvantage of being a very localized procedure, due

to the nature of the small moving window. There is a fundamental tradeoff between the size of

the window and the spatial acuity of the procedure. If the window is made too large, we know

less about the regions from which the user is attempting to acquire information. To offset this,

we have provided the user the opportunity to view quick flashes of the full image, enough to

provide an overview of the prints, but not enough to allow matches of specific regions. We will

also conduct the studies using large and small windows to see whether the nature of the

recovered components changes with window size.

C.4. Starting Small: Guiding feature extraction with expert knowledge

We need to ask whether this is compelling, and cut it if it is not.

Feature extraction procedures attempt to take a high dimensional space and use the

redundancies in this space to derive a lower-dimensional representation that combines across the

redundancies to provide a basis set. This basis set can be thought of as the fundamental feature

set, and the development of this set can be thought of as one mechanism underlying human

expertise. The difficulty with these highly-dimensional spaces is that algorithms that attempt to

uncover the feature set through iterative procedures like Independent Component Analysis or

neural networks may fall into local minima and fail to converge upon a global solution. One

solution that has been proposed in the human developmental literature is one of starting small


(Elman, 1993). In this technique, programmers initially restrict the inputs to statistical models to

provide general kinds of information rather than specific information that would lead to learning

of specific instances. As a network matures, more specific information is added, which allows

the network to avoid falling into local minima that represent non-learned states. While the exact

nature of these effects is still being worked out (Rohde & Plaut, 1999), recent work has

provided empirical support in the visual domain (Conway, Ellefson & Christiansen, ref). This

suggests that we might use the temporal component of the data from experts in the moving

window paradigm to help guide the training of our networks.

As an expert views a print, they initially are likely to focus on broad, overall types of

information that give the need to finish if necessary

C.5. Automatic detection of regions of interest using expert knowledge

In both fingerprint classification (e.g. Dass & Jain, 2004; Jain, Prabhakar & Hong 1999;

Cappelli, Lumini, Maio & Maltoni, 1999) and fingerprint identification (e.g. Pankanti, Prabhakar

& Jain, 2002; Jain, Prabhakar & Pankanti, 2002) applications, there are two main components for

an automatic system: (1) feature extraction and (2) matching algorithm to compare (or classify)

fingerprints based on feature representation. The feature extraction is the first step to convert

raw images into feature representations. The goal is to find robust and invariant features to deal

with various conditions in real-world applications, such as illumination, orientation and

occlusion. Given a whole image of fingerprint, most fingerprint recognition systems utilize the

location and direction of minutiae as features for pattern matching. In our preliminary study of

human expert behaviors, we observe that human experts focus on just parts of images (regions of

interest – ROIs) as shown in Figure XX, suggesting that it is not necessary for a human expert to

check through all minutiae in a fingerprint. A small subset of minutiae seems to be sufficient for

the human expert to make a judgment. What regions are useful for matching among all the

minutiae in a fingerprint? Is it possible to build an automatic ROI detection system that can

achieve a similar performance as a human expert? We attempt to answer these questions by

building a classification system based on the training data captured from human experts. Given a


new image, the detection system is able to automatically detect and label regions of interest for

the matching purpose. We want to note that we expect that most regions selected by our system

will be minutiae but we also expect that the system will potentially discover the structure

regularities from non-minutia regions that are overlooked in previous studies. Different from

previous studies of minutiae detection (e.g. Maio & Maltoni, 1997), our automatic detection

system will not simply detect minutiae in a fingerprint but focus on detecting both a small set of

minutiae and other useful regions for the matching task. Considering the difficulties in

fingerprint recognition, building this automatic detection system is challenging. However, we are

confident that the proposed research will be a first step toward success and will make important

contributions. This confidence lies in two important factors that make our work different from

other studies: (1) we will record detailed behaviors of human experts (e.g. where they look in a

matching task) and recruit the knowledge extracted from human experts to build a pattern

recognition system; and (2) we will apply state-of-the-art machine learning techniques in this study

to efficiently encode both expert knowledge and regularities in fingerprint data. The combination

of these two factors will enable us to achieve this research plan.

To build this kind of system, we need to develop a machine learning algorithm and estimate

the parameters based on the training data. Using the moving window paradigm (described in


Figure X. Overview of automatic detection of regions of interest. The red regions in the fingerprints indicate where human experts focus during the pattern matching task.


C.3), we collect information about where a human expert looks from moment to moment

while performing a matching task. Hence, the expert’s visual attention and behaviors (moving

the windows) can be utilized as labels of regions of interest – providing the teaching signals for a

machine learning algorithm. In the proposed research, we will build an automatic detection

system that captures the expert’s knowledge to guide the detection of useful regions in a

fingerprint for pattern matching.

We will use the data collected from C.X. Each circular area examined by the expert is filtered

by a bank of Gabor filters. Specifically, the Gabor filters with three scales and five orientations

are applied to the segmented image. It is assumed that the local texture regions are spatially

homogeneous, and the mean and the standard deviation of the magnitude of the transform

coefficients are used to represent an object in a 48-dimensional feature vector. We reduce the

high-dimensional feature vectors to vectors of dimensionality 10 by principal component

analysis (PCA), which represents the data in a lower-dimensional subspace by pruning away

those dimensions with the least variance. We also randomly sample other areas that the expert

does not attend to and code these areas with a Non-ROI label, which is paired with the feature

vectors extracted from these areas. In total, the training data consists of two groups of labeled

features – ROI and Non-ROI.
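The filter-bank step can be sketched with hand-rolled Gabor kernels (the kernel parameters below are illustrative, not the ones we will use; note that three scales x five orientations x two statistics gives a 30-dimensional vector, so the 48-dimensional vector in the text would correspond to a different filter count):

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    # Complex Gabor: Gaussian envelope times a complex sinusoid along theta.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * xr)

def gabor_features(patch, freqs=(0.1, 0.2, 0.4), n_orient=5):
    # Mean and std of the response magnitude for each scale/orientation pair.
    feats = []
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, np.pi * k / n_orient)
            mag = np.abs(convolve2d(patch, kern, mode='same'))
            feats += [mag.mean(), mag.std()]
    return np.asarray(feats)

patch = np.random.default_rng(2).random((40, 40))
v = gabor_features(patch)
print(v.shape)   # (30,): 3 scales x 5 orientations x {mean, std}
```

The resulting vectors would then be reduced by PCA and labeled ROI or Non-ROI as described above.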

Next, we will build a binary classifier based on Support vector machines (SVMs). SVMs

have been successfully applied to many classification tasks (Vapnik 1995; Burges 1998). An SVM

trains a linear separating plane for classifying data by maximizing the margin between two

parallel planes on either side of the separating one. The central idea is to nonlinearly map the input vector

into a high-dimensional feature space and then construct an optimal hyperplane for separating

the features. This decision hyperplane depends on only a subset of the training data called

support vectors.

For a set of $n$-dimensional training examples $X = \{x_i\}_{i=1}^{m}$, labeled by the expert's visual attention as $\{y_i\}_{i=1}^{m}$, and a mapping of the data into $q$-dimensional vectors $\phi(X) = \{\phi(x_i)\}_{i=1}^{m}$ by a kernel function, where $q \gg n$, an SVM can be built on the mapped training data by solving the following optimization problem:

Minimize over $(w, b, \xi_1, \ldots, \xi_m)$ the cost function

$$\frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i$$

subject to

$$y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i \quad \text{and} \quad \xi_i \ge 0, \quad \text{for all } i = 1, \ldots, m,$$

where $C$ is a user-specified constant controlling the penalty on the violation terms denoted by each $\xi_i$. The $\xi_i$ are called slack variables; they measure the deviation of a data point from the ideal condition of pattern separability. After training, $w$ and $b$ constitute the classifier:

$$y = \mathrm{sign}\left( w^T \phi(x) + b \right)$$
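The classifier itself can be sketched with scikit-learn's SVC, an RBF kernel standing in for the nonlinear map $\phi$; the 10-dimensional (post-PCA) features and labels below are synthetic:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Synthetic training set: 10-d features,
# label +1 = ROI (expert-attended), -1 = Non-ROI (random area).
X_roi = rng.standard_normal((100, 10)) + 1.0
X_non = rng.standard_normal((100, 10)) - 1.0
X = np.vstack([X_roi, X_non])
y = np.array([1] * 100 + [-1] * 100)

# The RBF kernel plays the role of phi; C penalizes the slack terms.
clf = SVC(kernel='rbf', C=1.0).fit(X, y)
print(clf.score(X, y))   # high training accuracy on this separable toy data
```

Only the support vectors (a subset of the training patches) determine the decision boundary, as noted above.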

Compared with other approaches used in fingerprint recognition, such as neural networks and

k-nearest neighbors, SVMs have proven more effective in many classification tasks. In

addition, we first transform original features into a lower-dimensional space based on PCA. The

purpose of this first step is to deal with the curse of dimensionality. We then map the data points

into another higher-dimensional space so that they are linearly separable. By doing so, we

convert the original pattern recognition problem into a simpler one. This idea is quite in line with

kernel-based nonlinear PCA (Scholkopf, Smola & Muller 1998), which has been successfully used

in several fields (e.g. Wu, Su & Carpuat 2004).

Given a new testing fingerprint, we will shift a 40x40 window over the image and classify all

the patches at each location and scale. The system will first extract Gabor-based features from

local patches which will be the input to the detector. The detector will label all the regions as

either ROI or Non-ROI. We expect that most ROIs will be minutiae. In contrast to methods

based on minutiae matching, however, we expect that only a small subset of minutiae is utilized by human

experts. Moreover, we expect the system to detect some areas that are not defined as minutiae

but that human experts nevertheless attend to during the matching task. Thus, the ROI detector we

develop will go beyond the standard approach in fingerprint recognition (minutiae extraction and

matching). By efficiently encoding the knowledge of human experts, the proposed system will


have opportunities to discover the statistical regularities in fingerprints that have been

overlooked in previous studies.
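The detection pass can be sketched as a sliding-window loop; `classifier` and `featurize` are hypothetical stand-ins for the trained SVM and the Gabor + PCA pipeline, and the toy demo below marks bright windows as "ROI":

```python
import numpy as np

def detect_rois(image, classifier, featurize, win=40, stride=20):
    """Slide a win x win window over the print and label each patch.

    `classifier` and `featurize` are stand-ins for the trained SVM and
    the feature pipeline (hypothetical interfaces)."""
    rois = []
    h, w = image.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            if classifier(featurize(patch)) == 1:   # 1 = ROI
                rois.append((x, y))
    return rois

# Toy demo: a single bright square; only the window covering it is an ROI.
img = np.zeros((120, 120))
img[40:80, 40:80] = 1.0
found = detect_rois(img,
                    classifier=lambda f: 1 if f[0] > 0.5 else -1,
                    featurize=lambda p: np.array([p.mean()]))
print(found)
```

In the real system the window would also be run at multiple scales, as the text describes.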

C.6. Using expert-identified correspondences to extract environmental models

In our moving window paradigm, a human expert moves the window back and forth between

inked and latent fingerprints to perform pattern matching. We propose that the dynamic

behaviors of the expert provide additional signals indicating one-to-one correspondences

between two images. In light of this, our hypothesis is that an expert’s decision is based on the

comparison of these one-to-one patches. Therefore, we propose that these expert-identified

correspondences can serve as additional information to find the regularities in fingerprints and

build the automatic detection system.

We propose to use this knowledge as a prior for the training data. We observe that not all the

focused regions in the latent print have the corresponding regions in the inked print. Thus, it is

more likely that those one-to-one pairs play a more important role in pattern matching than other

regions of interest. Based on this observation, we propose to maintain a set of weights over the

training data. More specifically, for each ROI in the latent image, we find the most likely pairing

patch in the inked image. Two constraints guide the searching of the matching pair. The temporal

constraint is based on the expert’s behaviors. For instance, the patch in the inked print that the

expert examines immediately after looking at an ROI in the latent image is more likely to

be associated with that ROI. The spatial constraint is to find the highest similarity between

the patch in the latent image and any other patch in the inked image. In this way, each ROI in the

latent image can be assigned with a weight indicating the probability to map this region to a

region in the other image. With a set of weighted training data, we will apply a SVM-based

algorithm (briefly described in C.5) which will focus on the paired samples (with high weights)

in the training data. More specifically, we replace the constant C in the standard SVM with a set

of variables $c_i$, each of which corresponds to the weight of a data point. Accordingly, the new objective function is

$$\frac{1}{2} w^T w + \sum_{i=1}^{m} c_i \xi_i.$$

Thus, the matching regions receive more penalties if they


are nonseparable points, while other regions receive less

attention because they are more likely to be irrelevant to

the expert’s decision. Thus, the parameters of the SVM are

tuned to favor the regions in which human experts are

especially interested. By encoding this knowledge in a

machine learning algorithm, we expect that this method will

lead to better performance by closely imitating the

expert’s decision.
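The per-point penalty $c_i$ can be sketched with scikit-learn's SVC, which accepts per-sample weights at fit time (the effective penalty becomes $C \cdot w_i$); the data and correspondence weights below are synthetic:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 10))
y = np.where(X[:, 0] + 0.3 * rng.standard_normal(200) > 0, 1, -1)

# Hypothetical correspondence weights: ROIs with a confident one-to-one
# match in the inked print receive a larger penalty weight c_i.
weights = rng.uniform(0.2, 1.0, size=200)
weights[:50] = 5.0   # strongly matched latent/inked pairs

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y, sample_weight=weights)
print(round(clf.score(X, y), 2))
```

Misclassifying a strongly weighted (well-matched) region is thus penalized more heavily than misclassifying an unpaired one, mirroring the weighted objective above.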

C.7. Dependencies between global and local

information: The role of gist information

Fingerprints are categorized into several classes, such as whorl, right loop, left loop, arch,

and tented arch in the Henry classification system (Henry 1900). In the literature, researchers

use only 4-7 classes in an automatic classification system. This is because the task of

determining a fingerprint class can be difficult. For example, it is hard to find robust features

from raw images that can aid classification as well as exhibit low variations within each class. In

C.5 and C.6, we discuss how to use expert knowledge to find useful features for pattern

matching. By taking a bigger picture of feature detection and fingerprint classification in this

section, we find that we need to deal with a chicken-and-egg problem: (1) useful local features

can predict fingerprint classes; and (2) a specific fingerprint class can predict what kinds of local

regions likely occur in this type of fingerprint. In contrast, stand-alone feature detection

algorithms (e.g. in C.5 and C.6) usually look at local pieces of the image in isolation when

deciding whether the patch is a region of interest. In machine learning, Murphy, Torralba and

Freeman (2003) proposed a conditional random field for jointly solving the tasks of object

detection and scene classification. In light of this, we propose to use the whole image context as

an extra source of global information to guide the search for ROIs. In addition, a better set of

ROIs will also potentially make the classification of the whole fingerprint more accurate. Thus,


the chicken-and-egg problem is tackled by a bootstrapping procedure in which local and global

pattern recognition systems interact with and boost each other.

We propose a machine learning system based on graphical models (Jordan 1999) as shown in

Figure XX. We define the gist of an image as a feature vector extracted from the whole image by

treating it as a single patch; the gist is denoted by $v_G$. We then introduce a latent variable $T$

describing the type of fingerprint. The central idea in our graphical model is that ROI presence is

conditionally independent given the type, and the type is determined by the gist of the image. Thus,

our approach encodes contextual information on a per-image basis instead of extracting

detailed correlations between different kinds of ROIs (e.g. a fixed prior such as "patch A always

occurs to the left of patch B") because of the complexity and variations of detailed

descriptions. Next we need to classify fingerprint types. We will simply train a one-vs-all binary

SVM classifier for recognizing each fingerprint type based on the gist. We will then normalize

the results:

$$p(T = t \mid v_G) = \frac{p(T_t = 1 \mid v_G)}{\sum_{t'} p(T_{t'} = 1 \mid v_G)},$$

where $p(T_t = 1 \mid v_G)$ is the output of the $t$th one-vs-all classifier.

Once the fingerprint type is known, we can use this information to facilitate ROI

detection. As shown in the tree-structured graphical model in Figure XX, the

conditional joint density can be expressed as follows:

$$p(T, R_1, \ldots, R_N \mid v) = \frac{1}{z}\, p(T \mid v_G) \prod_i p(R_i \mid T, v_i),$$

so that the label posterior for each region is obtained by marginalizing over types: $p(R_i \mid v) = \sum_t p(T = t \mid v_G)\, p(R_i \mid T = t, v_i)$. Here $v_G$ and $v_i$ are the global and local features, respectively, and $R_i$ is the class of a local patch. In the

proposed research, we will investigate two types of $R$. One classification defines ROI and Non-ROI

types, the same as in C.5 and C.6. The other defines several minutia

types (plus Non-ROI), such as termination and bifurcation minutiae. $z$ is a normalizing

constant. Based on this graphical model, we will be able to use contextual knowledge to facilitate

the classification of a local image. We also plan to develop a more advanced model which will

use local information to facilitate the fingerprint type classification. We expect that this kind of

approach will lead to a more effective automatic system that can perform both top-down

inference (fingerprint types to minutia types) and bottom-up inference (minutia types to

fingerprint types).
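The top-down inference step can be sketched numerically: given normalized gist-classifier outputs $p(T \mid v_G)$ and per-region conditionals $p(R_i \mid T, v_i)$ (both random stand-ins here), the per-region label posterior is the type-weighted mixture:

```python
import numpy as np

rng = np.random.default_rng(5)
n_types, n_regions, n_classes = 5, 4, 2   # e.g. Henry classes; ROI vs Non-ROI

# p(T = t | v_G): normalized one-vs-all outputs for the gist (stand-in values).
scores = rng.random(n_types)
p_type = scores / scores.sum()

# p(R_i = r | T = t, v_i): local classifier outputs per region and type (stand-ins).
p_local = rng.random((n_types, n_regions, n_classes))
p_local /= p_local.sum(axis=2, keepdims=True)

# Marginal label posterior per region: sum_t p(T=t|v_G) * p(R_i|T=t, v_i).
p_region = np.einsum('t,tic->ic', p_type, p_local)
print(p_region.shape)   # (4, 2): one ROI/Non-ROI distribution per region
```

Bottom-up inference would run the same model in the other direction, using the region labels to sharpen the posterior over fingerprint types.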

C.8. Summary of quantitative approaches

(Tom writes)

General themes:

Incorporate expert knowledge

Links between global and local structure made possible by input from experts

Specification of elemental basis or feature set

Classifying informativeness of regions

Defining an intermediate level between low-level feature extractors and high-level gist or

configural information

D. Implications for knowledge and practice

The implications of the knowledge gained from these studies and analyses fall

into four broad categories, each of which is discussed below.

D.1. Implications for quantitative understanding of the information content of fingerprints


D.2. Implications for an understanding of the links between quantitative information content

and the latent print examination process

D.3. Implications for the classification and filtering of poor-quality latent prints

D.4. Implications for the development of software-based tools to assist human-based latent

print examinations and training

E. Management plan and organization

F. Dissemination plan for project deliverables

Deliverables will be disseminated through scientific articles, presentations at machine learning conferences and

fingerprint conferences, and proof-of-concept Java-based applets.

(end of 30 pages)


G. Description of estimated costs

Personnel

The project will be co-directed by Thomas Busey and Chen Yu. We request 11 weeks of

summer support, during which time both will devote 100% of their efforts to the project.

Benefits are calculated at 19.81%. The salaries are incremented 3% per year.

Many of the simulations will be conducted by a graduate student, who will be hired

specifically for the purposes of this project. This student, likely an advanced computer science

student with a background in cognitive science, requires a stipend, a fee remission and health

insurance. The health insurance is incremented at 5% per year.

Subject coordination and database management will be handled by hourly students who

will work 20 hours/wk on the project. We will pay them $10/hr.

Consultant

John Vanderkolk, with whom Busey has worked for the past two years, has agreed to

serve as an unpaid consultant on this grant. He does require modest travel costs when he visits

Bloomington.

Travel

Money is requested to bring in four experts for testing using the eyemovement recording

equipment. These costs will total approximately $1500/yr.

Money is requested for three conferences a year. These will enable the investigators to travel

to conferences such as Neural Information Processing (NIPS) and forensic science conferences

such as the International Association for Identification (IAI) to interact with colleagues and share

the results of our analyses. These trips serve an important role in communicating the efforts of

this grant to a wider audience.

Other Costs

Equipment

This research is very computer-intensive, and thus we require a large UNIX-based server to

run simulations in parallel. In addition, we require three PC-based workstations to run Matlab and


other simulations programs. Finally, conferences such as IAI and local Society for Identification

meetings provide an ideal place to gather data from experts, and thus we require a portable

computer for such onsite data-gathering purposes. We anticipate that up to half of our data can

be collected using these on-site techniques, and this approach is preferable because we have

control over the monitor and software. Thus the laptop computer represents a good investment

in the success of the project.

Other costs

The graduate student line requires a fee remission each year. The fee remission is

incremented at 5% per year.

The results of our studies require resources to reach a wide audience, and thus we require

dissemination costs to cover the costs of publication and web-based dissemination.

This project is highly image-intensive, and we require money to purchase image-processing

software and upgrades. These include software packages such as Adobe Photoshop, as well as

new image processing packages as they become available.

We will test 80 subjects a year to obtain the necessary data for use in our statistical

applications. Each subject requires $20 for the approximately 90-minute testing period.

The project will consume supplies of approximately $100/month, for items such as backups,

power supplies, etc.

Indirect Costs

The indirect rate negotiated between Indiana University and the federal government is set at

51.5%. This rate is assessed against all costs except the fee remission. This was negotiated with

DHHS on 5.14.04.

G. Staffing plan and Resources

Both Busey and Chen maintain laboratories in the Department of Psychology at Indiana

University that each contain approximately 700 sq. feet of space. These have subject running

rooms, offices and spaces for servers. Chen's lab contains an eyemovement recording setup that

is sufficient for the eye-movement portion of the experiments. Both investigators have offices in

the Psychology department as well.

We will recruit a graduate student from the Computer Science or Psychology programs at

Indiana University. This student must have experience with machine learning algorithms at a

theoretical level, and must also be an expert programmer. The student will work 20 hrs/wk. We will also

recruit two hourly undergraduate students to coordinate the subject running, data analysis and

server maintenance. They will also be responsible for managing the data repository site where

our data will be accessible to other researchers who wish to integrate human expert knowledge

into their networks.

The bulk of the theoretical work will be handled by Chen and Busey, while the graduate

student will work on implementation and model testing.

H. Timeline

This is a multi-year project that is designed to alternate between acquiring human data and

using it to refine the quantitative analyses of latent and inked prints.

Year 1: Acquire necessary fingerprint databases. Begin testing 80 experts on 72 different

latent/inked print pairs. Program Support Vector and Global Local models. Test 2 experts on the

eye-movement equipment using all 72 prints.

Year 2: Test an additional 80 experts on 72 new latent/inked prints. Begin model fitting and

refinement. Test 2 experts on the eye-movement equipment using all 72 prints. Compare results

from the eye-movement studies and the moving-window studies.

Year 3: Test the final 80 experts on 72 new latent/inked prints. Develop new versions of

statistical models based on prior results. Put entire database online for use by other researchers.

Disseminate results to peer-reviewed journals.
