
A decision support system for Crithidia luciliae image classification

Paolo Soda a,b,*, Leonardo Onofri a, Giulio Iannello a,b

a Medical Informatics and Computer Science Laboratory, Integrated Research Centre, University Campus Bio-Medico of Rome, Via Alvaro del Portillo, 21, 00128 Roma, Italy. {p.soda, leonardo.onofri, g.iannello}@unicampus.it

b Fondazione Alberto Sordi, Via dei Compositori 130, 00128 Roma, Italy

Summary

Objective: Systemic lupus erythematosus is a connective tissue disease affecting multiple organ systems and characterised by a chronic inflammatory process. It is considered a very serious sickness, and is moreover classified as an invalidating chronic disease. The recommended method for its detection is indirect immunofluorescence (IIF) based on the Crithidia luciliae (CL) substrate. However, IIF is affected by several issues limiting test reliability and reproducibility. Hence, an evident medical demand is the development of computer-aided diagnosis tools that can offer support to physician decisions.

Methods: In this paper we propose a system that classifies CL wells by integrating information extracted from different images. It is based on three main decision phases. Two steps, named threshold-based classification and single cell recognition, are applied for image classification. They minimise false negative and false positive classifications, respectively. Feature extraction and selection have been carried out to determine a compact set of descriptors to distinguish between positive and negative cells. The third step applies a majority voting rule at the well recognition level, enabling us to recover possible errors produced by the previous phases.

Results: The system performance has been evaluated on an annotated database of IIF CL wells, composed of 63 wells for a total of 342 images and 1487 cells. Accuracy, sensitivity and specificity of the image recognition step are 99.4%, 98.6% and 99.6%, respectively. At the level of well recognition, accuracy, sensitivity and specificity are 98.4%, 93.3% and 100.0%, respectively. The system has also been validated in a daily routine fashion on 48 consecutive analyses of hospital outpatients and inpatients. The results show very good performance for well recognition (100% accuracy, sensitivity and specificity), due to the integration of cell and image information.

Conclusions: The described recognition system can be applied in daily routine in order to improve the reliability, standardisation and reproducibility of CL readings in IIF.

Preprint submitted to Elsevier 7 May 2010

Key words: Supervised pattern recognition, Computer-aided diagnosis (CAD), Indirect immunofluorescence (IIF), Crithidia luciliae classification, Systemic lupus erythematosus (SLE)

1 Introduction

Systemic lupus erythematosus (SLE) is a connective tissue disease (CTD) of unknown aetiology, affecting multiple organ systems and characterised by a chronic inflammatory process. The incidence of SLE in Europe per year ranges from 1.9 up to 4.8 cases every 100000 inhabitants, whereas its prevalence varies between 12.5 and 68 cases every 100000 inhabitants [1,2]. Recent studies have reported that survival at 5 years in cohorts of patients with SLE ranges from 91% up to 97%, whereas at 10 years it varies between 83% and 92% [3–6]. Although SLE should be tagged as a rare illness due to its incidence, it is considered a very serious sickness, and is moreover classified as an invalidating chronic disease.

The numerous and different manifestations of SLE make its diagnosis a burdensome task. The American College of Rheumatology established that the detection of anti-double strand DNA (anti-dsDNA) autoantibodies in patient serum is the key criterion for SLE diagnosis [7]. In general, CTD investigation is a first level test looking also for other types of autoantibodies, such as antinuclear autoantibodies (ANAs) and anti-neutrophil cytoplasm antibodies (ANCAs). Indeed, data on positivity, quantity, and antibody types provide information on patient diagnosis and prognosis.

The recommended method for autoantibody detection is the indirect immunofluorescence (IIF) assay. IIF makes use of a substrate containing a specific antigen that can bind with serum antibodies, forming a molecular complex. This complex then reacts with human immunoglobulin conjugated with a fluorochrome, and the complex becomes observable at the fluorescence microscope.

ANAs, anti-dsDNA autoantibodies and ANCAs are detected in IIF using different substrates, named HEp-2, Crithidia luciliae (CL) and human neutrophils, respectively.

Despite the pivotal role of IIF, the reproducibility and reliability of its readings are affected by several issues, such as:

* Corresponding author: Paolo Soda, E-mail: p.soda@unicampus.it, tel: +39 06 225419620, fax: +39 06 225419609


Fig. 1. Image acquired from a positive well of Crithidia luciliae.

• the lack of resources and adequately trained personnel [8,9];
• the low level of standardization [10];
• the inter- and intra-laboratory variance [8,11];
• the lack of automatised procedures;
• the photobleaching effect, which significantly bleaches, in a few seconds, biological tissues stained with fluorescent dyes [12].

To date, the highest level of automation in IIF tests is the preparation of slides with robotic devices performing dilution, dispensation and washing operations [13,14]. Although such instruments help in speeding up the slide preparation step, the development of computer-aided diagnosis (CAD) systems supporting the IIF diagnostic procedure would be beneficial in many respects.

To our knowledge, recent papers dealing with CAD tools for ANA tests using the HEp-2 substrate can be found in the literature [8,15–19], whereas no works deal with CL and ANCA automated classification. In response to the growing demand for automated and more standardised tests [8,11], this paper focuses on the development of a decision support system for CL well 1 classification, since SLE is a relevant and serious disease. The CAD is based on three decision steps, namely cell, image and well classification, which permit the integration of heterogeneous information about the well. The approach has been evaluated on an annotated dataset and the system has been validated in clinical practice.

The paper is organized as follows. After presenting background and motivations in section 2, in section 3 we describe the recognition approach. Section 4 presents feature extraction and selection, section 5 describes the experimental evaluation, whereas section 6 reports the validation of the system. In section 7 we conclude the paper.

1 A well is a portion of the slide where the patient serum is dispensed.


2 Background and Motivations

2.1 Application Context

International guidelines recommend using the CL substrate to verify the presence of SLE-associated autoantibodies with the IIF method [20,21].

CL is a unicellular organism containing both the nucleus and a compressed mass of double strand DNA (dsDNA) named the kinetoplast. The kinetoplast fluorescence is the key parameter to report a test as positive, since it implies that serum autoantibodies reacted with the dsDNA mass. Kinetoplast staining can be observed at the fluorescence microscope.

In addition to the general issues affecting IIF readings reported in section 1, several sources of uncertainty arise from the fluorescence of other structures of CL cells that are not markers of anti-dsDNA autoantibodies, e.g. the basal body or the flagellum. Fig. 2 shows different cases that may occur, depicted using stylized cells where light and dark grey represent high and low fluorescence, respectively.

Fig. 2. Positive and negative cases depicted through stylized cells. Light and dark green represent high and low fluorescence, respectively.

The two top panels (A and B) report positive cases. The former shows a cell where only the kinetoplast exhibits a fluorescence staining stronger than the cell body, while in the latter the nucleus is also highly fluorescent.

The three bottom panels show negative cases. In panel C, the cell is clearly negative since the whole cell body exhibits a weak and quite uniform fluorescence. Panels D and E depict cells where regions different from the kinetoplast exhibit strong fluorescence staining. Indeed, in panel D the basal body, which is similar to the kinetoplast in size and type of fluorescence staining, is lighter than the cell body. In panel E, one or more parts different from the kinetoplast exhibit strong staining, as drawn in the figure. Finally, notice also that in some cases fluorescent objects, i.e. artefacts, can be observed outside cell bodies. Hence, several different sources of uncertainty can be observed in CL images, making the correct determination of kinetoplast staining a demanding task.

2.2 Related Work

Human ability to detect and diagnose diseases during image interpretation is limited both by non-systematic search patterns and by the presence of noise. In addition, similar characteristics of some abnormal and normal structures may cause interpretation errors. Furthermore, the vast amount of image data generated by some imaging devices makes the detection of potential diseases a burdensome task, causing oversight errors. Developments in computer vision and artificial intelligence in medical image interpretation have shown that CAD systems can pursue the following major objectives [22–24]:

• performing a pre-selection of the cases to be examined, enabling the physician to focus his/her attention only on relevant cases;
• carrying out mass screening campaigns, acting as a fully automated system;
• serving as a second reader, thus augmenting the physician capabilities and reducing errors;
• aiding the physician while he/she carries out the diagnosis;
• working as a tool for training and education of specialized medical personnel.

IIF is the recommended method for autoantibody determination but, up to now, physicians have rarely made use of quantitative information. Hence, the development of a CAD in this field would improve medical practice, achieving the advantages mentioned above and improving test standardisation. In this respect, some papers recently presented CAD systems that aim at supporting the classification of HEp-2 images [8,15–19]. These systems apply pattern recognition approaches based on different classifiers, such as nearest neighbour or decision trees, and methods for classifier combination, e.g. multi-expert systems. They achieved promising results, paving the way for increasing the standardisation level and reducing the inter- and intra-laboratory variability. It is worth noting that some of these research papers evolved into commercial applications [25–27].

To our knowledge, no work in the literature has presented solutions to support the physicians in the classification of CL slides: this paper proposes an approach in this direction.


3 Recognition Approach

The microscope magnification typically used during the acquisition process implies that the collected digital images do not cover all the surface of the well. For this reason, several images of the same well are always acquired. This feature permits us to exploit a certain degree of redundancy, integrating information extracted from different images of the same well. Well classification is performed on the basis of the classification of its images, as depicted in panel A of Fig. 3.

Turning our attention to single image classification, we proceed as shown in panel B of Fig. 3. First, we apply a threshold-based classification to detect presumed kinetoplasts. Indeed, on the basis of the considerations reported in the previous section, compact sets of pixels more fluorescent than other parts of the image should be tagged as candidate kinetoplasts. Conversely, the absence of such regions permits us to label the image as negative. For example, panel C of Fig. 2 will be classified as negative, while the other panels contain regions candidate to be a kinetoplast and proceed to the next classification steps. As a second step, when the image contains at least one of these regions, we locate its cells and extract from them a set of features. As a third step, we classify each cell as positive if it contains the kinetoplast, and as negative otherwise. Finally, each image is classified on the basis of the labels of its cells.

Fig. 3. Description of the classification approach. Panel A: well classification approach. Panel B: schematic representation of the image classification block (threshold-based classification, cell location and feature extraction, cell classification, image classification).

In our opinion, such an approach provides the following two benefits. First, the initial threshold-based classification allows a rapid categorisation of several images. Second, this approach is tolerant with respect to misclassifications in cell recognition: if enough cells per image are available, it is reasonable that misclassified cells, if limited, do not affect image classification. This feature permits us to lower the effect of both erroneously segmented cells and artefacts. Notice also that similar considerations hold for well classification, where multiple images are available.

Following the scheme reported in Fig. 3, the next subsections present the main steps of the proposed approach. To this aim, let us introduce the following notation:

• Ω = {P, N} is the set of class labels, where P and N represent the positive and negative labels, respectively;
• α is a cell;
• γ(α) is the label associated to cell α;
• I is an image, and I(x, y) represents the image pixel of coordinates (x, y);
• y(I) is the label associated to image I;
• W is a well;
• y_W = (y(I_W^1), ..., y(I_W^n)) is the vector of image labels of W;
• w(y_W) is the label associated to the entire well W.

3.1 Threshold-based classification

Each image contains several cells whose kinetoplasts can be positive or negative. As a first step, we would like to identify clearly negative images, i.e. those not containing any fluorescent kinetoplast. To this aim, we perform a threshold-based classification looking for connected regions satisfying constraints on both intensity and dimension.

Firstly, we extract from the image the set of pixels S whose value is larger than the intensity threshold T_i. Formally,

S = {I(x, y) | I(x, y) > T_i}    (1)

Threshold T_i is defined as max(I) − δ, where δ has been heuristically determined to avoid selecting only pixels with maximum intensity.

Secondly, the dimension-based threshold classification finds in S the set R of 8-connected regions larger than T_d pixels. Threshold T_d has been determined by observing typical sizes of kinetoplasts.

It is worth noting that threshold-based classification permits the identification of negative images when R = ∅. However, some negative images could contain regions that satisfy the two constraints, resulting in card(R) > 0, as in panels D and E of Fig. 2. Hence, if no region candidate to be a kinetoplast is detected, i.e. R = ∅, the steps described in section 3.2 are skipped and the image is labelled as negative. Otherwise, we determine if it is a true positive image through the classification of its individual cells. In this way we verify whether the detected regions are kinetoplasts or other fluorescent structures inside the cell body.
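The two thresholding steps above can be sketched as follows. This is an illustrative Python version (the paper's implementation is in Matlab), and the values of δ and T_d are placeholders, since the paper determines them heuristically:

```python
import numpy as np
from scipy import ndimage

def threshold_based_classification(img, delta=30, T_d=20):
    """Label an image as negative when no candidate kinetoplast region
    is found. delta and T_d are illustrative values; the paper sets
    them heuristically."""
    # Eq. (1): keep pixels brighter than T_i = max(I) - delta
    T_i = int(img.max()) - delta
    S = img > T_i
    # find 8-connected regions and keep those larger than T_d pixels
    labels, n = ndimage.label(S, structure=np.ones((3, 3)))
    sizes = np.bincount(labels.ravel())
    R = [k for k in range(1, n + 1) if sizes[k] > T_d]
    return ("candidate_positive" if R else "negative"), R
```

An image for which R is empty is labelled negative outright; otherwise its candidate regions are passed on to the cell-level steps of section 3.2.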

3.2 Cell location and feature extraction

Single cells in the image are segmented if the threshold-based segmentation applied to the corresponding image identified compact regions with a strong staining. The segmentation method applies Otsu's algorithm [28] and morphological operations, such as filling and connection analysis, to output the binary mask. Cells connected with the image border are suppressed.

Next, from each segmented cell we extract the set of selected features described in section 4.
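A minimal sketch of this segmentation step, assuming a plain Otsu threshold followed by hole filling and border-cell suppression (the exact morphological operations used in the paper are not detailed):

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(img):
    """Otsu's method: pick the threshold maximising the between-class
    variance of the 8-bit intensity histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # class-0 probability
    mu = np.cumsum(p * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0
    return int(np.argmax(sigma_b))

def segment_cells(img):
    """Binary mask via Otsu, holes filled, border-touching cells removed."""
    mask = img > otsu_threshold(img)
    mask = ndimage.binary_fill_holes(mask)
    labels, _ = ndimage.label(mask)
    border = (set(labels[0, :]) | set(labels[-1, :]) |
              set(labels[:, 0]) | set(labels[:, -1]))
    for k in border - {0}:        # suppress cells touching the border
        mask[labels == k] = False
    return mask
```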

3.3 Cell and image classification

In order to distinguish between true and false positive images, we classify single cells. The set of extracted features is given to the cell classification system that labels α with γ(α) ∈ Ω (section 5.2). Then, the application of the majority voting (MV) rule assigns to I the label y(I) corresponding to the class receiving the majority of votes. If we introduce the auxiliary quantity O_h(I), representing the sum of votes on cells of I assigned to class h ∈ Ω, y(I) is defined as:

y(I) = P, if card(R) > 0 ∧ O_P(I) > O_N(I); N, otherwise.    (2)
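Eq. (2) amounts to a short voting function; the sketch below assumes the cell labels are given as the strings "P" and "N":

```python
def classify_image(cell_labels, card_R):
    """Eq. (2): an image is positive iff candidate kinetoplast regions
    exist (card(R) > 0) and positive cell votes outnumber negative ones."""
    O_P = sum(1 for g in cell_labels if g == "P")
    O_N = sum(1 for g in cell_labels if g == "N")
    return "P" if card_R > 0 and O_P > O_N else "N"
```

Note that a tie among cell votes yields a negative label, so a lone mislabelled cell cannot flip an image on its own.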

3.4 Well Classification

As reported at the beginning of the section, the class of the whole well is determined on the basis of the labels associated to its images. We apply again the MV rule, where V_h(W) represents the sum of votes on images of W assigned to class h ∈ Ω. The label of W is given by:

w(y_W) = P, if V_P(W) > V_N(W); N, if V_P(W) < V_N(W); reject, otherwise.    (3)


Notice that the system suspends the decision when an equal number of images has opposite labels. This choice corresponds to a conservative criterion that aims at minimising the misclassification risk.
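Eq. (3), with its reject option on ties, can be written as:

```python
def classify_well(image_labels):
    """Eq. (3): majority vote over the image labels of a well, with a
    reject (suspended decision) on ties."""
    V_P = image_labels.count("P")
    V_N = image_labels.count("N")
    if V_P > V_N:
        return "P"
    if V_P < V_N:
        return "N"
    return "reject"   # tie: defer the decision to the physician
```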

4 Features Extraction and Selection

This section describes how we have selected the heterogeneous set of descriptors used to represent the peculiarities of CL cells.

A first set considered consists of texture features that have been successfully used in previous works on HEp-2 classification [18,19]. They are related to statistical and spectral measures. The former have been extracted both from the intensity histogram and from the grey level co-occurrence matrix by computing their statistical moments, e.g. skewness, kurtosis, energy and entropy, to name a few. The latter have been computed from the Fourier transform (FT), the Wavelet transform and Zernike moments.

We also computed other types of features: rectangle features, features extracted from local binary patterns, and morphological descriptors.

The first are simple descriptors, which have proven very effective in the field of face detection [29]. Indeed, similarly to face detection, we are interested in detecting bright regions inside an object. A rectangle feature is computed as a weighted sum of pixels inside adjacent rectangles in the image. Defining as RecSum_i(r) the weighted sum of pixels in a rectangular region r according to the mask ω_i [29], rectangle features RecFeat are given by

RecFeat = Σ_i RecSum_i(r)    (4)

Rectangle features owe their success, besides their effectiveness, to the possibility of being efficiently implemented through an intermediate representation of the image called the integral image.
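As an illustration of the mechanism, the sketch below computes the integral image and a simple two-rectangle feature with ±1 weights; the actual masks ω_i of [29] generalise this choice:

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of all pixels above and to the left, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c), read off
    the integral image with at most four array references."""
    total = ii[r + h - 1, c + w - 1]
    if r > 0:
        total -= ii[r - 1, c + w - 1]
    if c > 0:
        total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0:
        total += ii[r - 1, c - 1]
    return total

def two_rect_feature(img, r, c, h, w):
    """Illustrative rectangle feature: left half minus right half
    (weights +1 / -1)."""
    ii = integral_image(img.astype(np.int64))
    return rect_sum(ii, r, c, h, w // 2) - rect_sum(ii, r, c + w // 2, h, w // 2)
```

Once the integral image is built, each rectangle sum costs constant time regardless of rectangle size, which is what makes these features cheap to evaluate densely.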

The Local Binary Pattern (LBP) operator assigns to each pixel of the image a label obtained by comparing it with its neighbourhood matrix. Different types of LBP operators can be computed by varying the pixel arrangement and the number of neighbours. In this work we use the standard LBP operator obtained applying a 3x3 neighbourhood square matrix [30]. More specifically, defining as g_c the value of the pixel with coordinates (x_c, y_c), and as g_p the value of its p-th neighbour, the LBP operator is defined as follows:

LBP(x_c, y_c) = Σ_{p=0}^{7} 2^p s(g_p − g_c)    (5)

where s(x) = 1 if x ≥ 0, and 0 otherwise.

The basic LBP operator is neither robust to changes in spatial resolution nor to texture rotations. For this reason we also extract the circular LBP, which is computed using bilinearly interpolated circular neighbourhoods, since they have demonstrated robustness to grayscale variations and rotated textures [31].
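Eq. (5) can be sketched directly in Python; the neighbour ordering below is one conventional choice, as the paper does not fix it:

```python
import numpy as np

def lbp_3x3(img):
    """Standard LBP (Eq. 5): threshold each 3x3 neighbourhood at the
    centre pixel and pack the 8 comparison bits into a label."""
    img = img.astype(np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # (dy, dx) offsets of the 8 neighbours g_0 .. g_7, clockwise
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbour >= centre).astype(np.uint8) << p
    return out
```

The circular variant replaces the fixed 3x3 offsets with bilinearly interpolated samples on a circle of chosen radius.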

Finally, morphological descriptors analyse the shape and the intensity of cells and of presumed kinetoplast regions. They are:

(1) the sum of pixels of the cell;
(2) the sum of pixels of the presumed kinetoplast;
(3) the sum of cell pixels outside the presumed kinetoplast;
(4) the area A of the presumed kinetoplast;
(5) the circularity of the presumed kinetoplast, defined as 4πA/L², where L is its perimeter.

The first and third descriptors aim at characterizing the brightness of the cell, whereas the others aim at discriminating between true and false kinetoplasts.
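For instance, the circularity descriptor can be computed from a binary mask of the presumed kinetoplast; the boundary-pixel perimeter estimate below is an assumption, since the paper does not specify how L is measured:

```python
import numpy as np
from math import pi

def circularity(mask):
    """4*pi*A / L^2: close to 1 for a disc, smaller for elongated shapes.
    The perimeter L is approximated by counting boundary pixels."""
    A = int(mask.sum())
    padded = np.pad(mask, 1)
    # a pixel is interior when its four direct neighbours are in the mask
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    L = int((mask & ~interior).sum())
    return 4 * pi * A / L ** 2 if L else 0.0
```

A compact, round blob (a plausible kinetoplast) scores higher than an elongated region such as a flagellum fragment.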

In conclusion, the initial set consists of 431 features extracted from the cell dataset.

The search for the best discriminant subset has been performed running greedy stepwise and best first searches, both forward and backward. Then, it has been refined by an exhaustive search, taking into account the dimensionality of the dataset and of the feature set. At the end of this process we identified ten descriptors with the strongest capability to separate the samples in the feature space. Such a set is composed of heterogeneous measures belonging to the following four categories: intensity histogram, FT, circular LBP and morphological descriptors (Table 1). Indeed, features related to the intensity histogram focus on intensity distribution. Skewness measures the asymmetry of the histogram: the more negative the value, the larger the number of highly fluorescent pixels. For instance, consider the cells depicted in panels A and B of Fig. 2: the skewness of the intensity histogram of the former is larger than the skewness of the latter. The absolute maximum value of the intensity histogram distinguishes between artefact and non-artefact regions, since artefacts are usually lighter than kinetoplasts. The grey-level co-occurrence matrix describes the image texture, which varies between cells where only the kinetoplast is fluorescent and cells where the basal body, the nucleus or artefacts are fluorescent. The energy of angular bins in the FT spectrum captures information related to spatial frequency in the image. For instance, the larger the number of fluorescent objects inside a cell, the higher the frequency in the spectrum and, hence, the related energy. Circular LBP features describe image texture with reference to circular information. Finally, the selected morphological descriptors capture information on the shape and intensity of the presumed kinetoplast and, due to their discrimination strength, they compose one third of the final feature vector.

Table 1. Selected features for cell classification.

Category of features              Features
Intensity histogram               Skewness, value of absolute maximum
Grey-level co-occurrence matrix   Inertia
Fourier transform                 Energy of angular bin
Circular LBP                      Mean of the output image histogram, number of relative maxima in the output image histogram
Morphological descriptors         A, circularity and sum of pixels of the presumed kinetoplast

5 Experimental Evaluation

5.1 Data Set

Since, to our knowledge, there are no reference databases of IIF images publicly available, we populated a database of annotated images, using slides of CL substrate at the fixed dilution of 1:10, as recommended by the guidelines [20,21]. Two specialists acquired CL images with an acquisition unit consisting of a fluorescence microscope coupled with a 50 W mercury vapour lamp and a digital camera. The camera has a CCD with square pixels of side equal to 6.45 µm. The images have a resolution of 1388x1038 pixels and a colour depth of 24 bits, and they are stored in bitmap format. We have used two different magnifications, 25-fold and 50-fold, to test system robustness to cell size variation.

We acquired 342 images, 74 positive (21.6%) and 268 negative (78.4%), belonging to 63 wells, 15 positive (23.8%) and 48 negative (76.2%). 154 images have been acquired using the 25-fold magnification and the remaining 188 using the 50-fold magnification. Since our approach requires the labels of individual cells to train the corresponding classifier, IIF specialists labelled a set of cells belonging to images with fluorescent cells. Hence, such images belong either to the positive or the negative class, depending on which structure is fluorescent. This procedure was carried out at a workstation monitor, since at the fluorescence microscope it is not possible to observe one cell at a time. Notice that the use of digital images in IIF testing for diagnostic purposes has been recently discussed [9]. At the end, the cell data set consists of 1487 cells belonging to 34 different wells. The cells are labelled as follows: 928 as positive (62.4%) and 559 as negative (37.6%). Of these 1487 cell images, 941 belong to images acquired using the 25-fold magnification and 546 to images acquired using the 50-fold magnification.


5.2 Experimental Protocol

We investigate the performance that could be achieved in CL cell recognition when popular classifiers belonging to different paradigms are used. In this respect, we test a Multi-Layer Perceptron (MLP) as a neural network, a Naïve Bayes classifier as a Bayesian classifier, a Support Vector Machine (SVM) as a kernel machine, and AdaBoost as an ensemble of classifiers. In the following we briefly report the experimental set-up of each classification architecture:

MLP: We use a MLP with one hidden layer. The number of neurons in the input layer is given by the number of features, whereas the number of neurons in the output layer is equal to two. Several preliminary tests have been carried out to determine the best configuration of the MLP in terms of the number of neurons in the hidden layer: specifically, configurations from 1 up to 50 neurons were tested.

Naïve Bayes: The Naïve Bayes classifier does not require a specific set-up.

SVM: In the SVM experiments, we use a Radial Basis Function kernel. We perform a grid search to determine the best setup of the SVM in terms of the Gaussian width σ and the regularisation parameter C. The searches of σ and C have been carried out in the intervals [0.01; 30] and [0.1; 60], respectively.

AdaBoost: For AdaBoost we use decision stumps as base hypotheses, exploring different numbers of iterations (in the interval [10; 100]) to determine the best configuration.
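Such a grid search might look as follows in scikit-learn, used here purely as an illustration (the paper's code is Matlab); the grid values and the synthetic data are placeholders, and note that scikit-learn parameterises the RBF kernel by γ = 1/(2σ²):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Coarse illustrative grid; the paper searches sigma in [0.01; 30]
# and C in [0.1; 60].
sigmas = np.array([0.1, 1.0, 10.0])
param_grid = {"C": [0.1, 1.0, 10.0, 60.0],
              "gamma": (1.0 / (2.0 * sigmas ** 2)).tolist()}

# Synthetic stand-in for the 10 selected cell features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_   # the (C, gamma) pair kept for testing
```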

To estimate the cell, image and well recognition performance we measure the following parameters: global accuracy (Acc), true positive rate or sensitivity (Sens), true negative rate or specificity (Spec), precision, and F-measure [32]. We also draw ROC curves and use the area under the curve (AUC) for cell classifier comparison [33].

5.3 Results

In this subsection we sequentially detail the performance achieved in the classification of individual cells, images and whole wells, which are summarised in Table 2.

5.3.1 Cell Classification

Table 3 reports the performance of the four classification paradigms, estimated using a cross validation approach that works at well level. Fig. 4 shows the corresponding ROC curves. In our case, traditional random k-fold cross validation does not accurately estimate classifier performance, since cells belonging to the same well, and thus with similar features, can compose both the training and the test sets. To overcome this limitation, we divided the set of 1487 cells into 34 subsets, one for each well, and then performed a 34-fold cross validation, where the cells of one well constitute the test set and the others the training set.

Table 2. Performance of cell, image and whole well classification.

                      Acc (%)  Sens (%)  Spec (%)  Precision (%)  F-measure (%)
Cell classification   94.2     95.5      92.1      95.3           95.4
Image classification  99.4     98.6      99.6      98.6           98.6
Well classification   98.4     93.3      100.0     100.0          96.6
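This leave-one-well-out protocol corresponds to grouped cross validation; a sketch with scikit-learn's LeaveOneGroupOut on synthetic stand-in data (34 wells with 5 cells each are assumed for illustration; the actual wells have varying cell counts):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic stand-in: 34 wells x 5 cells, 10 features per cell
rng = np.random.default_rng(1)
X = rng.normal(size=(170, 10))
y = (X[:, 0] > 0).astype(int)
wells = np.repeat(np.arange(34), 5)   # well id of each cell

# One fold per well: the cells of a well are never split between
# training and test sets, avoiding the optimistic bias of random k-fold
logo = LeaveOneGroupOut()
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=logo, groups=wells)
```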

The reported data point out that: i) SVMs outperform the other classifiers; ii) the accuracy is balanced over the two classes, although the class distribution is skewed towards the negative one. This finding confirms previous papers reporting that SVMs have very good performance on binary classification tasks [34]. In the following subsections we report the results achieved using SVMs only, with 34-fold cross validation.

Let us analyse SVM performance with respect to the different magnifications used during the image acquisition phase. Recognition accuracies on cell images acquired at 25-fold and 50-fold are 94.37% and 93.96%, respectively. These values are very close together, suggesting that the cell classifier is robust to cell size variation.

Table 3. Performance measures of the classifiers labelling Crithidia luciliae cells, estimated using 34-fold cross validation. In parentheses we report the best configuration of each classifier: for the MLP the number represents the neurons in the hidden layer, for the SVM the values are C and σ, and for AdaBoost the value corresponds to the number of iterations.

Classifier     Acc (%)  Sens (%)  Spec (%)  Precision (%)  F-measure (%)  AUC
Naïve Bayes    88.7     87.7      90.3      93.8           90.7           0.914
MLP (40)       93.0     94.5      90.5      94.3           94.4           0.973
AdaBoost (50)  93.8     95.6      90.7      94.5           95.0           0.984
SVMs (17, 1)   94.2     95.5      92.1      95.3           95.4           0.984

5.3.2 Image Classification

With reference to the classification approach presented in section 3, the image classification phase consists of two steps.


Fig. 4. ROC curves of the classifiers labelling single cells. The values of AUC are reported in Table 3.

First, we apply a threshold-based classification trying to detect negative images. The corresponding confusion matrix is reported in Table 4. Interestingly, all positive images pass this phase, whereas 17.5% of negative images are misclassified. Such errors are expected, since threshold-based classification looks for fluorescent connected regions corresponding to presumed kinetoplasts. However, this step alone does not permit satisfactory performance in well classification, since the discrimination between true and false positive samples remains an open issue.

Hence, the second step applies the cell classifier and MV to recognize if the image is positive, i.e. if it contains true fluorescent kinetoplasts. Table 5 presents the confusion matrix computed on the 121 images labelled as positive by the previous step. The results show that only two images have been misclassified, proving that the approach using several cells is an effective choice to label single images.

Table 6 and the second row of Table 2 report the performance of the whole image classification phase, integrating the results of threshold-based classification and cell recognition. Comparing Table 4 with Table 6, notice that most of the false positive images are now correctly classified. Nevertheless, one true positive image is now misclassified.

With reference to the performance achieved on images acquired at 25-fold and 50-fold magnification, notice that threshold-based classification does not depend on this parameter. Indeed, classification accuracies on images acquired at 25-fold and 50-fold magnification are 99.35% and 99.47%, respectively. These results confirm that the system is robust to magnification variations.


Table 4
Confusion matrix of threshold-based image classification.

                         Hypothesised class
                         Positive       Negative
True class  Positive     74 (100.0%)    0 (0.0%)
            Negative     47 (17.5%)     221 (82.5%)

Table 5
Confusion matrix achieved when the labels of images containing presumed kinetoplasts are determined on the basis of single-cell classification.

                         Hypothesised class
                         Positive      Negative
True class  Positive     73 (98.6%)    1 (1.4%)
            Negative     1 (2.1%)      46 (97.9%)

Table 6
Confusion matrix of the overall image classification.

                         Hypothesised class
                         Positive      Negative
True class  Positive     73 (98.6%)    1 (1.4%)
            Negative     1 (0.4%)      267 (99.6%)
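The overall image-level performance measures follow directly from this confusion matrix; as a quick arithmetic check (Python used purely for illustration):

```python
# Counts taken from Table 6 (overall image classification)
tp, fn = 73, 1    # true positives, false negatives (first row)
fp, tn = 1, 267   # false positives, true negatives (second row)

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # 340 / 342
sensitivity = tp / (tp + fn)                   # 73 / 74
specificity = tn / (tn + fp)                   # 267 / 268
```

These reproduce the 99.4%, 98.6% and 99.6% figures reported for the image recognition step.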

5.3.3 Well Classification

The integration of information from multiple images of the same well enables us to recover possible misclassifications occurring at the previous steps. The approach correctly classified 98.4% of the wells (62 out of 63), while one well was rejected since it is composed of two images assigned to opposite classes. Indeed, in this case the system suspends its decision and asks the physician for the final one. The third row of Table 2 reports the corresponding performance.
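The well-level rule just described can be sketched as a majority vote with rejection on disagreement; the label encoding (+1/-1) is an assumption of this sketch, and Python is used only for illustration:

```python
def classify_well(image_labels):
    """Combine per-image labels (+1 positive, -1 negative) into a
    well-level decision by majority vote; on a tie the decision is
    suspended ('reject') and deferred to the physician."""
    s = sum(image_labels)
    if s > 0:
        return +1
    if s < 0:
        return -1
    return 'reject'
```

For instance, a well whose two images receive opposite labels, as in the rejected case above, yields `'reject'`.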

5.3.4 Classification Time

In order to analyse the practical applicability of the CAD, we measured the classification time (Matlab 7 code run on a 2.5 GHz Intel CPU with 3 GB of RAM). Threshold-based classification is the first step of image classification: it is dedicated to detecting negative images (section 3.1). This step takes 0.23 s ± 0.09 s. The second step of image classification applies the cell classifier and MV to classify the images. In this case, classification time is 6.06 s ± 1.81 s and 7.61 s ± 2.62 s for images acquired at 25-fold and 50-fold magnification, respectively. It is worth noting that this difference depends upon cell size


and number: 25-fold images have more but smaller cells than 50-fold images. Furthermore, in the 34-fold cross validation we need 2.4 s to train the cell classifier. To classify a well, we therefore need at most between 30 s and 40 s.

In order to reduce this time, we analysed the processing time and found, as expected, that computing the circular LBP features is the most time-consuming operation. Hence, we rewrote the relevant code in C++, compiled it as a library and imported it into Matlab. In this way, the classification time drops to 3.33 s ± 0.86 s and 4.24 s ± 1.46 s for images acquired at 25-fold and 50-fold magnification, respectively, a 55% reduction with respect to the pure Matlab code.
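The circular LBP operator at the core of the feature extraction thresholds P neighbours sampled on a circle of radius R against the central pixel and packs the results into a binary code. A simplified sketch follows; note that the standard operator bilinearly interpolates neighbour values, whereas here coordinates are rounded to the nearest pixel, and this Python code is an illustration, not the paper's C++ implementation:

```python
import math

def circular_lbp(img, y, x, P=8, R=1):
    """LBP code of pixel (y, x): compare P neighbours sampled on a
    circle of radius R against the centre and pack the bits.
    Simplification: neighbour coordinates are rounded to the nearest
    pixel instead of bilinearly interpolated."""
    c = img[y][x]
    code = 0
    for p in range(P):
        a = 2 * math.pi * p / P
        ny = y + int(round(-R * math.sin(a)))  # image rows grow downward
        nx = x + int(round(R * math.cos(a)))
        code |= (img[ny][nx] >= c) << p
    return code
```

The per-pixel inner loop explains why this feature dominates the processing time and benefits from a compiled implementation.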

5.4 Discussion

The rationale of the system for CL well recognition lies in the peculiarities of these images. Indeed, a trivial approach could classify the images by applying the threshold-based classification only. However, in this case the recognition accuracy drops to 85.7% (54 wells out of 63), whereas the reject and misclassification rates rise to 4.7% and 9.6%, respectively. Hence, we introduce the cell recognition step, applied to images labelled as positive by the threshold-based classification phase, in order to detect those images containing true fluorescent kinetoplasts. In this way we exploit the benefits of both threshold-based classification and individual cell recognition: the former minimises the occurrence of false negative classifications, whereas the latter distinguishes between true and false positive images. Furthermore, applying the cell segmentation algorithm and the discriminant feature analysis only to images containing cells with fluorescent structures has permitted us, on the one hand, to develop a robust segmentation algorithm and, on the other, to obtain a set of descriptors with strong separation capability between positive and negative cells.

6 System Validation

CAD systems can be a valuable tool for physicians since they may offer support in different working scenarios, as reported in section 1. However, physicians are often doubtful about the performance of such systems, because their application in clinical practice requires a careful assessment in a daily routine fashion.

In this respect, we have validated our system using 48 consecutive sera of outpatients and inpatients of the Campus Bio-Medico University Hospital of


Table 7
Confusion matrix of single image classification computed during system validation in clinical routine.

                         Hypothesised class
                         Positive      Negative
True class  Positive     32 (94.1%)    2 (5.9%)
            Negative     1 (0.8%)      130 (99.2%)

Rome, Italy, screened for anti-dsDNA by IIF on CL slides. The serum of each patient is dispensed on one well, and several non-overlapping images have been acquired at 50-fold magnification. The a priori probabilities of the positive and negative classes were 22.9% and 77.1% (11 and 37 wells), respectively. In total we collected 165 images, 34 positive (20.6%) and 131 negative (79.4%). Each well has been classified according to the scheme reported in Fig. 3: the system first labels its images and then takes the final decision. The cell classifier is an SVM trained on the cell dataset presented in section 5.1.

Table 7 reports the confusion matrix of single image classification, whereas the first row of Table 8 presents the derived performance measures. Comparing them with Table 6 and the second row of Table 2, we notice that the CAD still exhibits very good performance.

Table 8
Performance at image and well level computed during system validation in clinical routine.

                       Acc (%)   Sens (%)   Spec (%)   Precision (%)   F-measure (%)
Image classification   98.2      94.1       99.2       97.0            95.5
Well classification    100.0     100.0      100.0      100.0           100.0

Let us now turn our attention to well classification, which represents the most important outcome in terms of clinical application. The CAD correctly classified all 48 wells, confirming that it can be applied in clinical routine.

7 Conclusion

In this paper, we have presented a system that supports the classification of CL wells prepared by IIF. It is based on three main decision steps, namely cell, image and well classification, integrating heterogeneous information about the well. In particular, the two steps applied for image classification, i.e. threshold-based classification and single cell recognition, minimise false negative and false positive classifications, respectively. Furthermore, the MV rule applied at well recognition level enables us to recover possible errors made by image classification.


The proposed approach achieves good results in a field of laboratory medicine characterised by high variability. System validation, carried out in a clinical fashion, has shown that automation is a viable alternative for CL analysis in IIF. It is therefore reasonable to use the CAD system in different working scenarios, e.g. mass screening campaigns or as a second reader.

Future work is directed toward the integration of the system that classifies HEp-2 slides with the one presented in this paper. The goal is a comprehensive CAD supporting different sides of IIF tests.
