Selective Scanning for faster Prostate Pathology

Mayank Kabra, ECE, U. C. San Diego

Yoav Freund, CSE, U. C. San Diego

Steve Baird, School of Medicine, U. C. San Diego

Abstract

Pathology slide scanning microscopes are now commercially available. These microscopes combine a digital microscope with a scanning mechanism to produce high-magnification, high-resolution images of complete specimens. These digital images have the potential to revolutionize histopathology. For example, instead of sending glass slides through the mail, pathologists at different locations can consult with each other by sharing image files over the Internet.

One of the main obstacles to realizing this possibility is the very large size of the generated image files. A single slide scanned at 400× magnification requires around 20 GB before compression and around 1 GB after JPEG 2000 compression. As a result, a one-terabyte hard disk can store only about a thousand images, and sharing such large files over the Internet can cause delays that make collaborative consultation impractical.
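
The storage figures follow from simple arithmetic, reproduced below as a small Python sketch. The 20 GB, 1 GB, and 1 TB sizes come from the text; the 100 Mbit/s link speed is an assumption chosen only to illustrate transfer delay, and decimal units (1 GB = 10^9 bytes) are assumed throughout.

```python
# Sizes quoted in the text (decimal units: 1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
RAW_BYTES = 20 * 10**9          # one slide at 400x, uncompressed
COMPRESSED_BYTES = 1 * 10**9    # the same slide after JPEG 2000 compression
DISK_BYTES = 1 * 10**12         # a one-terabyte disk

# ASSUMPTION: a 100 Mbit/s network link, used only to illustrate delay.
LINK_BITS_PER_SECOND = 100 * 10**6


def images_per_disk(image_bytes: int, disk_bytes: int = DISK_BYTES) -> int:
    """Number of whole images of the given size that fit on the disk."""
    return disk_bytes // image_bytes


def transfer_seconds(image_bytes: int, bits_per_second: int = LINK_BITS_PER_SECOND) -> float:
    """Time to send one image over the (assumed) link, ignoring protocol overhead."""
    return image_bytes * 8 / bits_per_second


if __name__ == "__main__":
    print(images_per_disk(COMPRESSED_BYTES))   # 1000
    print(transfer_seconds(COMPRESSED_BYTES))  # 80.0
    print(images_per_disk(RAW_BYTES))          # 50
```

At the assumed link speed, a single compressed slide takes over a minute to send, before any protocol overhead.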

Pathologists, however, rarely examine the whole image at high resolution; instead, they concentrate on a few relevant fields. If only the relevant fields are stored at high resolution, space can be saved, and if the fields of greatest diagnostic interest are transmitted first, network delay can be reduced.
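
The "most relevant first" strategy amounts to sorting fields by a relevance score. This is a minimal illustration, assuming each field is identified by a hypothetical (row, col) tile index and already carries a score such as a classifier output; the tie-breaking rule is an illustrative choice, not part of the original system.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # hypothetical (row, col) index of a field on the slide


def transmission_order(scores: Dict[Tile, float]) -> List[Tile]:
    """Order fields so that the highest-scoring (most relevant) are sent first.

    Ties are broken by tile position so the order is deterministic.
    """
    return sorted(scores, key=lambda tile: (-scores[tile], tile))


if __name__ == "__main__":
    scores = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.5, (1, 1): 0.9}
    print(transmission_order(scores))  # [(0, 1), (1, 1), (1, 0), (0, 0)]
```

A receiver can then start displaying the most relevant fields while the rest are still in transit.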

We use a combination of computer vision and machine learning techniques to analyze low-magnification images and select those parts of the specimen that should be scanned at high magnification. This problem is similar to that of detecting cancer; however, the required accuracy is significantly lower. Instead of using computer vision to produce a diagnosis, we use it to eliminate those fields of the slide that are unlikely to yield useful information for human diagnosticians.
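
Selecting regions at low magnification only helps if the selection can be turned into scan targets at high magnification. The sketch below converts selected low-magnification tiles into pixel rectangles in the high-magnification image. The tile size, the 20× low magnification in the usage example, and the assumption that pixel coordinates scale linearly with magnification are all hypothetical; the text does not specify the scanner's geometry.

```python
from typing import Iterable, List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, in pixels


def scan_regions(selected: Iterable[Tuple[int, int]],
                 tile_px: int,
                 low_mag: float,
                 high_mag: float) -> List[Rect]:
    """Convert selected low-magnification tiles into high-magnification rectangles.

    `selected` holds (row, col) tile indices chosen at low magnification. Each
    tile is `tile_px` pixels wide in the low-magnification image; ASSUMPTION:
    its extent in the high-magnification image scales by high_mag / low_mag.
    """
    scale = high_mag / low_mag
    rects = []
    for row, col in sorted(selected):
        x0 = round(col * tile_px * scale)
        y0 = round(row * tile_px * scale)
        x1 = round((col + 1) * tile_px * scale)
        y1 = round((row + 1) * tile_px * scale)
        rects.append((x0, y0, x1, y1))
    return rects
```

For example, `scan_regions([(1, 2)], tile_px=20, low_mag=20, high_mag=400)` returns `[(800, 400, 1200, 800)]`: under these assumptions each 20-pixel tile becomes a 400-pixel square at 400×.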

Examples of prostate tissue sections are shown in Figure 1. Areas of interest marked at low resolution by an experienced pathologist are used as a training set for the learning algorithm (see Figure 2).
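
One way to turn the pathologist's markings into training examples is to label each image region by how much of it lies inside a marked area. The sketch below assumes the markings are available as a binary mask at the same low resolution as the image and uses a hypothetical majority rule (at least half the pixels marked); the text does not say how region labels were derived.

```python
from typing import List

Mask = List[List[int]]  # 1 = inside an area the pathologist marked, 0 = outside


def region_labels(mask: Mask, region: int = 20, min_fraction: float = 0.5) -> List[List[int]]:
    """Label each non-overlapping region x region block of an annotation mask.

    A block is labelled +1 (suspicious) when at least `min_fraction` of its
    pixels are marked (a hypothetical rule), and -1 otherwise. Partial blocks
    at the right and bottom edges are ignored.
    """
    rows = len(mask) // region
    cols = len(mask[0]) // region if mask else 0
    labels = []
    for r in range(rows):
        row_labels = []
        for c in range(cols):
            marked = sum(
                mask[y][x]
                for y in range(r * region, (r + 1) * region)
                for x in range(c * region, (c + 1) * region)
            )
            fraction = marked / (region * region)
            row_labels.append(1 if fraction >= min_fraction else -1)
        labels.append(row_labels)
    return labels
```

For a 40×40 mask whose top-left 20×20 quadrant is fully marked, this yields `[[1, -1], [-1, -1]]`. Using ±1 labels matches the usual convention for AdaBoost.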

To discriminate between cancerous and non-cancerous fields, we compute a 23-bin color histogram for each 20×20-pixel region of the image. We use histograms because different regions have a characteristic mix of stroma (pink), nuclei (blue), and gland lumens (white), and a color histogram captures this mix explicitly.
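
The text does not specify how the 23 histogram bins are defined. The sketch below assumes one plausible scheme: every pixel is assigned to the nearest color in a fixed palette (a 23-color palette would give the 23 bins used here), and the assignments are counted over each 20×20 region and normalized to sum to one. The palette and the three-color example (pink stroma, blue nuclei, white lumen) are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # image[y][x] -> (r, g, b)


def nearest_bin(pixel: RGB, palette: Sequence[RGB]) -> int:
    """Index of the palette colour closest to `pixel` in squared RGB distance."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )


def region_histograms(image: Image, palette: Sequence[RGB],
                      region: int = 20) -> List[List[List[float]]]:
    """Normalised colour histogram for every non-overlapping region x region block.

    Each histogram has len(palette) bins and sums to 1.0.
    """
    rows = len(image) // region
    cols = len(image[0]) // region if image else 0
    n = region * region
    out = []
    for r in range(rows):
        row_hists = []
        for c in range(cols):
            counts = [0] * len(palette)
            for y in range(r * region, (r + 1) * region):
                for x in range(c * region, (c + 1) * region):
                    counts[nearest_bin(image[y][x], palette)] += 1
            row_hists.append([k / n for k in counts])
        out.append(row_hists)
    return out


# Hypothetical three-colour palette for illustration only.
PINK, BLUE, WHITE = (230, 150, 190), (70, 60, 160), (245, 245, 245)
```

With a 20×40 image whose left block is all stroma-pink and whose right block is half nuclear-blue and half lumen-white, `region_histograms(image, [PINK, BLUE, WHITE])` gives `[[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]]`.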

We use Adaboost to learn a mapping from histograms to scores, where high scores correspond to high confidence that the region is cancerous (see Figure 2(c)). Adaboost [3] is a general-purpose learning algorithm that is particularly powerful when combining many weakly predictive features. The predictive features in this case are the values of the histogram bins. Specifically, we use the alternating decision tree algorithm [2] as implemented in the Jboost open-source software package [1].
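
The paper uses alternating decision trees in Jboost; as a simpler stand-in, the sketch below implements plain discrete AdaBoost with decision stumps on individual histogram bins. It illustrates the same idea, many weak single-bin rules combined into a real-valued score where larger means more suspicious, but it is not the authors' implementation and not the Jboost API.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]      # (feature index, threshold, polarity)
Model = List[Tuple[float, Stump]]   # (vote weight alpha, stump) pairs


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict +1 or -1 from a single histogram bin."""
    j, theta, s = stump
    return s if x[j] > theta else -s


def best_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively choose the stump with the lowest weighted training error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        # Thresholds below every value and halfway between consecutive values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for theta in thresholds:
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, theta, s), xi) != yi)
                if err < best_err:
                    best, best_err = (j, theta, s), err
    return best, best_err


def adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> Model:
    """Discrete AdaBoost; labels must be +1 (suspicious) or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: Model = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break  # no weak rule beats chance on the current weights
        eps = max(err, 1e-10)  # avoid division by zero on a perfect split
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, stump))
        if err == 0:
            break  # data already separated; further rounds would only repeat this rule
        # Misclassified examples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model: Model, x: Sequence[float]) -> float:
    """Weighted vote; larger values mean higher confidence of 'cancerous'."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)
```

On a toy set such as `X = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.9]]` with `y = [1, 1, -1, -1]`, the learned model scores the first two histograms positive and the last two negative. Stumps on single bins match the text's description of histogram bins as the weakly predictive features; an alternating decision tree can additionally make some rules conditional on the outcome of others.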

The classifier built on the histograms is able to separate connective tissue and non-malignant glands from malignant fields. Using the classifier, around 80–90% of the image can be discarded while still keeping all the parts relevant for diagnosis. This reduces the storage space and transmission time by a factor of five to ten.
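
Discarding a fraction d of the image leaves 1 − d to store and send, a reduction by a factor of 1/(1 − d): 5 at d = 0.8 and 10 at d = 0.9. The sketch below picks the highest score threshold that still keeps every region labelled suspicious and reports the resulting discard fraction and reduction factor. It assumes all regions have equal size; the threshold rule and the scores in the example are hypothetical.

```python
from typing import List, Tuple


def keep_all_positives_threshold(scores: List[float], labels: List[int]) -> float:
    """Highest threshold that still keeps every region labelled +1.

    Raises ValueError if no region is labelled +1.
    """
    return min(s for s, lab in zip(scores, labels) if lab == 1)


def discard_stats(scores: List[float], threshold: float) -> Tuple[float, float]:
    """Fraction of regions discarded and the resulting reduction factor.

    Regions scoring at or above `threshold` are kept. All regions are assumed
    to have the same size, so the fraction of regions equals the fraction of area.
    """
    kept = sum(1 for s in scores if s >= threshold)
    discarded_fraction = 1 - kept / len(scores)
    reduction_factor = len(scores) / kept  # equals 1 / (1 - discarded_fraction)
    return discarded_fraction, reduction_factor
```

With ten regions scored `[2.1, 1.7, -0.3, -1.2, -0.8, -2.0, 0.1, -1.5, -0.9, -0.4]`, of which only the first two are labelled suspicious, the threshold is 1.7, 80% of the regions are discarded, and the reduction factor is 5. A threshold chosen this way on training slides would still need checking on slides not used for training.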

Our experiments show that the system discriminates well between malignant and non-malignant fields. However, it is not good at discriminating between malignant fields and hyperplasia, because the color distribution does not provide sufficient information to make this distinction. We are currently working on features that characterize the shape of the lumens. We believe that adding these features to the input of Adaboost will improve the system's ability to discriminate between malignant fields and hyperplasia.
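
As an illustration of the kind of lumen-shape feature mentioned above, the sketch below finds 4-connected lumen regions in a binary mask and computes each one's area, perimeter, and circularity 4πA/P², which is lower for irregular, in-folded outlines. This is a hypothetical example, not the features the authors are developing; how the lumen mask is obtained (for example, by thresholding near-white pixels) is also an assumption.

```python
import math
from collections import deque
from typing import List, Tuple

Mask = List[List[bool]]  # True where the pixel is lumen (hypothetical input)


def lumen_components(mask: Mask) -> List[Tuple[int, int]]:
    """(area, perimeter) of each 4-connected lumen component, in scan order.

    Perimeter counts pixel edges that border non-lumen pixels or the image edge.
    """
    h, w = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            area = perim = 0
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                        if not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                    else:
                        perim += 1  # this edge faces background or the border
            comps.append((area, perim))
    return comps


def circularity(area: int, perimeter: int) -> float:
    """4*pi*A / P**2: largest for compact round shapes, smaller for in-folded ones."""
    return 4 * math.pi * area / perimeter ** 2
```

Because the perimeter is counted along pixel edges, even a digitized disk scores below 1 (a square scores π/4), so the value is best used to compare lumens rather than as an absolute measure.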

References

[1] Jboost. http://www.cs.ucsd.edu/~aarvey/jboost/.

[2] Yoav Freund and Llew Mason. The alternating decision tree learning algorithm. In Machine Learning: Proceedings of the Sixteenth International Conference, pages 124–133, 1999.

Figure 1: a: Glands in prostate tissue have epithelial cells (blue) neatly arranged on the edge of the gland lumen. b: Connective tissue is less complex and has very few nuclei. c: Region with prostate cancer. Malignant cells divide inappropriately, modify the structure of glands, and produce smaller lumens. d: Hyperplasia, where excess growth of epithelial cells leads to papillary in-folding while the gland structure is maintained. Hyperplasia is benign and also needs to be separated from malignancy.

Figure 2: Results. a: Training image. b: Areas of the training image marked as suspicious. c: Score for each pixel of the image after using Adaboost. Warmer colors indicate higher scores; cooler colors correspond to lower scores. Higher-resolution images are available at http://xxxxx

[3] Yoav Freund and Robert E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, September 1999. Appearing in Japanese, translation by Naoki Abe. Available from http://www.cse.ucsd.edu/~yfreund/papers/.

[4] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, October 1998.
