Medical Imaging with Deep Learning Tutorial
Chapter 1 - Radiology and Multi-View
Chapter 2 - Histology and Segmentation
Chapter 3 - Cell Counting
Chapter 4 - Incorrect Feature Attribution
Chapter 5 - GANs in Medical Imaging
Slides by Joseph Paul Cohen 2020. Email: [email protected]
License: Creative Commons Attribution-Sharealike
Watch online
Chapter 1
Radiology and Multi-View
Common X-ray projections/views
PA = PosteroAnterior = back-to-front (the most common projection)
Image: [Bustos, “PadChest: A Large Chest x-Ray Image Dataset with Multi-Label Annotated Reports.” 2019]
Chest X-ray14 Dataset
Ronald Summers, NIH Clinical Center
Released in 2017, the first large-scale chest X-ray dataset
>100k frontal images released into the public domain
Enabled the deep learning radiology revolution
Stanford Pneumonia study
https://stanfordmlgroup.github.io/projects/chexnet/
In 2017 Pranav Rajpurkar and Jeremy Irvin trained a DenseNet on NIH data scaled to 224x224 pixels.
It set a benchmark performance that has not been significantly improved since.
They evaluated pneumonia predictions against 4 radiologists:
"We find that the model exceeds the average radiologist performance on the pneumonia detection task."
Criticism of the Chest X-ray14 Dataset
https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/
In 2017 Luke Oakden-Rayner published a blog post discussing issues with the labels in the NIH data.
This led to more work on automatic label extraction.
In a sample of images, the labels marked in red are said to be wrong.
2019: the year of chest X-ray data
PadChest: 160k images, multiple views, almost 200 labels. 27% hand labelled, the rest labelled using an RNN. License: Creative Commons Attribution-ShareAlike.
CheXpert: 224k images, PA and L views, 13 labels. Automated rule-based labeler. Non-commercial research purposes only.
MIMIC-CXR: 377k images, PA and L views, 13 labels. Automated rule-based labelers: both the NIH (NegBio) and CheXpert labelers were run. Non-commercial research purposes only; confidentiality training required.
Many datasets exist, with different methods of obtaining labels, automatic or hand labelled:
● PadChest, ~200 labels: 27% hand labelled, the rest using an RNN
● CheXpert, 13 labels: custom rule-based labeler
● MIMIC-CXR, 13 labels: automated rule-based labelers (NIH NegBio and CheXpert labelers used)
● NIH Chest X-ray14, 14 labels: automated rule-based labeler (NegBio)
● RSNA Pneumonia Kaggle challenge: relabelled NIH data (a group at Google relabelled a subset of NIH images)
● MeSH automatic labeller
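Several of these datasets are wrapped by the TorchXRayVision library cited at the end of this chapter. A sketch of its documented interface (class names and keys may differ by version, and the image paths are placeholders):

```python
import torchxrayvision as xrv
import torchvision.transforms

# The library maps datasets onto a shared label space so they can be mixed.
transform = torchvision.transforms.Compose([
    xrv.datasets.XRayCenterCrop(),
    xrv.datasets.XRayResizer(224),
])

d_nih = xrv.datasets.NIH_Dataset(imgpath="NIH/images", transform=transform)
d_pc = xrv.datasets.PC_Dataset(imgpath="PadChest/images", transform=transform)

sample = d_nih[0]
print(sample["img"].shape, sample["lab"])  # image tensor and pathology labels
```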
Multi-modal/view inference (X-ray use case)
PA and lateral views. Here saliency maps are from models trained on single views.
Two tasks, flattened diaphragm and pleural effusion, perform better when using lateral views.
[Bertrand, 2019]
Also: Multi-modal/view inference (MRI use case)
MRI modalities: T1, T2, T1C, FLAIR
Tasks: ischemic stroke lesion segmentation and stroke perfusion estimation (ISLES dataset); brain tumor segmentation (BraTS dataset)
Image credit: Mohammad Havaei
Challenge: missing modalities/views. Across patients the input is incomplete: some patients have only a subset of the expected modalities.
Integrating multiple views (see the sketch after the citation below):
● Combine images right at the input
● Take the mean of activations in the middle of the network
● Concatenate the output features of two models with a single prediction
● Three losses: a network for each modality, with losses that regularize each network
Image: [Hashir, Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays, 2020]
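A minimal sketch of the third strategy, concatenating the output features of two single-view encoders before one prediction head. The module and feature sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torchvision

class DualViewNet(nn.Module):
    """Join a PA-view and a lateral-view encoder at the feature level."""
    def __init__(self, num_labels: int = 13):
        super().__init__()
        self.pa_encoder = torchvision.models.resnet18(num_classes=256)
        self.lat_encoder = torchvision.models.resnet18(num_classes=256)
        self.head = nn.Linear(256 * 2, num_labels)

    def forward(self, pa: torch.Tensor, lateral: torch.Tensor) -> torch.Tensor:
        # Concatenate per-view features, then make a single prediction.
        feats = torch.cat([self.pa_encoder(pa), self.lat_encoder(lateral)], dim=1)
        return self.head(feats)

model = DualViewNet()
pa = torch.randn(2, 3, 224, 224)
lateral = torch.randn(2, 3, 224, 224)
logits = model(pa, lateral)  # one prediction informed by both views
```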
Integrating multiple views (X-ray images)
All models perform about equally given the right hyperparameters, but hyperparameter tuning is easier for some models than for others.
Image: [Hashir, Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays, 2020]
Chapter 1 - References
Bustos, A., Pertusa, A., Salinas, J.-M., & de la Iglesia-Vayá, M. (2019). PadChest: A large chest x-ray image dataset with multi-label annotated reports. ArXiv Preprint. http://arxiv.org/abs/1901.07441
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., & Summers, R. M. (2017). ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2017.369
Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., Lungren, M. P., & Ng, A. Y. (2017). CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. Arxiv. http://arxiv.org/abs/1711.05225
Viviano, J. D., Simpson, B., Dutil, F., Bengio, Y., & Cohen, J. P. (2019). Underwhelming Generalization Improvements From Controlling Feature Attribution. Arxiv:1910.00199. http://arxiv.org/abs/1910.00199
Hashir, M., Bertrand, H., & Cohen, J. P. (2020). Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays. Medical Imaging with Deep Learning.
Havaei, M., Guizard, N., Chapados, N., & Bengio, Y. (2016). HeMIS: Hetero-modal image segmentation. Medical Image Computing and Computer Assisted Intervention, 9901 LNCS. https://doi.org/10.1007/978-3-319-46723-8_54
Rubin, J., Sanghavi, D., Zhao, C., Lee, K., Qadir, A., & Xu-Wilson, M. (2018, April 20). Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks. Machine Intelligence in Medical Imaging.
Cohen, J. P., Hashir, M., Brooks, R., & Bertrand, H. (2020). On the limits of cross-domain generalization in automated X-ray prediction. Medical Imaging with Deep Learning. https://arxiv.org/abs/2002.02497
Cohen, J. P., Viviano, J., Hashir, M., & Bertrand, H. (2020). TorchXRayVision: A library of chest X-ray datasets and models. https://github.com/mlmed/torchxrayvision
Chapter 2
Histology and Segmentation
Peter Bandi, et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE-TMI 2018
CAMELYON17: A large high resolution open histology dataset for cancer detection
CAMELYON17 Dataset: 1000 whole-slide images (WSIs) of sentinel lymph nodes (~3GB each!). 5 medical centers, 40 patients from each center, 5 whole-slide images per patient.
https://colab.research.google.com/drive/13T9s3weexAw6YskKoY6c-VvoUgUvWsqf
Patch-wise segmentation. Use case: Invasive Ductal Carcinoma (the most common subtype of all breast cancers)
Starting with a whole-slide image of breast tissue, the image is labelled as IDC or not, then chopped into patches which are each labelled as IDC or not.
Slide design: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Pipeline: extract a patch → run it through a CNN → classify the center pixel, giving p(cancer).
Class imbalance is an issue. Patch-wise training allows easy balancing of classes using standard methods.
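A sketch of the patch-wise inference loop described above. The patch size, stride, and the CNN itself are placeholders; any patch classifier would do.

```python
import torch

def predict_patchwise(cnn, slide: torch.Tensor, patch: int = 65, stride: int = 1):
    """Slide a window over the image and classify the center pixel of each patch.

    slide: (C, H, W) tensor; returns an (H', W') map of p(cancer).
    """
    c, h, w = slide.shape
    rows = []
    for y in range(0, h - patch + 1, stride):
        row = []
        for x in range(0, w - patch + 1, stride):
            p = slide[:, y:y + patch, x:x + patch].unsqueeze(0)
            row.append(torch.sigmoid(cnn(p))[0, 0])  # p(cancer) for center pixel
        rows.append(torch.stack(row))
    return torch.stack(rows)
```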
Fully convolutional processing
With a kernel of size 3 followed by a kernel of size 2: an input of size 4 gives an output of size 1, and an input of size 5 gives an output of size 2. The model's receptive field is 4 nodes, and the overlapping computation between neighboring outputs is shared (4 multiplications saved in this example).
This allows for very fast inference. However, training this way requires a lot of memory: past outputs need to be saved.
Patch-wise training together with FCN inference is a good balance.
https://colab.research.google.com/drive/13T9s3weexAw6YskKoY6c-VvoUgUvWsqf
Figure: input image, the output maps for class 0 and class 1, and the mask where class 1 > class 0.
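The size arithmetic above can be checked directly in PyTorch with 1-D convolutions; a minimal sketch:

```python
import torch
import torch.nn as nn

# Kernel size 3 followed by kernel size 2: receptive field of 4 inputs.
net = nn.Sequential(nn.Conv1d(1, 4, kernel_size=3), nn.ReLU(),
                    nn.Conv1d(4, 1, kernel_size=2))

print(net(torch.randn(1, 1, 4)).shape)  # torch.Size([1, 1, 1]) — one output
print(net(torch.randn(1, 1, 5)).shape)  # torch.Size([1, 1, 2]) — two outputs,
# computed in one pass instead of running the network twice on patches.
```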
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015Slide design: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Recap: segmentation using a bottleneck, a normal VGG followed by an "upside down" VGG.
Upsampling is possible with:
● Unpooling
● Transposed convolutions
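Both upsampling options are single PyTorch modules; for example a transposed convolution with stride 2 doubles the spatial size, while unpooling reuses the indices saved by the matching max-pool. A sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)           # bottleneck feature map

up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
print(up(x).shape)                       # torch.Size([1, 32, 32, 32])

pool = nn.MaxPool2d(kernel_size=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2)   # needs the indices from MaxPool2d
y, idx = pool(x)
print(unpool(y, idx).shape)              # torch.Size([1, 64, 16, 16])
```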
Recap: U-NET
Difference: skip connections (like ResNet).
Dogma: skips carry spatial information, the bottleneck carries high-level structure.
Segmentation metrics (ground truth vs. prediction)
True Positive, True Negative, False Negative, False Positive
Example IoU values: IoU=0.4, IoU=0.7, IoU=0.9
Training with dice
Using the dot product to compute the intersection allows for a differentiable loss.
For multiple classes, a basic approach is to average over all classes.
Exercise: What p maximizes this?
More reading: https://arxiv.org/abs/1707.00478
Use a sigmoid or a softmax to restrict output.
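A minimal soft Dice loss in this spirit, a sketch rather than the exact formulation of any one paper: a dot product gives the intersection, a sigmoid restricts the outputs, and the result is averaged over classes.

```python
import torch

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """logits, target: (N, C, H, W); target is binary per class."""
    p = torch.sigmoid(logits)                        # restrict outputs to (0, 1)
    dims = (0, 2, 3)
    intersection = (p * target).sum(dims)            # differentiable "dot product"
    union = p.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (union + eps)  # per-class soft Dice
    return 1 - dice.mean()                           # average over all classes
```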
Tricks: improving edges in segmentations by predicting edges
Task: segment cortical layers in brain histology.
Figure panels: input brain histology image, ground-truth segmentation, predicted p(cortex), edge prediction, and predicted segmentation; baseline vs. with edge prediction.
Images provided by Konrad Wagstyl (University College London) 2020
More reading about the idea: [Polzounov, WordFence: Text Detection in Natural Images with Border Awareness, 2017]
Challenge: extreme class imbalance (e.g. lung nodule)
Background classes can dominate the loss and cause learning instability due to large gradients.
Balanced sampling may not work as well, because patches which could yield false positives are rarely seen during training.
CASED importance sampling for large images
[Jesson, https://arxiv.org/abs/1807.10819 ]
General idea (see the sketch below):
● Store a probability for each patch.
● Generate patches based on this probability.
● The probability is the inverse of how well your model performs on that patch.
● Samples are stratified by class.
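A sketch of this idea, not the exact CASED algorithm: keep one weight per patch, proportional to how badly the model currently does on it, and sample patch indices from that distribution within each class.

```python
import numpy as np

class PatchSampler:
    """Importance-sample patch indices, stratified by class."""
    def __init__(self, losses_by_class: dict):
        # One loss value per patch; higher loss -> sampled more often.
        self.losses = losses_by_class

    def sample(self, cls: int) -> int:
        losses = self.losses[cls]
        probs = losses / losses.sum()
        return int(np.random.choice(len(losses), p=probs))

    def update(self, cls: int, idx: int, loss: float):
        self.losses[cls][idx] = loss  # refresh after each training step

# Stratified by class: even with 1000 background vs 50 nodule patches,
# positives are requested explicitly instead of being drowned out.
sampler = PatchSampler({0: np.ones(1000), 1: np.ones(50)})
idx = sampler.sample(cls=1)  # pick a positive patch to train on
```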
Chapter 2 - References
Fidon, L., Li, W., Garcia-Peraza-Herrera, L. C., Ekanayake, J., Kitchen, N., Ourselin, S., & Vercauteren, T. (2017). Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. http://arxiv.org/abs/1707.00478
Litjens, G., Bandi, P., Bejnordi, B. E., et al (2018). 1399 H&E-stained sentinel lymph node sections of breast cancer patients: The CAMELYON dataset. In GigaScience (Vol. 7, Issue 6). https://doi.org/10.1093/gigascience/giy065
Noh, H., Hong, S., & Han, B. (2015). Learning Deconvolution Network for Semantic Segmentation. International Conference on Computer Vision. https://arxiv.org/abs/1505.04366
Polzounov, A., Ablavatski, A., Escalera, S., Lu, S., & Cai, J. (2018). Wordfence: Text detection in natural images with border awareness. International Conference on Image Processing, 2017-Septe, 1222–1226. https://doi.org/10.1109/ICIP.2017.8296476
Jesson, A., Guizard, N., Ghalehjegh, S. H., Goblot, D., Soudan, F., & Chapados, N. (2017, September 10). CASED: Curriculum Adaptive Sampling for Extreme Data Imbalance. Medical Image Computing and Computer Assisted Intervention. https://doi.org/10.1007/978-3-319-66179-7_73
Janowczyk, A., & Madabhushi, A. (2016). Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics, 7(1), 29. https://doi.org/10.4103/2153-3539.186902
Chapter 3
Cell Counting
Use case: Proliferation/Cell growth studies
Treat cells with different compounds and observe proliferation over time
Standard 96-well plate
Bachstetter, MW151 Inhibited IL-1β Levels after Traumatic Brain Injury with No Effect on Microglia Physiological Responses, PLOS ONE, 2017
Use case: Proliferation/Cell growth studies
Complicated cell structure
Use case: Counting in histology slides
Cell counting (classic CV):
1. Create a binary segmentation image
2. Watershed segmentation
3. Isolate and count
This works well on easy tasks but doesn't scale.
"Pipelines" end up breaking on new images with different lighting or stain.
How to get labels?
Counting via Segmentation
V. Lempitsky and A. Zisserman, “Learning To Count Objects in Images,” 2010.
Targets for regression: place a Gaussian at each annotated cell location; sigma is typically small, a few pixels.
Train a model to regress this density map.
To recover the count: sum over the predicted density map (each Gaussian integrates to 1, so the sum equals the number of cells).
Note: square kernels for redundant counting work better [Cohen 2017]
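A sketch of building the regression targets and recovering the count; `gaussian_filter` stands in for the paper's kernel, and since each Gaussian integrates to 1, summing the map gives the count back.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points: np.ndarray, shape: tuple, sigma: float = 2.0):
    """points: (N, 2) cell centers -> density map whose sum equals N."""
    target = np.zeros(shape, dtype=np.float32)
    for y, x in points.astype(int):
        target[y, x] += 1.0                  # one unit of mass per cell
    return gaussian_filter(target, sigma=sigma)  # spread each into a Gaussian

points = np.array([[10, 12], [40, 33], [41, 35]])
target = density_target(points, shape=(64, 64))
print(round(target.sum()))  # 3 — recover the count by summing the map
```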
Multiple output classes
Counting and classifying different cell types is also possible using multiple output channels [Bidart 2018].
Combine the losses together, and take the max prediction over output channels for each identified cell.
Chapter 3 - References
Lempitsky, V., & Zisserman, A. (2010). Learning To Count Objects in Images. Neural Information Processing Systems (NeurIPS).
Cohen, J. P., Boucher, G., Glastonbury, C. A., Lo, H. Z., & Bengio, Y. (2017). Count-ception: Counting by Fully Convolutional Redundant Counting. International Conference on Computer Vision Workshop on BioImage Computing. http://arxiv.org/abs/1703.08710
Xie, W., Noble, J. A., & Zisserman, A. (2016). Microscopy cell counting and detection with fully convolutional regression networks. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. https://doi.org/10.1080/21681163.2016.1149104
Gangeh, M. J., Bidart, R., Peikari, M., Martel, A. L., Ghodsi, A., Salama, S., & Nofech-Mozes, S. (2018). Localization and classification of cell nuclei in post-neoadjuvant breast cancer surgical specimen using fully convolutional networks. In M. N. Gurcan & J. E. Tomaszewski (Eds.), Medical Imaging 2018: Digital Pathology (Vol. 10581, p. 23). SPIE. https://doi.org/10.1117/12.2292815
BBBC021 - Human MCF7 cells – compound-profiling
RxRx1 - CellSignal: Disentangling biological signal from experimental noise
MBM - Modified Bone Marrow cell counting dataset
Chapter 4
Incorrect Feature Attribution
Incorrect feature attribution
Goal: predict if there are two plus signs anywhere in the image.
However, an easy-to-spot confounder exists! The confounding variable distracts the model, causing it to fail to generalize.
We can observe this by looking at the saliency map.
[Ross, Right for the Right Reasons, 2017] [Viviano, Underwhelming Generalization Improvements From Controlling Feature Attribution, 2019]
Incorrect feature attribution
Models can overfit to confounding variables in the data:
● Merging datasets with different class imbalance (confounding artifacts from each hospital)
● Labels confounding with each other
● Demographics confounding with labels
[Ross, Right for the Right Reasons, 2017] [Zech, Confounding variables can degrade generalization performance of radiological ..., 2018] [Viviano, Underwhelming Generalization Improvements From Controlling Feature Attribution, 2019] [Simpson, GradMask: Reduce Overfitting by Regularizing Saliency, 2019]
Example: a systematic discrepancy between the average image in each dataset. Recall the NIH/PadChest difference image (computed over 10k images).
[Viviano, Underwhelming Generalization Improvements From Controlling Feature Attribution, 2019]
Mitigation approaches (a sketch of the saliency-regularization idea follows the list)
Feature engineering:
● Range normalization (divide by max)
● Subspace alignment (align data using their eigenbasis based on a feature) [Fernando 2014]
● Removing the largest principal component (joint PCA, then reconstruct without the largest eigenvector)
During training:
● Reverse gradient (make an intermediate layer invariant to a label) [Ganin & Lempitsky, 2014]
● Right for the Right Reasons (regularize the saliency map) [Ross, Hughes, & Doshi-Velez, 2017]
● GradMask (regularize the contrast saliency map between classes) [Simpson, 2019]
● ActivDiff (regularize the representation to focus on pathology) [Viviano, 2019]
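A sketch of the "Right for the Right Reasons" style of training-time penalty: alongside the task loss, penalize input gradients that fall on regions an annotation mask marks as irrelevant. The mask convention here (1 = should not matter) is an assumption for illustration.

```python
import torch

def rrr_loss(model, x, y, irrelevant_mask, lam=1.0):
    """Task loss + penalty on saliency inside irrelevant regions."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = torch.nn.functional.cross_entropy(logits, y)

    # Saliency: gradient of the summed log-probabilities w.r.t. the input.
    grads = torch.autograd.grad(
        torch.log_softmax(logits, dim=1).sum(), x, create_graph=True)[0]

    # Penalize attribution that lands on confounder regions (mask == 1).
    penalty = (irrelevant_mask * grads).pow(2).sum()
    return task_loss + lam * penalty
```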
What if the feature artifact is correlated with the target label? Is the reason that should be used for prediction known? What if it is not known?
Chapter 4 - References
Ganin, Y., & Lempitsky, V. (2015, September 26). Unsupervised Domain Adaptation by Backpropagation. Proceedings of the International Conference on Machine Learning (ICML). http://jmlr.org/proceedings/papers/v37/ganin15.html
Ross, A., Hughes, M. C., & Doshi-Velez, F. (2017). Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. International Joint Conference on Artificial Intelligence. https://github.com/dtak/rrr.
Viviano, J. D., Simpson, B., Dutil, F., Bengio, Y., & Cohen, J. P. (2019). Underwhelming Generalization Improvements From Controlling Feature Attribution. Arxiv:1910.00199. http://arxiv.org/abs/1910.00199
Simpson, B., Dutil, F., Bengio, Y., & Cohen, J. P. (2019, April 16). GradMask: Reduce Overfitting by Regularizing Saliency. Medical Imaging with Deep Learning Workshop. http://arxiv.org/abs/1904.07478
Fernando, B., Habrard, A., Sebban, M., & Tuytelaars, T. (2014). Subspace Alignment For Domain Adaptation. http://arxiv.org/abs/1409.5241
Zech, J. R., Badgeley, M. A., Liu, M., Costa, A. B., Titano, J. J., & Oermann, E. K. (2018). Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Medicine, 15(11). https://doi.org/10.1371/journal.pmed.1002683
Seyyed-Kalantari, L., Liu, G., McDermott, M., & Ghassemi, M. (2020). CheXclusion: Fairness gaps in deep chest X-ray classifiers. http://arxiv.org/abs/2003.00827
Chapter 5
GANs in Medical Imaging
Medical image-to-image translation considered harmful
Many papers have proposed methods that can "translate between modalities": MR → CT, CT → PET, synthesized H&E staining, image synthesis from undersampled raw MRI source data.
Adversarial losses are very good at distribution matching (e.g. CycleGAN). But artifacts could be introduced and then used in diagnosis, which can be dangerous.
Use case: MRI modality transformation. A bias in the training data can lead to incorrect translation: tumors present in the source T1 images vanish in the transformed images. Everyone is so healthy!
[Cohen, Distribution Matching Losses Can Hallucinate Features in Medical Image Translation, 2018]
Comparing the transformed image against the real target image makes the hallucination visible: the transformed image looks realistic, but the pathology has been removed from it.
The tumors here are a proxy to illustrate the impact of an unaccounted-for pathology.
[Cohen, Distribution Matching Losses Can Hallucinate Features in Medical Image Translation, 2018]
Adversarial losses are useful for representation learning
Latent space interpretation: vector algebra on the latent codes of real images.
Example: CytoGAN, learning a self-supervised representation for cell images.
● The encoder can be useful for semi-supervised learning
● Exploring representations to understand the cell biology
[Goldsborough, CytoGAN: Generative Modeling of Cell Images, 2017]
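The latent-space manipulations above amount to simple arithmetic on latent codes before decoding. A sketch with a generic generator `G`; the interpolation and analogy follow the usual GAN convention, not CytoGAN's exact setup.

```python
import torch

def lerp(z_a: torch.Tensor, z_b: torch.Tensor, steps: int = 8):
    """Linear interpolation between two latent codes."""
    alphas = torch.linspace(0, 1, steps).view(-1, 1)
    return (1 - alphas) * z_a + alphas * z_b

z_a, z_b, z_c = torch.randn(3, 1, 128).unbind(0)  # codes of three real images

path = lerp(z_a, z_b)       # images along G(path) morph from A to B
analogy = z_a - z_b + z_c   # "vector algebra": transfer the A-vs-B difference to C
# images = G(path), G(analogy)  # decode with the trained generator
```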
Semi-supervised Segmentation with GANs
Some images have segmentation labels; others do not. We would like to use the unlabelled images too.
Idea: match the distribution of segmentations predicted for unlabelled images to the distribution of segmentations predicted for the labelled images that were trained on.
Training alternates two updates (a sketch follows the citations):
● Update the discriminator E: E should predict 1 for labelled examples and 0 for unlabelled examples.
● Update the segmenter S: minimize the segmentation loss on labelled examples, and the segmentation output on unlabelled images should not make E predict 0.
Luc et al., "Semantic Segmentation using Adversarial Networks," 2016. Zhang et al., "Deep Adversarial Networks for Biomedical Image Segmentation Utilizing Unannotated Images," 2017.
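A minimal sketch of the alternating update under the assumptions that S outputs segmentation logits and E ends in a sigmoid; the models, optimizers, and losses are simplified stand-ins, not the papers' exact architectures.

```python
import torch
import torch.nn.functional as F

def train_step(S, E, opt_S, opt_E, x_lab, y_lab, x_unlab):
    """One alternating update. S: image -> segmentation logits.
    E: softmaxed segmentation -> probability (assumed to end in a sigmoid)."""
    seg_lab = torch.softmax(S(x_lab), dim=1)
    seg_unlab = torch.softmax(S(x_unlab), dim=1)

    # Update discriminator E: predict 1 for labelled, 0 for unlabelled.
    opt_E.zero_grad()
    d_lab, d_unlab = E(seg_lab.detach()), E(seg_unlab.detach())
    loss_E = F.binary_cross_entropy(d_lab, torch.ones_like(d_lab)) + \
             F.binary_cross_entropy(d_unlab, torch.zeros_like(d_unlab))
    loss_E.backward()
    opt_E.step()

    # Update segmenter S: supervised loss on labelled examples, plus an
    # adversarial term so unlabelled outputs do not make E predict 0.
    opt_S.zero_grad()
    seg_loss = F.cross_entropy(S(x_lab), y_lab)
    d_unlab = E(torch.softmax(S(x_unlab), dim=1))
    adv_loss = F.binary_cross_entropy(d_unlab, torch.ones_like(d_unlab))
    (seg_loss + adv_loss).backward()
    opt_S.step()
```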
Explanation by Progressive Exaggeration
Train a classifier and a generative model jointly while maintaining consistency between them. An explainer function generates an image conditioned on a target classifier output (cf outputs a one-hot).
Generating images conditioned on an over- and under-prediction of the model helps explain what aspects of the image were important to the prediction. Here we can see the heart enlarge or shrink as the prediction (normalized heart size) is varied.
[Singla et al. Explanation by Progressive Exaggeration. ICLR 2020]
Chapter 5 - References
Cohen, J. P., Luck, M., & Honari, S. (2018). Distribution Matching Losses Can Hallucinate Features in Medical Image Translation. Medical Image Computing & Computer Assisted Intervention (MICCAI).
Goldsborough, P., Pawlowski, N., Caicedo, J. C., Singh, S., & Carpenter, A. E. (2017). CytoGAN: Generative Modeling of Cell Images. Workshop On Machine Learning In Computational Biology, Neural Information Processing Systems. https://doi.org/10.1101/227645
Luc, P., Couprie, C., Chintala, S., & Verbeek, J. (2016). Semantic Segmentation using Adversarial Networks. https://arxiv.org/pdf/1611.08408.pdf
Zhang, Y., Lin, Y., Chen, J., Fredericksen, M., Hughes, D. P., Chen, D. Z., Yang, L., Chen, J., Fredericksen, M., Hughes, D. P., & Chen, D. Z. (2017, September 10). Deep Adversarial Networks for Biomedical Image Segmentation Utilizing Unannotated Images. Medical Image Computing and Computer-Assisted Intervention. https://doi.org/10.1007/978-3-319-66179-7
Singla, S., Pollack, B., Chen, J., & Batmanghelich, K. (2020, November 1). Explanation by Progressive Exaggeration. International Conference on Learning Representations. http://arxiv.org/abs/1911.00483