Abstract
Interpretability in machine learning models is important in high-stakes decisions such as whether to order a biopsy based on a mammographic exam. Mammography poses important challenges that are not present in other computer vision tasks: datasets are small, confounding information is present, and it can be difficult even for a radiologist to decide between watchful waiting and biopsy based on a mammogram alone. In this work we present a framework for interpretable machine learning-based mammography. In addition to predicting whether a lesion is malignant or benign, our work aims to follow the reasoning processes of radiologists in detecting clinically relevant semantic features of each image, such as the characteristics of the mass margins. The framework includes a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Our algorithm can incorporate a combination of data with whole-image labelling and data with pixel-wise annotations, leading to better accuracy and interpretability even with a small number of images. Our interpretable models are able to highlight the classification-relevant parts of the image, whereas other methods highlight healthy tissue and confounding information. Our models are decision aids, rather than decision makers, and aim for better overall human–machine collaboration. We do not observe a loss in mass margin classification accuracy over a black box neural network trained on the same data.
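To make the combination of whole-image labels and pixel-wise annotations concrete, below is a minimal PyTorch sketch of one way such a hybrid objective could look. It is an illustration under assumed tensor shapes, not the exact IAIA-BL loss; the function name `hybrid_loss`, the weight `lambda_fine` and the mask-handling convention are all hypothetical.

```python
# Minimal sketch (not the paper's exact objective): an image-level
# cross-entropy term that every training image contributes to, plus a
# pixel-level penalty on prototype activation outside the annotated
# region for the subset of images with pixel-wise annotations.
# `lambda_fine` and all tensor shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, labels, activation_maps, masks, has_mask,
                lambda_fine=0.01):
    # logits:          (B, C) image-level class predictions
    # labels:          (B,)   whole-image class labels
    # activation_maps: (B, P, H, W) upsampled prototype activation maps
    # masks:           (B, 1, H, W) binary annotations (1 = clinically relevant)
    # has_mask:        (B,)   True where a pixel-wise annotation exists
    ce = F.cross_entropy(logits, labels)  # uses the whole-image labels

    if has_mask.any():
        act = activation_maps[has_mask]   # only finely annotated images
        msk = masks[has_mask]
        outside = act * (1.0 - msk)       # activation on irrelevant pixels
        fine = outside.pow(2).mean()      # penalize it
    else:
        fine = torch.zeros((), device=logits.device)

    return ce + lambda_fine * fine
```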
Data availability
The imaging data are not publicly available because they contain confidential information that may compromise patient privacy as well as the ethical or regulatory policies of our institution. Data will be made available on reasonable request, for non-commercial research purposes, to those who contact J.L. (joseph.lo@duke.edu). Data usage agreements may be required. Source Data are provided with this paper.
Code availability
Code is available on GitHub at https://github.com/alinajadebarnett/iaiabl. Two licenses are offered: an MIT license for non-commercial use and a custom license. The DOI for the initial code release is https://doi.org/10.5281/zenodo.5565592.
References
Kochanek, K. D., Xu, J. & Arias, E. Mortality in the United States, 2019. Technical Report 395 (NCHS, 2020); https://www.cdc.gov/nchs/products/databriefs/db395.htm
Badgeley, M. A. et al. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit. Med. 2, 1–10 (2019).
Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155, 1135–1141 (2019).
Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15, e1002683 (2018).
Edwards, B. FDA Guidance on clinical decision support: peering inside the black box of algorithmic intelligence. ChilmarkResearch https://www.chilmarkresearch.com/fda-guidance-clinical-decision-support/ (2017).
Soffer, S. et al. Convolutional neural networks for radiologic images: a radiologist’s guide. Radiology 290, 590–606 (2019).
Sickles, E. et al. in ACR BI-RADS Atlas, Breast Imaging Reporting and Data System 5th edn (American College of Radiology, 2013).
McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).
Chen, C. et al. This looks like that: deep learning for interpretable image recognition. In Advances in Neural Information Processing Systems 32 8930–8941 (NeurIPS, 2019).
Lehman, C. D. et al. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern. Med. 175, 1828–1837 (2015).
Salim, M. et al. External evaluation of 3 commercial artificial intelligence algorithms for independent assessment of screening mammograms. JAMA Oncol. 6, 1581–1588 (2020).
Schaffter, T. et al. Evaluation of combined artificial intelligence and radiologist assessment to interpret screening mammograms. JAMA Network Open 3, e200265 (2020).
Wu, N. et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans. Med. Imaging 39, 1184–1194 (2019).
Kim, H.-E. et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit. Health 2, e138–e148 (2020).
Giger, M. L., Chan, H.-P. & Boone, J. Anniversary paper: history and status of CAD and quantitative image analysis: the role of medical physics and AAPM. Med. Phys. 35, 5799–5820 (2008).
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
Adebayo, J. et al. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems 9505–9515 (NeurIPS, 2018).
Arun, N. et al. Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiology: Artificial Intelligence 3 (2021).
Wu, T. & Song, X. Towards interpretable object detection by unfolding latent structures. In Proc. IEEE International Conference on Computer Vision 6033–6043 (IEEE, 2019).
Chen, Z., Bei, Y. & Rudin, C. Concept whitening for interpretable image recognition. Nat. Mach. Intell. 2, 772–782 (2020).
Demigha, S. & Prat, N. A case-based training system in radiology-senology. In Proc. 2004 International Conference on Information and Communication Technologies: From Theory to Applications 41–42 (IEEE, 2004).
Macura, R. T. & Macura, K. J. MacRad: radiology image resource with a case-based retrieval system. In International Conference on Case-Based Reasoning 43–54 (Springer, 1995).
Floyd Jr, C. E., Lo, J. Y. & Tourassi, G. D. Case-based reasoning computer algorithm that uses mammographic findings for breast biopsy decisions. Am. J. Roentgenol. 175, 1347–1352 (2000).
Kobashi, S., Kondo, K. & Hata, Y. Computer-aided diagnosis of intracranial aneurysms in MRA images with case-based reasoning. IEICE Trans. Inform. Syst. 89, 340–350 (2006).
Wang, H., Wu, Z. & Xing, E. P. Removing confounding factors associated weights in deep neural networks improves the prediction accuracy for healthcare applications. Pac. Symp. Biocomput. 24, 54–65 (2019).
Hu, S., Ma, Y., Liu, X., Wei, Y. & Bai, S. Stratified rule-aware network for abstract visual reasoning. In Proc. AAAI Conference on Artificial Intelligence (AAAI, 2021).
Dundar, A. & Garcia-Dorado, I. Context augmentation for convolutional neural networks. Preprint at https://arxiv.org/abs/1712.01653 (2017).
Xiao, K., Engstrom, L., Ilyas, A. & Madry, A. Noise or signal: The role of image backgrounds in object recognition. In International Conference on Learning Representations (2020).
Luo, J., Tang, J., Tjahjadi, T. & Xiao, X. Robust arbitrary view gait recognition based on parametric 3D human body reconstruction and virtual posture synthesis. Pattern Recognition 60, 361–377 (2016).
Charalambous, C. & Bharath, A. A data augmentation methodology for training machine/deep learning gait recognition algorithms. In Proc. British Machine Vision Conference (BMVC) (eds Richard, C. et al.) 110.1–110.12 (BMVA, 2016).
Tang, R., Du, M., Li, Y., Liu, Z. & Hu, X. Mitigating gender bias in captioning systems. In Proc. Web Conference 2021, 633–645 (2021).
Zhao, Q., Adeli, E. & Pohl, K. M. Training confounder-free deep learning models for medical applications. Nat. Commun. 11, 1–9 (2020).
Schramowski, P. et al. Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nat. Mach. Intell. 2, 476–486 (2020).
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. & Torralba, A. Learning deep features for discriminative localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2921–2929 (IEEE, 2016).
Zheng, H., Fu, J., Mei, T. & Luo, J. Learning multi-attention convolutional neural network for fine-grained image recognition. In Proc. IEEE International Conference on Computer Vision (ICCV), 5209–5217 (IEEE, 2017).
Fu, J., Zheng, H. & Mei, T. Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4438–4446 (IEEE, 2017).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Sun, X. & Xu, W. Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process. Lett. 21, 1389–1393 (2014).
Park, C. S. et al. Observer agreement using the ACR Breast Imaging Reporting and Data System (BI-RADS)-ultrasound (2003). Korean J. Radiol. 8, 397 (2007).
Abdullah, N., Mesurolle, B., El-Khoury, M. & Kao, E. Breast imaging reporting and data system lexicon for US: interobserver agreement for assessment of breast masses. Radiology 252, 665–672 (2009).
Baker, J. A., Kornguth, P. J. & Floyd Jr, C. Breast imaging reporting and data system standardized mammography lexicon: observer variability in lesion description. AJR Am. J. Roentgenol. 166, 773–778 (1996).
Rawashdeh, M., Lewis, S., Zaitoun, M. & Brennan, P. Breast lesion shape and margin evaluation: BI-RADS based metrics understate radiologists' actual levels of agreement. Comput. Biol. Med. 96, 294–298 (2018).
Lazarus, E., Mainiero, M. B., Schepps, B., Koelliker, S. L. & Livingston, L. S. BI-RADS lexicon for US and mammography: interobserver variability and positive predictive value. Radiology 239, 385–391 (2006).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In The IEEE International Conference on Computer Vision (ICCV) (IEEE, 2017).
Chattopadhay, A., Sarkar, A., Howlader, P. & Balasubramanian, V. N. Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) 839–847 (IEEE, 2018).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proc. 3rd International Conference on Learning Representations (ICLR) (2015).
Landis, J. R. & Koch, G. G. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33, 363–374 (1977).
Kim, S. T., Lee, H., Kim, H. G. & Ro, Y. M. ICADx: interpretable computer aided diagnosis of breast masses. In Medical Imaging 2018: Computer-Aided Diagnosis Vol. 10575, 1057522 (International Society for Optics and Photonics, 2018).
Elter, M., Schulz-Wendtland, R. & Wittenberg, T. The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Med. Phys. 34, 4164–4172 (2007).
Benndorf, M., Burnside, E. S., Herda, C., Langer, M. & Kotter, E. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets. Med. Phys. 42, 4987–4996 (2015).
Burnside, E. S. et al. Probabilistic computer model developed from clinical data in national mammography database format to classify mammographic findings. Radiology 251, 663–672 (2009).
Park, H. J. et al. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of breast masses on ultrasound: added value for the inexperienced breast radiologist. Medicine 98, e14146 (2019).
Shimauchi, A. et al. Evaluation of clinical breast MR imaging performed with prototype computer-aided diagnosis breast MR imaging workstation: reader study. Radiology 258, 696–704 (2011).
Orel, S. G., Kay, N., Reynolds, C. & Sullivan, D. C. Bi-rads categorization as a predictor of malignancy. Radiology 211, 845–850 (1999).
Kalchbrenner, N., Grefenstette, E. & Blunsom, P. A convolutional neural network for modelling sentences. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 655–665 (2014).
Wu, J. et al. DeepMiner: discovering interpretable representations for mammogram classification and explanation. Harvard Data Science Review 3 (2021).
Acknowledgements
We would like to acknowledge breast radiologists M. Taylor-Cho, L. Grimm, C. Kim and S. Yoon, who annotated the dataset used in this paper. This study was supported in part by NIH/NCI U01-CA214183 and U2C-CA233254 (J.L.). This study was supported in part by MIT Lincoln Laboratory (C.R.), Duke TRIPODS CCF-1934964 (C.R.) and the Duke Incubation Fund (A.J.B.).
Author information
Contributions
A.J.B., F.S., D.T., C.C., J.L. and C.R. conceived the idea and developed the model. D.T., A.J.B. and C.C. wrote and reviewed the code. Y.R., A.J.B., F.S. and J.L. performed data collection, and Y.R., D.T. and A.J.B. preprocessed it.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Machine Intelligence thanks Fredrik Strand and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 An automatically generated explanation of mass margin classification for a circumscribed lesion.
This circumscribed lesion is correctly identified as circumscribed. The two most activated prototypes are drawn from the same image but are associated with different regions of that image.
Extended Data Fig. 2 An automatically generated explanation of mass margin classification for an indistinct lesion.
This indistinct lesion is correctly identified as indistinct. The indistinct portion of the lesion margin (right side) activates the indistinct prototype and the circumscribed portion of the lesion margin (left side) activates the circumscribed prototypes.
Extended Data Fig. 3 An automatically generated explanation of mass margin classification for a spiculated lesion.
This spiculated lesion is correctly identified as spiculated.
Extended Data Fig. 4 An automatically generated explanation of mass margin classification for an incorrectly classified lesion.
This spiculated lesion is incorrectly identified as circumscribed. The explanation highlights only the circumscribed portion of the mass margin (top), but does not detect the spiculated portion (bottom).
Extended Data Fig. 5 A comparison of explanations.
We compare explanations from two common saliency methods (Grad-CAM [44] and Grad-CAM++ [45]) to a class activation visualization derived from our method. The explanations from IAIA-BL are more likely to highlight the lesion and less likely to highlight the surrounding healthy tissue. This is shown quantitatively by the activation precision metric. The single-image visualization is a dramatic simplification of the full explanation that is generated by IAIA-BL. The IAIA-BL and ProtoPNet class activation visualizations shown in this figure are generated by taking the average of prototype activation maps for all prototypes of the correct class.
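As a concrete reading of this caption, the following NumPy sketch shows one way to form the averaged class activation visualization and to score activation precision. The array shapes and the top-5% threshold are assumptions for illustration, not the exact evaluation code.

```python
# Sketch under assumed shapes: proto_maps is (P, H, W), one activation
# map per prototype; lesion_mask is a binary (H, W) annotation.
import numpy as np

def class_activation_map(proto_maps, proto_class_ids, correct_class):
    # Average the activation maps of all prototypes of the correct class.
    keep = proto_class_ids == correct_class
    return proto_maps[keep].mean(axis=0)  # (H, W) visualization

def activation_precision(cam, lesion_mask, percentile=95):
    # Fraction of the most-activated pixels that fall inside the lesion;
    # the 95th-percentile cutoff is an illustrative assumption.
    thresh = np.percentile(cam, percentile)
    top = cam >= thresh
    return (top & lesion_mask.astype(bool)).sum() / max(top.sum(), 1)
```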
Extended Data Fig. 6 The architecture of the IAIA-BL prototype network.
Test image \(x\) feeds into convolutional layers \(f\). Each patch of \(f(x)\) is compared to each learned prototype \(p_i\) by calculating the squared \(L_2\) distance between the patch and the prototype. The similarity map shows the closest (most 'activated', that is, smallest \(L_2\) distance) patches in red and the furthest patches in blue, overlaid on the test image. Similarity score \(s_i\) is calculated from the corresponding similarity map. The similarity scores \(\mathbf{s}\) feed into fully connected layer \(h_1\), outputting margin logits \(\hat{\mathbf{y}}^{\mathrm{margin}}\). The margin logits \(\hat{\mathbf{y}}^{\mathrm{margin}}\) feed into fully connected layer \(h_2\), outputting malignancy logit \(\hat{y}^{\mathrm{mal}}\).
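For readers who think in code, the forward pass described in this caption can be condensed into the following PyTorch sketch. The log-ratio distance-to-similarity transform follows the ProtoPNet family (ref. 9); the layer sizes, prototype count and the max-pooling of the similarity map to a score are illustrative assumptions rather than the exact IAIA-BL configuration.

```python
# Condensed sketch of the prototype-network forward pass; sizes and the
# pooling choice are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeNetSketch(nn.Module):
    def __init__(self, backbone, num_prototypes=15, proto_dim=128,
                 num_margin_classes=3, epsilon=1e-4):
        super().__init__()
        self.f = backbone                                 # convolutional layers f
        self.prototypes = nn.Parameter(                   # learned prototypes p_i
            torch.rand(num_prototypes, proto_dim, 1, 1))
        self.h1 = nn.Linear(num_prototypes, num_margin_classes)  # -> margin logits
        self.h2 = nn.Linear(num_margin_classes, 1)        # -> malignancy logit
        self.epsilon = epsilon

    def forward(self, x):
        z = self.f(x)                                     # (B, D, Hz, Wz) patches
        # Squared L2 distance from every patch to every prototype,
        # expanded as ||z||^2 - 2 z.p + ||p||^2 via 1x1 convolutions.
        z2 = F.conv2d(z ** 2, torch.ones_like(self.prototypes))
        zp = F.conv2d(z, self.prototypes)
        p2 = (self.prototypes ** 2).sum(dim=(1, 2, 3)).view(1, -1, 1, 1)
        dist = F.relu(z2 - 2 * zp + p2)                   # (B, P, Hz, Wz)
        # Similarity map: small distance -> high activation.
        sim_map = torch.log((dist + 1) / (dist + self.epsilon))
        s = sim_map.amax(dim=(2, 3))                      # similarity scores s_i
        y_margin = self.h1(s)                             # margin logits
        y_mal = self.h2(y_margin)                         # malignancy logit
        return y_margin, y_mal, sim_map
```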
Supplementary information
Supplementary Information
Supplementary Sections 1–10, Tables 1 and 2, and Figs. 1–6.
Source data
Source Data Fig. 2
Labels and model predictions used to generate the ROC curves for Fig. 2.
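Assuming the source data pair each case's ground-truth label with a model score (the file name and column names below are hypothetical, not the actual schema of the released file), the ROC curves could be regenerated along these lines:

```python
# Hypothetical sketch: regenerate an ROC curve from the source data.
# "source_data_fig2.csv", "label" and "prediction" are assumed names.
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve

df = pd.read_csv("source_data_fig2.csv")
fpr, tpr, _ = roc_curve(df["label"], df["prediction"])
print(f"AUROC: {roc_auc_score(df['label'], df['prediction']):.3f}")
```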
About this article
Cite this article
Barnett, A.J., Schwartz, F.R., Tao, C. et al. A case-based interpretable deep learning model for classification of mass lesions in digital mammography. Nat Mach Intell 3, 1061–1070 (2021). https://doi.org/10.1038/s42256-021-00423-x
This article is cited by
- Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization. Nature Communications (2024)
- Pseudo-class part prototype networks for interpretable breast cancer classification. Scientific Reports (2024)
- MyThisYourThat for interpretable identification of systematic bias in federated learning for biomedical images. npj Digital Medicine (2024)
- A domain knowledge-based interpretable deep learning system for improving clinical breast ultrasound diagnosis. Communications Medicine (2024)
- A hybrid modeling framework for generalizable and interpretable predictions of ICU mortality across multiple hospitals. Scientific Reports (2024)