Abstract
What does a neural network encode about a concept as we traverse its layers? Interpretability in machine learning is undoubtedly important, but the computations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading or unusable, or can rely on the latent space possessing properties that it may not have. Here, rather than attempting to analyse a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network so that we can better understand the computation leading up to that layer. When a concept whitening module is added to a convolutional neural network, the latent space is whitened (that is, decorrelated and normalized) and the axes of the latent space are aligned with known concepts of interest. Through experiments, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance.
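To make the mechanism concrete, here is a minimal, illustrative PyTorch sketch of a concept-whitening-style layer. It is not the authors' implementation (that is in the repository linked under Code availability): the class name is our own, the sketch uses a direct eigendecomposition for whitening where the paper uses iterative normalization, and it leaves the rotation as a freely learned parameter where the paper optimizes it as an orthogonal matrix on the Stiefel manifold, aligning its columns with concept directions using auxiliary concept datasets.

```python
import torch
import torch.nn as nn


class ConceptWhiteningSketch(nn.Module):
    """Illustrative sketch of a concept-whitening-style layer (not the paper's code).

    Whitens (decorrelates and normalizes) a batch of flattened activations
    via ZCA, then applies a rotation whose axes could be matched to concepts.
    """

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Initialize the rotation as an orthogonal matrix. In the actual
        # method this matrix is kept orthogonal throughout training; this
        # sketch omits that constraint for brevity.
        q, _ = torch.linalg.qr(torch.randn(num_features, num_features))
        self.rotation = nn.Parameter(q)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features); conv feature maps would be reshaped first.
        mu = x.mean(dim=0, keepdim=True)
        xc = x - mu
        cov = xc.t() @ xc / (x.shape[0] - 1)
        # ZCA whitening matrix Sigma^{-1/2} via eigendecomposition.
        eigvals, eigvecs = torch.linalg.eigh(cov)
        w = eigvecs @ torch.diag((eigvals + self.eps).rsqrt()) @ eigvecs.t()
        whitened = xc @ w
        # Rotate the whitened space so individual axes can line up with
        # known concepts of interest.
        return whitened @ self.rotation


# Example: whiten a batch of 128 activation vectors of dimension 16.
layer = ConceptWhiteningSketch(16)
out = layer(torch.randn(128, 16))
```

The key difference from batch normalization is the decorrelation step plus the rotation: once the activations are whitened, any orthogonal rotation preserves whiteness, so the axes can be aligned with concepts without changing the distribution of the representation.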
Data availability
All datasets that support the findings are publicly available, including Places365 at http://places2.csail.mit.edu, MS COCO at https://cocodataset.org/ and ISIC at https://www.isic-archive.com.
Code availability
The code for replicating our experiments is available at https://github.com/zhiCHEN96/ConceptWhitening (https://doi.org/10.5281/zenodo.4052692).
Acknowledgements
We are grateful to W. Zhang, L. Semenova, H. Parikh, C. Zhong, O. Li, C. Chen, and especially C. Tomasi and G. Sapiro for the feedback and assistance they provided during the development and preparation of this research. The authors acknowledge funding from MIT-Lincoln Laboratory and the National Science Foundation.
Author information
Contributions
Z.C. and C.R. conceived the study. Z.C. developed methods, designed visualizations and metrics, ran experiments and contributed to the writing. Y.B. designed metrics, ran experiments and contributed to the writing. C.R. supervised research, method development and contributed to the writing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information Nature Machine Intelligence thanks Professor Andreas Holzinger and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Absolute correlation coefficient of every feature pair in the 16th layer.
a, when the 16th layer is a batch normalization (BN) module; b, when the 16th layer is a CW module.
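As a pointer for reproducing this kind of diagnostic, the following sketch computes the quantity plotted (the absolute Pearson correlation of every feature pair), assuming activations at the layer of interest have already been collected into a samples-by-features matrix; the function name is our own.

```python
import torch


def abs_feature_correlation(acts: torch.Tensor) -> torch.Tensor:
    """Absolute Pearson correlation between every pair of feature channels.

    acts: (num_samples, num_features) activations collected at a given layer.
    Returns a (num_features, num_features) matrix of |correlation| values,
    like the one visualized in Extended Data Fig. 1: near-identity when the
    layer whitens its output, dense off-diagonal entries otherwise.
    """
    centered = acts - acts.mean(dim=0, keepdim=True)
    std = centered.std(dim=0, keepdim=True).clamp_min(1e-8)
    normalized = centered / std
    corr = normalized.t() @ normalized / (acts.shape[0] - 1)
    return corr.abs()
```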
Supplementary information
Supplementary Information
Supplementary sections and Figs. 1–13.
About this article
Cite this article
Chen, Z., Bei, Y. & Rudin, C. Concept whitening for interpretable image recognition. Nat Mach Intell 2, 772–782 (2020). https://doi.org/10.1038/s42256-020-00265-z
This article is cited by
- Privacy-preserving explainable AI: a survey. Science China Information Sciences (2025)
- Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing. Genome Biology (2024)
- A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery (2024)