Abstract
What does a neural network encode about a concept as we traverse its layers? Interpretability in machine learning is undoubtedly important, but the computations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading or unusable, or can rely on the latent space possessing properties that it may not have. Here, rather than attempting to analyse a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network so that we can better understand the computation leading up to that layer. When a concept whitening module is added to a convolutional neural network, the latent space is whitened (that is, decorrelated and normalized) and the axes of the latent space are aligned with known concepts of interest. Through experiments, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance.
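To make the mechanism concrete, here is a minimal, illustrative PyTorch sketch of a concept-whitening-style layer. It is not the authors' implementation (that is in the repository linked under Code availability): the class name is our own, the sketch uses a direct eigendecomposition for whitening where the paper uses iterative normalization, and it leaves the rotation as a freely learned parameter where the paper optimizes it as an orthogonal matrix on the Stiefel manifold, aligning its columns with concept directions using auxiliary concept datasets.

```python
import torch
import torch.nn as nn


class ConceptWhiteningSketch(nn.Module):
    """Illustrative sketch of a concept-whitening-style layer (not the paper's code).

    Whitens (decorrelates and normalizes) a batch of flattened activations
    via ZCA, then applies a rotation whose axes could be matched to concepts.
    """

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Initialize the rotation as an orthogonal matrix. In the actual
        # method this matrix is kept orthogonal throughout training; this
        # sketch omits that constraint for brevity.
        q, _ = torch.linalg.qr(torch.randn(num_features, num_features))
        self.rotation = nn.Parameter(q)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features); conv feature maps would be reshaped first.
        mu = x.mean(dim=0, keepdim=True)
        xc = x - mu
        cov = xc.t() @ xc / (x.shape[0] - 1)
        # ZCA whitening matrix Sigma^{-1/2} via eigendecomposition.
        eigvals, eigvecs = torch.linalg.eigh(cov)
        w = eigvecs @ torch.diag((eigvals + self.eps).rsqrt()) @ eigvecs.t()
        whitened = xc @ w
        # Rotate the whitened space so individual axes can line up with
        # known concepts of interest.
        return whitened @ self.rotation


# Example: whiten a batch of 128 activation vectors of dimension 16.
layer = ConceptWhiteningSketch(16)
out = layer(torch.randn(128, 16))
```

The key difference from batch normalization is the decorrelation step plus the rotation: once the activations are whitened, any orthogonal rotation preserves whiteness, so the axes can be aligned with concepts without changing the distribution of the representation.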
Data availability
All datasets that support the findings are publicly available, including Places365 at http://places2.csail.mit.edu, MS COCO at https://cocodataset.org/ and ISIC at https://www.isic-archive.com.
Code availability
The code for replicating our experiments is available at https://github.com/zhiCHEN96/ConceptWhitening (https://doi.org/10.5281/zenodo.4052692).
Acknowledgements
We are grateful to W. Zhang, L. Semenova, H. Parikh, C. Zhong, O. Li, C. Chen, and especially C. Tomasi and G. Sapiro for the feedback and assistance they provided during the development and preparation of this research. The authors acknowledge funding from MIT-Lincoln Laboratory and the National Science Foundation.
Author information
Contributions
Z.C. and C.R. conceived the study. Z.C. developed methods, designed visualizations and metrics, ran experiments and contributed to the writing. Y.B. designed metrics, ran experiments and contributed to the writing. C.R. supervised research, method development and contributed to the writing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information Nature Machine Intelligence thanks Professor Andreas Holzinger and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Absolute correlation coefficient of every feature pair in the 16th layer.
a, when the 16th layer is a batch normalization (BN) module; b, when the 16th layer is a CW module.
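As a pointer for reproducing this kind of diagnostic, the following sketch computes the quantity plotted (the absolute Pearson correlation of every feature pair), assuming activations at the layer of interest have already been collected into a samples-by-features matrix; the function name is our own.

```python
import torch


def abs_feature_correlation(acts: torch.Tensor) -> torch.Tensor:
    """Absolute Pearson correlation between every pair of feature channels.

    acts: (num_samples, num_features) activations collected at a given layer.
    Returns a (num_features, num_features) matrix of |correlation| values,
    like the one visualized in Extended Data Fig. 1: near-identity when the
    layer whitens its output, dense off-diagonal entries otherwise.
    """
    centered = acts - acts.mean(dim=0, keepdim=True)
    std = centered.std(dim=0, keepdim=True).clamp_min(1e-8)
    normalized = centered / std
    corr = normalized.t() @ normalized / (acts.shape[0] - 1)
    return corr.abs()
```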
Supplementary information
Supplementary Information
Supplementary sections and Figs. 1–13.
About this article
Cite this article
Chen, Z., Bei, Y. & Rudin, C. Concept whitening for interpretable image recognition. Nat Mach Intell 2, 772–782 (2020). https://doi.org/10.1038/s42256-020-00265-z
This article is cited by
- Privacy-preserving explainable AI: a survey. Science China Information Sciences (2025)
- Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing. Genome Biology (2024)
- A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery (2024)