Explainability, Quantified: Benchmarking XAI Techniques

  • Conference paper
  • Explainable Artificial Intelligence (xAI 2024)

Abstract

Modern Machine Learning (ML) has significantly advanced various fields; yet the challenge of understanding complex models, often referred to as the “black box problem”, remains a barrier to their widespread adoption, particularly in critical domains such as medical diagnosis and financial services. Explainable AI (XAI) addresses this challenge by augmenting ML models’ outputs with interpretable information that helps humans understand their internal decision processes. Despite the proliferation of explainers in recent years, covering a wide range of ML tasks and explanation types, there is no consensus on what constitutes a good explanation, leaving ML practitioners without clear guidance for selecting an appropriate explainer. We argue that quantifying explanation quality is the enabling factor for informed explainer choices, yet many proposed evaluation criteria are either narrow in scope or closer to desired properties than to quantifiable metrics. This paper addresses this gap by proposing a standardized set of metrics for quantitatively evaluating explanations across diverse explanation types and ML tasks. We describe in detail the metrics of Effective Compactness, Rank Quality Index and Stability, designed to quantitatively assess explanation quality for various types of explanations (attributions, counterfactuals and rules) across different ML tasks (classification, regression and anomaly detection). We then present an exhaustive benchmarking framework for tabular ML, comprising open datasets, trained models, and state-of-the-art explainers. For each (data, model, explainer) tuple, we measure the time required to produce the explanation, apply our metrics and collect the results, highlighting correlations and trade-offs between desired properties. The resulting framework allows us to quantitatively rank explainers suitable for specific ML scenarios and select the most appropriate one based on the user’s requirements.
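The evaluation procedure described in the abstract is essentially a loop over (data, model, explainer) tuples that times each explanation and scores it with the proposed metrics. The Python sketch below illustrates only the shape of that loop: permutation_attribution, effective_compactness and stability are simplified, hypothetical stand-ins introduced here for illustration, not the paper's actual explainers or metric definitions (the Rank Quality Index is omitted), and the dataset and model choices are placeholders.

import time
from itertools import product

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def permutation_attribution(model, x, X_background, n_repeats=20, seed=0):
    # Toy explainer (not one of the paper's explainers): mean absolute change
    # in the prediction when each feature of x is replaced by values drawn
    # from the background data.
    rng = np.random.default_rng(seed)
    base = model.predict(x.reshape(1, -1))[0]
    scores = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        perturbed = np.tile(x, (n_repeats, 1))
        perturbed[:, j] = rng.choice(X_background[:, j], size=n_repeats)
        scores[j] = np.abs(model.predict(perturbed) - base).mean()
    return scores


def effective_compactness(attribution, mass=0.9):
    # Hypothetical proxy for Effective Compactness: fraction of features
    # needed to cover `mass` of the total absolute attribution
    # (lower means a more compact explanation).
    w = np.sort(np.abs(attribution))[::-1]
    covered = np.cumsum(w) / (w.sum() + 1e-12)
    return (np.searchsorted(covered, mass) + 1) / len(w)


def stability(model, x, X_background, explain, noise=0.05, n_runs=5, seed=1):
    # Hypothetical proxy for Stability: mean correlation between the
    # attribution of x and those of slightly perturbed copies of x.
    rng = np.random.default_rng(seed)
    ref = explain(model, x, X_background)
    scale = X_background.std(axis=0)
    corrs = []
    for _ in range(n_runs):
        x_noisy = x + noise * rng.standard_normal(x.shape) * scale
        corrs.append(np.corrcoef(ref, explain(model, x_noisy, X_background))[0, 1])
    return float(np.mean(corrs))


# Benchmark loop: for each (data, model, explainer) tuple, time the
# explanation, apply the metrics, and collect the results.
datasets = {"diabetes": load_diabetes(return_X_y=True)}
models = {"random_forest": RandomForestRegressor(n_estimators=100, random_state=0)}
explainers = {"permutation_attribution": permutation_attribution}

results = []
for (d_name, (X, y)), (m_name, model), (e_name, explain) in product(
    datasets.items(), models.items(), explainers.items()
):
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model.fit(X_train, y_train)
    x = X_test[0]
    start = time.perf_counter()
    attribution = explain(model, x, X_train)
    elapsed = time.perf_counter() - start
    results.append({
        "dataset": d_name,
        "model": m_name,
        "explainer": e_name,
        "time_s": round(elapsed, 4),
        "effective_compactness": effective_compactness(attribution),
        "stability": stability(model, x, X_train, explain),
    })

for row in results:
    print(row)

Collecting one record per (data, model, explainer) tuple, including the wall-clock time, is what makes it possible to rank explainers along the time/compactness/stability trade-offs the abstract describes.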


Acknowledgments

The authors would like to thank Laura Li Puma, Paolo Racca, Silvia Ronchiadin, Mauro Giuseppe Ronzano and Mauro Paolo Valorio for their useful comments. The authors would also like to thank Valerio Cencig, Andrea Cosentini, Mario D’Almo and Luigi Ruggerone for supporting the research team.

Author information

Corresponding author

Correspondence to Alan Perotti.

Ethics declarations

Disclosure of Interests

The research was conducted within the AFC Digital Hub (Anti Financial Crime Digital Hub), a Turin-based consortium established to fight digital financial crime through new technologies and artificial intelligence. The AFC Digital Hub’s members are Intesa Sanpaolo, Intesa Sanpaolo Innovation Center, the Polytechnic University of Turin, the University of Turin and CENTAI.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Perotti, A., Borile, C., Miola, A., Nerini, F.P., Baracco, P., Panisson, A. (2024). Explainability, Quantified: Benchmarking XAI Techniques. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2153. Springer, Cham. https://doi.org/10.1007/978-3-031-63787-2_22

  • DOI: https://doi.org/10.1007/978-3-031-63787-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63786-5

  • Online ISBN: 978-3-031-63787-2

  • eBook Packages: Computer Science, Computer Science (R0)
