Abstract
To improve software quality and reduce maintenance cost, cross-project fault prediction (CPFP) identifies faulty software components in a particular project (the target project) using the historical fault data of other projects (the source or reference projects). Although several diverse approaches/models have been proposed in the past, there is still room for improvement in prediction performance. Further, these approaches did not consider effort-based evaluation metrics (EBEMs), which are important to ensure a model's applicability in industry under the realistic constraint of limited inspection effort. Moreover, they validated their respective approaches on a limited number of datasets. Addressing these issues, we propose an improved CPFP model and validate it on a large corpus of 62 datasets in terms of EBEMs (PIM@20%, Cost-effectiveness@20%, and IFA) and machine learning-based evaluation metrics (MLBEMs) such as PF, G-measure, and MCC. The reference data and the target data are first normalized to reduce the distribution divergence between them, and then the relevant training data is selected from the reference data using the KNN algorithm. Based on the experimental and statistical test results, we demonstrate the efficacy of our proposed model over state-of-the-art CPFP models, namely the Turhan-Filter and the Cruz model, comprehensively. The proposed CPFP model thus provides an effective solution for predicting faulty software components, enabling practitioners to develop quality software at a lower maintenance cost.
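For illustration, the sketch below shows the two-step pipeline the abstract describes: per-dataset normalization of the reference (source) and target data, followed by KNN-based selection of relevant training instances from the reference data. The z-score normalization, the value of k, and the logistic-regression classifier are assumptions made for this sketch; the paper's exact normalization scheme and modeling choices may differ.

```python
# Minimal sketch of the CPFP pipeline described in the abstract, assuming
# z-score normalization and a Turhan-style KNN relevancy filter. The value
# of k and the base classifier are illustrative assumptions, not the
# paper's confirmed choices.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression

def select_training_data(X_src, y_src, X_tgt, k=10):
    """Normalize source and target data, then keep the k nearest source
    instances for each target instance (duplicates removed)."""
    # z-score normalization applied per dataset, to reduce the
    # distribution divergence between reference and target projects
    z = lambda X: (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    X_src_n, X_tgt_n = z(X_src), z(X_tgt)

    # For each target instance, find its k nearest neighbours in the source data
    nn = NearestNeighbors(n_neighbors=k).fit(X_src_n)
    _, idx = nn.kneighbors(X_tgt_n)
    keep = np.unique(idx.ravel())  # union of all selected source rows
    return X_src_n[keep], y_src[keep], X_tgt_n

# Usage with synthetic data standing in for reference/target project metrics:
rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_tgt = rng.normal(loc=0.5, size=(100, 20))

X_train, y_train, X_test = select_training_data(X_src, y_src, X_tgt)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
fault_proneness = model.predict_proba(X_test)[:, 1]  # ranking for inspection
```

The relevancy filter discards source instances that are dissimilar to every target instance, so the classifier trains only on cross-project data that resembles the project under prediction; the predicted probabilities can then be used to rank components for inspection under a limited-effort budget, which is what EBEMs such as PIM@20% evaluate.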
References
Arar ÖF, Ayan K (2015) Software defect prediction using cost-sensitive neural network. Appl Soft Comput 33:263–277. https://doi.org/10.1016/j.asoc.2015.04.045
Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Software Eng 22(10):751–761. https://doi.org/10.1109/32.544352
Bisi M, Goyal NK (2016) An ANN-PSO-based model to predict fault-prone modules in software. Int J Reliab Saf 10(3):243–264. https://doi.org/10.1504/IJRS.2016.081611
Bowes D, Hall T, Petrić J (2018) Software defect prediction: do different classifiers find the same defects? Softw Qual J 26(2):525–552. https://doi.org/10.1007/s11219-016-9353-3
Briand LC, Melo WL, Wüst J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720. https://doi.org/10.1109/TSE.2002.1019484
Canfora G, Lucia AD, Penta MD, Oliveto R, Panichella A, Panichella S (2015) Defect prediction as a multiobjective optimization problem. Softw Test Verif Reliab 25(4):426–459. https://doi.org/10.1002/stvr.1570
Chen L, Fang B, Shang Z, Tang Y (2015) Negative samples reduction in cross-company software defects prediction. Inf Softw Technol 62(1):67–77. https://doi.org/10.1016/j.infsof.2015.01.014
Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27. https://doi.org/10.1109/TIT.1967.1053964
Cruz AEC, Ochimizu K (2009) Towards logistic regression models for predicting fault-prone code across software projects. In: 2009 3rd International symposium on empirical software engineering and measurement, pp 460–463. https://doi.org/10.1109/ESEM.2009.5316002
D’Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir Softw Eng 17(4–5):531–577. https://doi.org/10.1007/s10664-011-9173-9
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
He P, Li B, Liu X, Chen J, Ma Y (2015) An empirical study on software defect prediction with a simplified metric set. Inf Softw Technol 59:170–190. https://doi.org/10.1016/j.infsof.2014.11.006
Herbold S, Trautsch A, Grabowski J (2018) A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Softw Eng 44(9):811–833. https://doi.org/10.1109/TSE.2017.2724538
Herbold S (2013) Training data selection for cross-project defect prediction. In: ACM international conference proceeding series, Part F1288, pp 1–10. https://doi.org/10.1145/2499393.2499397
Hosseini S, Turhan B, Gunarathna D (2019) A systematic literature review and meta-analysis on cross project defect prediction. IEEE Trans Softw Eng 45(2):111–147. https://doi.org/10.1109/TSE.2017.2770124
Huang Q, Xia X, Lo D (2018) Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction. Empir Softw Eng 24(5):2823–2862. https://doi.org/10.1007/s10664-018-9661-2
Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: ACM international conference proceeding series, pp 1–10. https://doi.org/10.1145/1868328.1868342
Jureczko M, Spinellis D (2010) Using object-oriented design metrics to predict software defects. In: Models and methods of system dependability. Oficyna Wydawnicza Politechniki Wrocławskiej, pp 69–81. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.226.2285
Kassab M, Defranco JF, Laplante PA (2017) Software testing: the state of the practice. IEEE Softw 34(5):46–52. https://doi.org/10.1109/MS.2017.3571582
Kawata K, Amasaki S, Yokogawa T (2015) Improving relevancy filter methods for cross-project defect prediction. In: Proceedings—3rd international conference on applied computing and information technology and 2nd international conference on computational science and intelligence, ACIT-CSI 2015, pp 2–7. https://doi.org/10.1109/ACIT-CSI.2015.104
Khatri Y, Singh SK (2021) Cross project defect prediction: a comprehensive survey with its SWOT analysis. Innov Syst Softw Eng. https://doi.org/10.1007/s11334-020-00380-5
Khatri Y, Singh SK (2022) Towards building a pragmatic cross-project defect prediction model combining non-effort based and effort-based performance measures for a balanced evaluation. Inf Softw Technol 150:106980. https://doi.org/10.1016/j.infsof.2022.106980
Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis, pp 165–176. https://doi.org/10.1145/2931037.2931051
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496. https://doi.org/10.1109/TSE.2008.35
Liu Y, Khoshgoftaar TM, Seliya N (2010) Evolutionary optimization of software quality modeling with multiple repositories. IEEE Trans Softw Eng 36(6):852–864. https://doi.org/10.1109/TSE.2010.51
Lu H, Cukic B, Culp M (2012) Software defect prediction using semi-supervised learning with dimension reduction. In: 2012 27th IEEE/ACM international conference on automated software engineering, ASE 2012—Proceedings, pp 314–317. https://doi.org/10.1145/2351676.2351734
Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256. https://doi.org/10.1016/j.infsof.2011.09.007
Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13. https://doi.org/10.1109/TSE.2007.256941
Meyer AN, Fritz T, Murphy GC, Zimmermann T (2014) Software developers’ perceptions of productivity. In: Proceedings of the ACM SIGSOFT symposium on the foundations of software engineering, 16–21 November, pp 19–29. https://doi.org/10.1145/2635868.2635892
Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: Proceedings—international conference on software engineering, pp 382–391. https://doi.org/10.1109/ICSE.2013.6606584
Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355. https://doi.org/10.1109/TSE.2005.49
Pan SJ, Tsang IW, Kwok JT, Yang Q (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210. https://doi.org/10.1109/TNN.2010.2091281
Pelayo L, Dick S (2007) Applying novel resampling strategies to software defect prediction. In: Annual conference of the North American fuzzy information processing society—NAFIPS, pp 69–72. https://doi.org/10.1109/NAFIPS.2007.383813
Peng L, Yang B, Chen Y, Abraham A (2009) Data gravitation based classification. Inf Sci 179(6):809–819. https://doi.org/10.1016/j.ins.2008.11.007
Ryu D, Jang JI, Baik J (2017) A transfer cost-sensitive boosting approach for cross-project defect prediction. Softw Qual J 25(1):235–272. https://doi.org/10.1007/s11219-015-9287-1
Subramanyam R, Krishnan MS (2003) Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310. https://doi.org/10.1109/TSE.2003.1191795
Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14:540–578. https://doi.org/10.1007/s10664-008-9103-7
Wang T, Zhang Z, Jing X, Zhang L (2015) Multiple kernel ensemble learning for software defect prediction. Autom Softw Eng 23(4):569–590. https://doi.org/10.1007/s10515-015-0179-1
Watanabe S, Kaiya H, Kaijiri K (2008) Adapting a fault prediction model to allow inter language reuse. In: Proceedings—international conference on software engineering, pp 19–24. https://doi.org/10.1145/1370788.1370794
Zhou Y, Yang Y, Lu H, Chen L, Li Y, Zhao Y, Qian J, Xu B (2018) How far we have progressed in the journey? An examination of cross-project defect prediction. ACM Trans Softw Eng Methodol 27(1):1–51. https://doi.org/10.1145/3183339
Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: ESEC-FSE’09—Proceedings of the joint 12th European software engineering conference and 17th ACM SIGSOFT symposium on the foundations of software engineering, pp 91–100. https://doi.org/10.1145/1595696.1595713
Funding
No funds, grants, or other support was received.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Human and/or animal participants
This study does not involve any human or animal participants.
Informed consent
This study does not involve any human or animal participants; therefore, no informed consent was required.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khatri, Y., Singh, S.K. Predictive software maintenance utilizing cross-project data. Int J Syst Assur Eng Manag 15, 1503–1518 (2024). https://doi.org/10.1007/s13198-023-01957-6