Abstract
Context:
Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable.
Objectives:
The goal of this paper is threefold: to increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, to provide joint conclusions on the performance of TDD across different industrial and academic settings, and to assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD.
Method:
We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD.
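For illustration, the pooling step of such a meta-analysis can be sketched as follows. This is a minimal sketch, not the paper's actual analysis pipeline: it assumes hypothetical per-experiment mean quality differences (TDD minus the control approach) with their standard errors, and it uses the DerSimonian-Laird random-effects estimator, one common choice among several.

    import numpy as np

    def random_effects_pool(effects, ses):
        """Pool per-experiment effect sizes under a random-effects model."""
        effects = np.asarray(effects, dtype=float)
        ses = np.asarray(ses, dtype=float)
        w = 1.0 / ses**2                       # inverse-variance (fixed-effect) weights
        fixed = np.sum(w * effects) / np.sum(w)
        q = np.sum(w * (effects - fixed) ** 2) # Cochran's Q heterogeneity statistic
        df = len(effects) - 1
        c = np.sum(w) - np.sum(w**2) / np.sum(w)
        tau2 = max(0.0, (q - df) / c)          # between-experiment variance (DL estimator)
        w_re = 1.0 / (ses**2 + tau2)           # random-effects weights
        pooled = np.sum(w_re * effects) / np.sum(w_re)
        se = np.sqrt(1.0 / np.sum(w_re))
        i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
        return pooled, se, tau2, i2

    # Hypothetical mean quality differences from three experiments:
    effects = [-4.2, -1.0, 2.5]
    ses = [3.1, 2.4, 4.0]
    pooled, se, tau2, i2 = random_effects_pool(effects, ses)
    print(f"pooled = {pooled:.2f}, 95% CI +/- {1.96 * se:.2f}, tau2 = {tau2:.2f}, I2 = {i2:.0f}%")

Run on the made-up inputs above, this prints a pooled mean difference with its 95% confidence interval, the between-experiment variance, and the I2 heterogeneity percentage.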
Results:
TDD novices achieve slightly higher code quality with iterative test-last development (ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, and the learning effects from one development approach to the other do not appear to affect quality. The quality-related performance of professionals using TDD drops more than that of students. We hypothesize that this may be because professionals are more resistant to change and potentially less motivated than students.
Conclusion:
Previous studies seem to provide conflicting results on TDD performance (positive in some, negative in others). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process, or case studies comparing the performance achieved with TDD vs. a control approach (e.g., the waterfall model), each applied to develop a different system. Further experiments with TDD experts are needed to validate these hypotheses.
Notes
For simplicity’s sake, we refer to quality and external quality interchangeably throughout the rest of the article. We acknowledge the limitations of this in the threats to validity section.
Note that in our experiments the programming language is confounded with other variables: IDE, testing tools, and other programming environment related variables (the use of Java implies the use of Java-related technologies, while the use of C++/C# implies the use of C++/C#-related technologies). We have grouped all confounded variables under the programming environment name.
Due to space restrictions, we moved the references of the primary studies to Appendix A.
It was not feasible to compute any response ratio synthesizing the quality achieved with TDD with respect to a control approach.
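For context, a response ratio is a meta-analytic effect size defined as the (log of the) ratio of the treatment mean to the control mean. A minimal sketch follows; the numbers are hypothetical, and the failure mode shown (non-positive means) is our assumption about one way such a ratio can become infeasible to compute, not a claim from the paper.

    import math

    def log_response_ratio(mean_tdd, mean_control):
        # lnRR = ln(mean_tdd / mean_control); undefined for non-positive means,
        # which is one plausible source of infeasibility (an assumption here).
        if mean_tdd <= 0 or mean_control <= 0:
            raise ValueError("lnRR requires strictly positive means")
        return math.log(mean_tdd / mean_control)

    print(log_response_ratio(55.0, 60.0))  # -0.087: quality ~8.7% lower with TDD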
The outlier observed in [P35] may have been due to the small number of participants, and the larger variability of results expected in small sample sizes (Cumming 2013).
Note that the implementation style influences the size of the programs. We do not claim that these are gold-standard implementations with optimal design and coding practices. In fact, there are several implementations of these katas in public GitHub repositories.
Both measured with EclEmma: https://www.eclemma.org/
Measured with muJava: https://cs.gmu.edu/~offutt/mujava/
Note that the fact that participants have no ITL or TDD experience does not mean that they have no software testing experience. ITL and TDD have to do with knowledge of slicing, not with knowledge of testing. Therefore, participants with testing experience might conceivably have no experience with either ITL or TDD.
We analyzed the data with t-tests. Therefore, the mean difference (i.e., the slope of the line) provides useful information for evaluating experiment results.
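For illustration, the per-experiment analysis described in this note can be sketched as an independent-samples t-test on quality scores, where the mean difference is the quantity of interest. The data below are hypothetical and the use of SciPy is an assumption for the sketch; this is not the paper's analysis code.

    from scipy import stats

    # Hypothetical QLTY scores (0-100) for two groups of participants.
    qlty_tdd = [62.0, 55.5, 71.0, 48.0, 66.5]
    qlty_itl = [68.0, 60.0, 74.5, 58.0, 70.0]

    t, p = stats.ttest_ind(qlty_tdd, qlty_itl)  # independent-samples t-test
    mean_diff = sum(qlty_tdd) / len(qlty_tdd) - sum(qlty_itl) / len(qlty_itl)
    print(f"mean difference (TDD - ITL) = {mean_diff:.2f}, t = {t:.2f}, p = {p:.3f}")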
In our experiments, the IDEs and testing tools used with C++ and C# are different. In this study, however, we make the simplification of considering them as being a part of the same group of technologies merely for the purposes of comparison.
A type of reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed.
References
Astels D (2003) Test driven development: A practical guide. Prentice Hall Professional Technical Reference
Baltes S, Diehl S (2018) Towards a theory of software development expertise. arXiv:1807.06087
Basili V R (1992) Software modeling and measurement: the goal/question/metric paradigm
Basili V R, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Trans Softw Eng 25(4):456–473
Beck K (2003) Test-driven development: by example. Addison-Wesley Professional
Bergersen GR, Sjoberg DIK, Dyba T (2014) Construction and validation of an instrument for measuring programming skill. IEEE Trans Softw Eng 40 (12):1163–1184
Bertolino A (2007) Software testing research: Achievements, challenges, dreams. In: 2007 Future of Software Engineering. IEEE Computer Society, pp 85–103
Bissi W, Neto A G S S, Emer M C F P (2016) The effects of test driven development on internal quality, external quality and productivity: A systematic review. Inf Softw Technol 74:45–54
Borenstein M, Hedges L V, Higgins JPT, Rothstein H R (2011) Introduction to meta-analysis. Wiley
Brown H, Prescott R (2014) Applied mixed models in medicine. Wiley
Causevic A, Sundmark D, Punnekkat S (2011) Factors limiting industrial adoption of test driven development: A systematic review. In: 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation (ICST). IEEE, pp 337–346
Cooper H, Patall E A (2009) The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychol Methods 14(2):165
Cumming G (2013) Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge
de Winter JCF (2013) Using the Student’s t-test with extremely small sample sizes. Pract Assess Res Eval 18(10). [Online; accessed 28-August-2018]
Dieste O, Aranda A M, Uyaguari F, Turhan B, Tosun A, Fucci D, Oivo M, Juristo N (2017) Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir Softw Eng 22(5):2457–2542
Falessi D, Juristo N, Wohlin C, Turhan B, Münch J, Jedlitschka A, Oivo M (2018) Empirical software engineering experts on the use of students and professionals in experiments. Empir Softw Eng 23(1):452–489
Feigenspan J, Kästner C, Liebig J, Apel S, Hanenberg S (2012) Measuring programming experience. In: 2012 IEEE 20th International Conference on Program Comprehension (ICPC). IEEE, pp 73–82
Field A (2013) Discovering statistics using IBM SPSS statistics. Sage
Fisher DJ, Copas AJ, Tierney JF, Parmar MKB (2011) A critical review of methods for the assessment of patient-level interactions in individual participant data meta-analysis of randomized trials, and guidance for practitioners. J Clin Epidemiol 64(9):949–967
Fucci D, Erdogmus H, Turhan B, Oivo M, Juristo N (2017) A dissection of the test-driven development process: does it really matter to test-first or to test-last? IEEE Trans Softw Eng 43(7):597–614
Gómez O S, Juristo N, Vegas S (2014) Understanding replication of experiments in software engineering: A classification. Inf Softw Technol 56 (8):1033–1048
Gurevitch J, Koricheva J, Nakagawa S, Stewart G (2018) Meta-analysis and the science of research synthesis. Nature 555(7695):175
Higgins JPT, Green S, et al. (2008) Cochrane handbook for systematic reviews of interventions, vol 5. Wiley Online Library
ISO/IEC 25010:2011 (2011) https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en
Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng 37(5):649–678
Jung J, Hoefig K, Domis D, Jedlitschka A, Hiller M (2013) Experimental comparison of two safety analysis methods and its replication. In: 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. IEEE, pp 223–232
Juristo N, Moreno A M (2001) Basics of software engineering experimentation. Springer Science & Business Media
Juristo N, Vegas S (2009) Using differences among replications of software engineering experiments to gain knowledge. In: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. IEEE Computer Society, pp 356–366
Karac I, Turhan B (2018) What do we (really) know about test-driven development? IEEE Softw 35(4):81–85
Karac E I, Turhan B, Juristo N (2019) A controlled experiment with novice developers on the impact of task description granularity on software quality in test-driven development. IEEE Transactions on Software Engineering
Kitchenham B (2008) The role of replications in empirical software engineering, a word of warning. Empir Softw Eng 13(2):219–221
Kollanus S (2010) Test-driven development - still a promising approach? In: 2010 Seventh International Conference on the Quality of Information and Communications Technology (QUATIC), pp 403–408
Kruger J, Dunning D (1999) Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J Person Soc Psychol 77(6):1121
Lau J, Ioannidis John PA, Schmid C H (1998) Summing up evidence: one answer is not always enough. Lancet 351(9096):123–127
Lumley T, Diehr P, Emerson S, Chen L (2002) The importance of the normality assumption in large public health data sets. Ann Rev Public Health 23(1):151–169
Mäkinen S, Münch J (2014) Effects of test-driven development: A comparative analysis of empirical studies. In: International Conference on Software Quality. Springer, pp 155–169
Martin CR (2001) Advanced principles, patterns and process of software development. Prentice Hall
Munir H, Moayyed M, Petersen K (2014) Considering rigor and relevance when evaluating test driven development: A systematic review. Inf Softw Technol 56(4):375–394
Myers G J, Sandler C, Badgett T (2011) The art of software testing. Wiley
Norman G (2010) Likert scales, levels of measurement and the laws of statistics. Adv Health Sci Educ 15(5):625–632
Offutt J (2018) Why don’t we publish more TDD research papers? Softw Test Verif Reliab 28(4):e1670
Quinn G P, Keough M J (2002) Experimental design and data analysis for biologists. Cambridge University Press
Rafique Y, Mišić V B (2013) The effects of test-driven development on external quality and productivity: A meta-analysis. IEEE Trans Softw Eng 39(6):835–856
Riley R D, Lambert P C, Abo-Zaid G (2010) Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ 340:c221
Rosenthal R (1991) Meta-analytic procedures for social research, vol 6. Sage
Santos A, Gomez O S, Juristo N (2018a) Analyzing families of experiments in SE: a systematic mapping study. IEEE Trans Softw Eng. https://doi.org/10.1109/TSE.2018.2864633
Santos A, Jarvinen J, Partanen J, Oivo M, Juristo N (2018b) Does the performance of tdd hold across software companies and premises? a group of industrial experiments on tdd. In: International Conference on Product-Focused Software Process Improvement. Springer, pp 227–242
Santos A, Vegas S, Oivo M, Juristo N (2018c) Guidelines for analyzing families of experiments in SE. Submitted to IEEE Transactions on Software Engineering
Schmider E, Ziegler M, Danay E, Beyer L, Bühner M (2010) Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology 6(4):147–151
Shull F, Melnik G, Turhan B, Layman L, Diep M, Erdogmus H (2010) What do we know about test-driven development? IEEE Softw 27(6):16–19
Sjøberg DIK, Bergersen GR (2018) The price of using students: comments on “Empirical software engineering experts on the use of students and professionals in experiments”. CoRR, arXiv:1810.10791
Thorlund K, Imberger G, Johnston BC, Walsh M, Awad T, Thabane L, Gluud C, Devereaux PJ, Wetterslev J (2012) Evolution of heterogeneity (I²) estimates and their 95% confidence intervals in large meta-analyses. PLoS One 7(7):e39471
Tosun A, Dieste O, Fucci D, Vegas S, Turhan B, Erdogmus H, Santos A, Oivo M, Toro K, Jarvinen J et al (2017) An industry experiment on the effects of test-driven development on external quality and productivity. Empir Softw Eng 22(6):2763–2805
Tosun A, Dieste O, Vegas S, Pfahl D, Rungi K, Juristo N (In press) Investigating the impact of development task on external quality in test-driven development: An industry experiment. IEEE Transactions on Software Engineering
Vegas S, Dieste O, Juristo N (2015) Difficulties in running experiments in the software industry: experiences from the trenches. In: Proceedings of the Third International Workshop on Conducting Empirical Studies in Industry at ICSE. IEEE Press, pp 3–9
Vickers A J (2005) Parametric versus non-parametric statistics in the analysis of randomized trials with non-normally distributed data. BMC Med Res Methodol 5(1):35
Williams L, Kessler R (2002) Pair programming illuminated. Addison-Wesley Longman Publishing Co., Inc.
Wohlin C, Runeson P, Höst M, Ohlsson M C, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Science & Business Media
Acknowledgements
This research was developed with the support of project PGC2018-097265-B-I00, funded by: FEDER/Spanish Ministry of Science and Innovation—Research State Agency. We would like to thank the participants in the ESEIL experiments: this research would not have been possible without your help. We would also like to thank the anonymous reviewers for their valuable comments during the review of the manuscript.
Communicated by: Jeff Offutt
Appendix A: Primary Studies
[P1] Aniche, M.F., Gerosa, M.A.: Most common mistakes in test-driven development practice: Results from an online survey with developers. In: Software Testing, Verification, and Validation Workshops (ICSTW), 2010 Third International Conference on, pp. 469–478. IEEE (2010)
[P2] Bannerman, S., Martin, A.: A multiple comparative study of test-with development product changes and their effects on team speed and product quality. Empirical Software Engineering 16(2), 177–210 (2011)
[P3] Bhat, T., Nagappan, N.: Evaluating the efficacy of test-driven development: industrial case studies. In: Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, pp. 356–363. ACM (2006)
[P4] Damm, L.O., Lundberg, L.: Quality impact of introducing component-level test automation and test-driven development. In: European Conference on Software Process Improvement, pp. 187–199. Springer (2007)
[P5] Desai, C., Janzen, D.S., Clements, J.: Implications of integrating test-driven development into CS1/CS2 curricula. In: ACM SIGCSE Bulletin, vol. 41, pp. 148–152. ACM (2009)
[P6] Dogša, T., Batič, D.: The effectiveness of test-driven development: an industrial case study. Software Quality Journal 19(4), 643–661 (2011)
[P7] Domino, M.A., Collins, R.W., Hevner, A.R.: Controlled experimentation on adaptations of pair programming. Information Technology and Management 8(4), 297–312 (2007)
[P8] Edwards, S.H.: Using test-driven development in the classroom: Providing students with automatic, concrete feedback on performance. In: Proceedings of the International Conference on Education and Information Systems: Technologies and Applications (EISTA), vol. 3. Citeseer (2003)
[P9] Erdogmus, H., Morisio, M., Torchiano, M.: On the effectiveness of the test-first approach to programming. IEEE Transactions on Software Engineering 31(3), 226–237 (2005)
[P10] George, B., Williams, L.: A structured experiment of test-driven development. Information and Software Technology 46(5), 337–342 (2004)
[P11] George, B., et al.: Analysis and quantification of test driven development approach (2002)
[P12] Geras, A., Smith, M., Miller, J.: A prototype empirical evaluation of test driven development. In: Software Metrics, 2004. Proceedings. 10th International Symposium on, pp. 405–416. IEEE (2004)
[P13] Gupta, A., Jalote, P.: An experimental evaluation of the effectiveness and efficiency of the test driven development. In: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 285–294. IEEE (2007)
[P14] Huang, L., Holcombe, M.: Empirical investigation towards the effectiveness of test first programming. Information and Software Technology 51(1), 182–194 (2009)
[P15] Kobayashi, O., Kawabata, M., Sakai, M., Parkinson, E.: Analysis of the interaction between practices for introducing XP effectively. In: Proceedings of the 28th International Conference on Software Engineering, pp. 544–550. ACM (2006)
[P16] LeJeune, N.F.: Teaching software engineering practices with extreme programming. Journal of Computing Sciences in Colleges 21(3), 107–117 (2006)
[P17] Lui, K.M., Chan, K.C.: Test driven development and software process improvement in China. In: International Conference on Extreme Programming and Agile Processes in Software Engineering, pp. 219–222. Springer (2004)
[P18] Madeyski, L., Szała, Ł.: The impact of test-driven development on software development productivity—an empirical study. In: European Conference on Software Process Improvement, pp. 200–211. Springer (2007)
[P19] Marchenko, A., Abrahamsson, P., Ihme, T.: Long-term effects of test-driven development: a case study. In: International Conference on Agile Processes and Extreme Programming in Software Engineering, pp. 13–22. Springer (2009)
[P20] Maximilien, E.M., Williams, L.: Assessing test-driven development at IBM. In: Software Engineering, 2003. Proceedings. 25th International Conference on, pp. 564–569. IEEE (2003)
[P21] McDaid, K., Rust, A., Bishop, B.: Test-driven development: can it work for spreadsheets? In: Proceedings of the 4th International Workshop on End-User Software Engineering, pp. 25–29. ACM (2008)
[P22] Mueller, M.M., Hagner, O.: Experiment about test-first programming. IEE Proceedings - Software 149(5), 131–136 (2002)
[P23] Nagappan, N., Maximilien, E.M., Bhat, T., Williams, L.: Realizing quality improvement through test driven development: results and experiences of four industrial teams. Empirical Software Engineering 13(3), 289–302 (2008)
[P24] Pančur, M., Ciglarič, M.: Impact of test-driven development on productivity, code and tests: A controlled experiment. Information and Software Technology 53(6), 557–573 (2011)
[P25] Pancur, M., Ciglaric, M., Trampus, M., Vidmar, T.: Towards empirical evaluation of test-driven development in a university environment. In: EUROCON 2003. Computer as a Tool. The IEEE Region 8, vol. 2, pp. 83–86. IEEE (2003)
[P26] Paula Filho, W.P.: Quality gates in use-case driven development. In: Proceedings of the 2006 International Workshop on Software Quality, pp. 33–38. ACM (2006)
[P27] Rahman, S.M.: Applying the TBC method in introductory programming courses. In: Frontiers in Education Conference - Global Engineering: Knowledge Without Borders, Opportunities Without Passports, 2007. FIE’07. 37th Annual, pp. T1E–20. IEEE (2007)
[P28] Sanchez, J.C., Williams, L., Maximilien, E.M.: On the sustained use of a test-driven development practice at IBM. In: Agile Conference (AGILE), 2007, pp. 5–14. IEEE (2007)
[P29] Siniaalto, M., Abrahamsson, P.: Does test-driven development improve the program code? Alarming results from a comparative case study. In: Balancing Agility and Formalism in Software Engineering, pp. 143–156. Springer (2008)
[P30] Slyngstad, O.P.N., Li, J., Conradi, R., Rønneberg, H., Landre, E., Wesenberg, H.: The impact of test driven development on the evolution of a reusable framework of components–an industrial case study. In: Software Engineering Advances, 2008. ICSEA’08. The Third International Conference on, pp. 214–223. IEEE (2008)
[P31] Vu, J.H., Frojd, N., Shenkel-Therolf, C., Janzen, D.S.: Evaluating test-driven development in an industry-sponsored capstone project. In: Proceedings of the Sixth International Conference on Information Technology: New Generations, p. 229 (2009)
[P32] Wilkerson, J.W., Nunamaker Jr., J.F., Mercer, R.: Comparing the defect reduction benefits of code inspection and test-driven development. IEEE Transactions on Software Engineering 38(3), 547 (2012)
[P33] Xu, S., Li, T.: Evaluation of test-driven development: An academic case study. In: Software Engineering Research, Management and Applications 2009, pp. 229–238. Springer (2009)
[P34] Yenduri, S., Perkins, A.L.: Impact of using test-driven development: A case study. Software Engineering Research and Practice 1(2006), 126–129 (2006)
[P35] Ynchausti, R.A.: Integrating unit testing into a software development team’s process. XP 1, 84–87 (2001)
[P36] Zielinski, K., Szmuc, T.: Preliminary analysis of the effects of pair programming and test-driven development on the external code quality. Frontiers in Artificial Intelligence and Applications, p. 113 (2005)
Cite this article
Santos, A., Vegas, S., Dieste, O. et al. A family of experiments on test-driven development. Empir Software Eng 26, 42 (2021). https://doi.org/10.1007/s10664-020-09895-8