Abstract
An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of a system and reduce its technical debt. However, choosing an appropriate refactoring strategy is not always straightforward, which leads developers to seek assistance. Although research in refactoring is well established, with studies alternating between the detection of refactoring opportunities and the recommendation of appropriate code changes, little is known about the adoption of these techniques in practice. Analyzing the perception of developers is critical to better understand what they consider problematic in their code and how they handle it. Additionally, there is a need to bridge the gap between refactoring as a research topic and its adoption in practice by extracting common refactoring intents that better reflect what developers face in reality. In this study, we analyze refactoring discussions on Stack Overflow through a series of quantitative and qualitative experiments. Our results show that a diverse set of developers use Stack Overflow for refactoring assistance across a variety of technologies. Our observations reveal five areas in which developers typically require help with refactoring: Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database. We envision that our findings will help bridge the gap between traditional (or academic) aspects of refactoring and their real-world applicability, including better tool support.
Notes
When reading/comparing these two histograms, it should be noted that the ‘Frequency’ scale for the two charts differs.
In Stack Overflow, tags can only be associated with a question post.
Acknowledgments
We would like to thank the reviewers at ESE for their detailed and invaluable feedback.
Additional information
Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude, and Alexander Serebrenik
This article belongs to the Topical Collection: Collective Knowledge in Software Engineering
About this article
Cite this article
Peruma, A., Simmons, S., AlOmar, E.A. et al. How do I refactor this? An empirical study on refactoring trends and topics in Stack Overflow. Empir Software Eng 27, 11 (2022). https://doi.org/10.1007/s10664-021-10045-x
DOI: https://doi.org/10.1007/s10664-021-10045-x