Abstract
The use of computers and complex software is pervasive in archaeology, yet their role in the analytical pipeline is rarely exposed for other researchers to inspect or reuse. This limits the progress of archaeology because researchers cannot easily reproduce each other’s work to verify or extend it. Four general principles of reproducible research that have emerged in other fields are presented. An archaeological case study is described that shows how each principle can be implemented using freely available software. The costs and benefits of implementing reproducible research are assessed. The primary benefit, of sharing data in particular, is increased impact via an increased number of citations. The primary cost is the additional time required to enhance reproducibility, although the exact amount is difficult to quantify.
Acknowledgments
Thanks to Chris Clarkson, Mike Smith, Richard Fullagar, Lynley A. Wallis, Patrick Faulkner, Tiina Manne, Elspeth Hayes, Richard G. Roberts, Zenobia Jacobs, Xavier Carah, Kelsey M. Lowe, and Jacqueline Matthews for their cooperation with the JHE paper. Thanks to the Mirarr Senior Traditional Owners, and to our research partners, the Gundjeihmi Aboriginal Corporation, for granting permission to carry out the research that was published in the JHE paper, and led to this paper. Thanks to Kyle Bocinsky and Oliver Nakoinz for their helpful peer reviews and many constructive suggestions. This research was carried out as part of ARC Discovery Project DP110102864. This work was supported in part by the University of Washington eScience Institute, and especially benefited from the expertise of the Reproducibility and Open Science working group. An earlier version was presented at an International Neuroinformatics Coordinating Facility (INCF) meeting in December 2014 organised by Stephen Eglen, and benefited from discussion during that meeting. I am a contributor to the Software and Data Carpentry projects and the rOpenSci collective; beyond this, I declare that I have no conflict of interest.
Cite this article
Marwick, B. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. J Archaeol Method Theory 24, 424–450 (2017). https://doi.org/10.1007/s10816-015-9272-9
DOI: https://doi.org/10.1007/s10816-015-9272-9