If these data could talk
- PMID: 28872630
- PMCID: PMC5584398
- DOI: 10.1038/sdata.2017.114
Abstract
In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of reproducibility. Although there are many dimensions to this issue, we believe that published results lack a formal, end-to-end description that runs from the data source through the analysis to the final publication. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to issues of reproducibility. Data provenance aids reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.
Conflict of interest statement
The authors declare no competing financial interests.
Similar articles
- New ways of insulin delivery. Int J Clin Pract Suppl. 2011 Feb;(170):31-46. doi: 10.1111/j.1742-1241.2010.02577.x. PMID: 21323811. Review.
- Rules to be adopted for publishing a scientific paper. Ann Ital Chir. 2016;87:1-3. PMID: 28474609.
- "Just Another Statistic". Oncologist. 1998;3(3):III-IV. PMID: 10388105.
- TCGA Expedition: A Data Acquisition and Management System for TCGA Data. PLoS One. 2016 Oct 27;11(10):e0165395. doi: 10.1371/journal.pone.0165395. eCollection 2016. PMID: 27788220.
- Open source EMR software: profiling, insights and hands-on analysis. Comput Methods Programs Biomed. 2014 Nov;117(2):360-82. doi: 10.1016/j.cmpb.2014.07.002. Epub 2014 Jul 17. PMID: 25070757. Review.
Cited by
- A large-scale study on research code quality and execution. Sci Data. 2022 Feb 21;9(1):60. doi: 10.1038/s41597-022-01143-6. PMID: 35190569.
- Anti-clustering in the national SARS-CoV-2 daily infection counts. PeerJ. 2021 Aug 27;9:e11856. doi: 10.7717/peerj.11856. eCollection 2021. PMID: 34532156.
- The End-to-End Provenance Project. Patterns (N Y). 2020 May 8;1(2):100016. doi: 10.1016/j.patter.2020.100016. eCollection 2020 May 8. PMID: 33205093.
- Low availability of code in ecology: A call for urgent action. PLoS Biol. 2020 Jul 28;18(7):e3000763. doi: 10.1371/journal.pbio.3000763. eCollection 2020 Jul. PMID: 32722681.
- From Data Silos to Standardized, Linked, and FAIR Data for Pharmacovigilance: Current Advances and Challenges with Observational Healthcare Data. Drug Saf. 2019 May;42(5):583-586. doi: 10.1007/s40264-018-00793-z. PMID: 30666591.