Abstract
In recent years, the growing volume of data generated in everyday social activity, and in every field of research in particular, has driven the rise of new database models, many of which have been adopted in Molecular Biology. NoSQL graph databases have been used in many studies of biological data, especially where data integration is a determining factor. For the most part, they are used to represent relationships between data along two main lines: (i) to infer knowledge from existing relationships; and (ii) to represent relationships derived from prior knowledge of the data. In this work, a brief timeline of events introduces the mutual evolution of databases and Molecular Biology. We then present how graph databases have been used in Molecular Biology research based on High Throughput Sequencing data, and discuss their role and the research opportunities that remain open in this area.
Acknowledgements
W. M. C. S. kindly thanks CAPES and IFG. M. E. M. T. W. thanks CNPq (Project 308524/2015-2).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
da Silva, W.M.C., Wercelens, P., Walter, M.E.M.T., Holanda, M., Brígido, M. (2018). Graph Databases in Molecular Biology. In: Alves, R. (ed.) Advances in Bioinformatics and Computational Biology. BSB 2018. Lecture Notes in Computer Science, vol 11228. Springer, Cham. https://doi.org/10.1007/978-3-030-01722-4_5
DOI: https://doi.org/10.1007/978-3-030-01722-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01721-7
Online ISBN: 978-3-030-01722-4
eBook Packages: Computer Science, Computer Science (R0)