Abstract
Transforming disparate, heterogeneous data sources that deliver large volumes of data at high velocity into a common form enables integrated and enriched views of the data, and thus offers further opportunities to improve the effectiveness and accuracy of data analysis and prediction tasks. This paper presents the RDF-Gen approach for transforming data from archival and streaming sources, provided in various formats, into RDF triples according to a set of ontological specifications. RDF-Gen introduces a generic mechanism that transforms data efficiently (i.e., with high throughput and low latency), even when the velocity of data exhibits high peaks. It also offers facilities for discovering associations between data from different sources and supports the transformation of modular data sets. The paper further presents a parallel implementation of RDF-Gen, together with data transformation workflows whose variations incorporate RDF-Gen instances and adjust to the needs of data sources, application areas and performance requirements. RDF-Gen is experimentally evaluated against the state of the art in both archival and streaming settings; the results demonstrate its efficiency and highlight its key contributions.
Notes
GeoJSON Specification is available online at https://tools.ietf.org/html/rfc7946.
RDF/XML Specification is available online at https://www.w3.org/TR/rdf-syntax-grammar/.
The predefined terms for the configuration file are in the namespace http://www.datacron-project.eu/RDFGen_conf#.
Acknowledgements
This work was supported by EU projects datAcron (Grant Agreement No 687591), VesselAI (Grant Agreement No 957237), and by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: HFRI-FM17-81).
About this article
Cite this article
Santipantakis, G.M., Kotis, K.I., Glenis, A. et al. RDF-Gen: generating RDF triples from big data sources. Knowl Inf Syst 64, 2985–3015 (2022). https://doi.org/10.1007/s10115-022-01729-x
DOI: https://doi.org/10.1007/s10115-022-01729-x