Abstract
To query large-scale RDF data efficiently, we designed PAQS, a parallel two-phase query strategy based on MapReduce. PAQS consists of two stages: a SPARQL pretreatment stage and a distributed query execution stage. In the pretreatment stage, a SPARQL query classification algorithm determines the join order of the connection variables by calculating the correlation between the variables in a SPARQL query; the joins between SPARQL clauses are then divided into the minimum number of MapReduce jobs according to those connection variables. The distributed query execution stage runs the MapReduce jobs produced by the pretreatment stage to query the large-scale RDF data in parallel. Experimental results on the LUBM benchmark indicate that PAQS queries large-scale RDF data with good efficiency, stability, and scalability.
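The abstract does not spell out the classification algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It approximates the "correlation" of a variable by the number of intermediate relations that share it, and greedily packs joins over disjoint groups of triple patterns into the same MapReduce job. The function names (`variables`, `plan_jobs`) and the greedy rule are hypothetical illustrations, not the PAQS algorithm itself, and the greedy rule is not guaranteed to produce the minimum number of jobs. The example patterns follow the shape of a LUBM query.

```python
from collections import defaultdict


def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def plan_jobs(patterns):
    """Greedily group joins into jobs (hypothetical heuristic, not PAQS).

    Returns a list of jobs; each job is a list of
    (join variable, indices of the triple patterns joined so far).
    Only the plan is produced; a real reducer would also have to check
    any other variables shared by the joined relations.
    """
    # Each intermediate relation: (pattern indices covered, variables bound).
    relations = [(frozenset([i]), variables(p)) for i, p in enumerate(patterns)]
    jobs = []
    while len(relations) > 1:
        # For every variable, find the relations that mention it.
        users = defaultdict(set)
        for r, (_, rel_vars) in enumerate(relations):
            for v in rel_vars:
                users[v].add(r)
        # Prefer variables shared by the most relations; break ties by name.
        candidates = sorted(
            ((v, rs) for v, rs in users.items() if len(rs) >= 2),
            key=lambda item: (-len(item[1]), item[0]),
        )
        if not candidates:
            break  # disconnected query: cross products are out of scope here
        taken, job, merged = set(), [], []
        for v, rs in candidates:
            if rs & taken:
                continue  # a relation can take part in only one join per job
            taken |= rs
            covered = frozenset().union(*(relations[r][0] for r in rs))
            bound = set().union(*(relations[r][1] for r in rs))
            job.append((v, sorted(covered)))
            merged.append((covered, bound))
        untouched = [rel for r, rel in enumerate(relations) if r not in taken]
        relations = merged + untouched
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    # Triple patterns shaped like a LUBM query (prefixes abbreviated).
    query = [
        ("?x", "rdf:type", "ub:GraduateStudent"),
        ("?y", "rdf:type", "ub:University"),
        ("?z", "rdf:type", "ub:Department"),
        ("?x", "ub:memberOf", "?z"),
        ("?z", "ub:subOrganizationOf", "?y"),
        ("?x", "ub:undergraduateDegreeFrom", "?y"),
    ]
    for number, job in enumerate(plan_jobs(query), start=1):
        print(f"job {number}: {job}")
```

In this sketch, each job would correspond to one MapReduce pass in which the chosen join variable serves as the shuffle key. Joins on different variables share a job only when they touch disjoint sets of patterns.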
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61301136, 61272148, and 61602525.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Yang, L., Yang, L., Niu, J., Hu, Z., Long, J., Zheng, M. (2016). A Semantic Data Parallel Query Method Based on Hadoop. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_29
DOI: https://doi.org/10.1007/978-3-319-48740-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48739-7
Online ISBN: 978-3-319-48740-3
eBook Packages: Computer Science (R0)