A Semantic Data Parallel Query Method Based on Hadoop

Yang, Liu; Yang, Liu; Niu, Jiangbo; Hu, Zhigang; Long, Jun; Zheng, Meiguang

doi:10.1007/978-3-319-48740-3_29


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10041)

Included in the following conference series:

International Conference on Web Information Systems Engineering


Abstract

To query large-scale RDF data efficiently, we designed PAQS, a parallel two-phase query strategy based on MapReduce. PAQS consists of two stages: a SPARQL preprocessing stage and a distributed query execution stage. In the preprocessing stage, a SPARQL query classification algorithm determines the join order of the connection (join) variables by calculating the correlation between the variables in a SPARQL query; the joins between SPARQL clauses are then divided into the minimum number of MapReduce jobs according to those connection variables. The distributed query execution stage runs the MapReduce jobs produced by the preprocessing stage to query large-scale RDF data in parallel. Experimental results on the LUBM benchmark indicate that PAQS queries large-scale RDF data with good efficiency, stability, and scalability.
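The abstract does not spell out the classification algorithm, so the following is only a minimal sketch of the general idea of grouping joins into few MapReduce jobs. It assumes a hypothetical correlation measure (the number of relations that mention a variable) and a simple greedy grouping; the function names `variables` and `plan_jobs`, the greedy rule, and the example predicates are illustrative assumptions, not the PAQS algorithm itself. A greedy plan like this is also not guaranteed to use the minimum number of jobs.

```python
from collections import defaultdict


def variables(pattern):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def plan_jobs(patterns):
    """Greedily group triple patterns into join rounds (one round = one job).

    Assumed heuristic: within a round, repeatedly pick the variable shared by
    the most not-yet-joined relations and join those relations on it.
    Returns a list of rounds, each a list of join variables.
    """
    # Each relation is represented only by the set of variables it binds.
    relations = [frozenset(variables(p)) for p in patterns]
    jobs = []
    while len(relations) > 1:
        remaining = list(relations)
        merged = []
        joins = []
        while True:
            # Count how many remaining relations mention each variable.
            counts = defaultdict(int)
            for rel in remaining:
                for var in rel:
                    counts[var] += 1
            shared = {v: c for v, c in counts.items() if c >= 2}
            if not shared:
                break
            # Most widely shared variable first; ties broken by name.
            var = min(shared, key=lambda v: (-shared[v], v))
            group = [rel for rel in remaining if var in rel]
            remaining = [rel for rel in remaining if var not in rel]
            joins.append(var)
            merged.append(frozenset().union(*group))
        if not joins:
            raise ValueError("no shared variable left: query needs a cross product")
        jobs.append(joins)
        relations = merged + remaining
    return jobs


# Illustrative query over a university-style schema (predicate names are assumptions).
query = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?y", "rdf:type", "ub:University"),
    ("?z", "rdf:type", "ub:Department"),
    ("?x", "ub:memberOf", "?z"),
    ("?z", "ub:subOrganizationOf", "?y"),
    ("?x", "ub:undergraduateDegreeFrom", "?y"),
]
jobs = plan_jobs(query)
print(jobs)
```

In this sketch each inner list is one job, and each variable in it is a join key handled by that job; six triple patterns are planned as two rounds rather than five pairwise joins.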



Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61301136, 61272148, and 61602525.

Author information

Authors and Affiliations

School of Software, Central South University, Changsha, 410073, China
Liu Yang, Liu Yang, Jiangbo Niu, Zhigang Hu & Meiguang Zheng
School of Information Science and Engineering, Central South University, Changsha, 410073, China
Jun Long


Corresponding author

Correspondence to Meiguang Zheng.

Editor information

Editors and Affiliations

Poznań University of Economics, Poznan, Poland
Wojciech Cellary
University of Minnesota, Minneapolis, Minnesota, USA
Mohamed F. Mokbel
Tsinghua University, Beijing, China
Jianmin Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Victoria University, Melbourne, Victoria, Australia
Rui Zhou
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang


About this paper

Cite this paper

Yang, L., Yang, L., Niu, J., Hu, Z., Long, J., Zheng, M. (2016). A Semantic Data Parallel Query Method Based on Hadoop. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_29


DOI: https://doi.org/10.1007/978-3-319-48740-3_29
Published: 02 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48739-7
Online ISBN: 978-3-319-48740-3
eBook Packages: Computer Science, Computer Science (R0)
