Abstract
Over the past decade’s several new concepts emerged to organize and query data over large Data Warehouse (DW) system with the same primary objective, that is, optimize processing speed. More recently, with the rise of BigData concept, storage cost lowered significantly, and performance (random accesses) increased, particularly with modern SSD disks. This paper introduces and tested a storage alternative which goes against current data normalization premises, where storage space is no longer a concern. By de-normalizing the entire data schema (transparent to the user) it is proposed a new concept system where query execution time must be entirely predictable, independently of its complexity, called, SINGLE. The proposed data model also allows easy partitioning and distributed processing to enable execution parallelism, boosting performance, as happens in MapReduce. TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when comparing with approaches based on a normalized relational schema, and MapReduce oriented.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Chaudhuri, S., Das, G., Narasayya, V.: Optimized stratified sampling for approximate query processing. ACM Trans. Database Syst. (TODS) 32(2), 9 (2007)
Cheng, D., Zhou, X., Lama, P., Wu, J., Jiang, C.: Cross-platform resource scheduling for Spark and MapReduce on YARN. IEEE Trans. Comput. 66, 1341–1353 (2017)
Council, Transaction Processing Performance: TPC-H benchmark specification, vol. 21, pp. 592–603 (2008). http://www.tcp.org
DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems, vol. 14. ACM (1984)
Harris, E.P., Ramamohanarao, K.: Join algorithm costs revisited. VLDB J.—Int. J. Very Large Data Bases 5(1), 064–084 (1996)
Kimball, R.: The Data Warehouse Lifecycle Toolkit. Wiley, Hoboken (2008)
Lamb, A., et al.: The vertica analytic database: C-store 7 years later. Proc. VLDB Endow. 5(12), 1790–1801 (2012)
Lemire, D., Kaser, O., Aouiche, K.: Sorting improves word-aligned bitmap indexes. Data Knowl. Eng. 69(1), 3–28 (2010)
Mutharaju, R., Maier, F., Hitzler, P.: A MapReduce algorithm for SC. In: 23rd International Workshop on Description Logics DL2010, p. 456 (2010)
O’Neil, P., O’Neil, E., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10424-4_17
Patel, J.M., Carey, M.J., Vernon, M.K.: Accurate modeling of the hybrid hash join algorithm. In: ACM SIGMETRICS Performance Evaluation Review, vol. 22, pp. 56–66. ACM (1994)
Pavlo, A., et al.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pp. 165–178. ACM (2009)
Pinto, Y.: A framework for systematic database denormalization. Glob. J. Comput. Sci. Technol. 9(4), 44–52 (2009)
Roy, S., Shit, B., Sen, S.: Association based multi-attribute analysis to construct materialized view. In: Chaki, R., Saeed, K., Cortesi, A., Chaki, N. (eds.) Advanced Computing and Systems for Security. AISC, vol. 567, pp. 115–131. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-3409-1_8
Sanders, G.L., Shin, S.: Denormalization effects on performance of RDBMS. In: Proceedings of the 34th Annual Hawaii International Conference on System Sciences 2001, p. 9. IEEE (2001)
Zaker, M., Phon-Amnuaisuk, S., Haw, S.C.: Optimizing the data warehouse design by hierarchical denormalizing. In: Proceedings of the 8th Conference on Applied Computer Scince, pp. 131–138. World Scientific and Engineering Academy and Society (WSEAS) (2008)
Zhang, Y., Hu, W., Wang, S.: MOSS-DB: a hardware-aware OLAP database. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds.) WAIM 2010. LNCS, vol. 6184, pp. 582–594. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14246-8_57
Acknowledgements
This work is financed by national funds through FCT - Fundação para a Ciência e Tecnologia, I.P., under the project UID/Multi/04016/2016. Furthermore, we would like to thank the Instituto Politécnico de Viseu and CI&DETS for their support.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Abbasi, M., Martins, P., Cecílio, J., Costa, J., Furtado, P. (2018). SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-99987-6_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99986-9
Online ISBN: 978-3-319-99987-6
eBook Packages: Computer ScienceComputer Science (R0)