Abstract
With the rapid development of information technology, large amounts of multi-source data are constantly being generated in the medical field. Automatic visualization systems built on such data have attracted considerable attention, because intuitive data presentation helps even non-professional users extract the information hidden in separate data collected from different scenarios and make better decisions. In this paper, we use the Data Lake architecture to improve the performance of an existing data visualization recommendation system and to address three challenges in processing multi-source, heterogeneous data. First, we build a framework on Data Lake to store multi-source and heterogeneous data. Second, we optimize the data manipulation module of the visualization system using the distributed processing power of Data Lake, so that potentially interesting visualization candidates are obtained in a short time. Third, we use the computing capability of Data Lake to run exploratory queries efficiently on large datasets and meet the actual needs of users. Experimental results show that our system markedly accelerates the automatic visualization of big multi-source medical data.
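The three steps in the abstract (unified storage, candidate generation, exploratory queries) can be illustrated with a minimal sketch. This is not the system described in the paper: it is an in-memory stand-in using only the Python standard library, whereas the paper relies on the distributed processing of a Data Lake. The sample records, the column names, and the functions `load_sources`, `enumerate_candidates`, and `run_candidate` are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json
from collections import defaultdict
from itertools import product

# Hypothetical sample data: two sources in different formats (CSV and JSON).
CSV_SOURCE = "patient_id,department,age\n1,cardiology,64\n2,oncology,51\n3,cardiology,70\n"
JSON_SOURCE = '[{"patient_id": 4, "department": "oncology", "age": 45}]'


def load_sources(csv_text, json_text):
    """Step 1 (assumed): normalize heterogeneous sources into one list of typed records."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "patient_id": int(rec["patient_id"]),
            "department": rec["department"],
            "age": int(rec["age"]),
        })
    rows.extend(json.loads(json_text))
    return rows


def enumerate_candidates(rows, id_column="patient_id"):
    """Step 2 (assumed): pair categorical columns with numeric columns and aggregates."""
    sample = rows[0]
    categorical = [k for k, v in sample.items() if isinstance(v, str)]
    numeric = [
        k for k, v in sample.items()
        if isinstance(v, (int, float)) and k != id_column
    ]
    return list(product(categorical, numeric, ("avg", "sum", "count")))


def run_candidate(rows, x, y, agg):
    """Step 3 (assumed): run one exploratory group-by query for a candidate chart."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x]].append(row[y])
    if agg == "avg":
        return {k: sum(v) / len(v) for k, v in groups.items()}
    if agg == "sum":
        return {k: sum(v) for k, v in groups.items()}
    if agg == "count":
        return {k: len(v) for k, v in groups.items()}
    raise ValueError(f"unknown aggregate: {agg}")


if __name__ == "__main__":
    records = load_sources(CSV_SOURCE, JSON_SOURCE)
    for x, y, agg in enumerate_candidates(records):
        print(x, y, agg, run_candidate(records, x, y, agg))
```

In a real deployment the group-by in `run_candidate` is the part that would be pushed down to the distributed engine, since it is repeated once per candidate and dominates the cost on large datasets.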
Acknowledgements
This work was supported by the National Key R&D Program of China (2019YFC0119600).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ren, P. et al. (2021). Intelligent Visualization System for Big Multi-source Medical Data Based on Data Lake. In: Xing, C., Fu, X., Zhang, Y., Zhang, G., Borjigin, C. (eds) Web Information Systems and Applications. WISA 2021. Lecture Notes in Computer Science, vol 12999. Springer, Cham. https://doi.org/10.1007/978-3-030-87571-8_61
DOI: https://doi.org/10.1007/978-3-030-87571-8_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87570-1
Online ISBN: 978-3-030-87571-8
eBook Packages: Computer Science, Computer Science (R0)