Abstract
This work investigates the trade-offs in accuracy and efficiency between centralized and collective (distributed) algorithms for (i) sampling and (ii) n-way data analysis of multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that real data obtained by collective sampling in the time dimension from multiple servers does not differ significantly, in the statistical sense, from data obtained from a single server. Second, we show on real data that collective analysis of 3-way data arrays (users × keywords × time), known as high order tensors, is more efficient than centralized analysis in both space and computational cost, and that this gain comes without loss of accuracy. Third, we examine how sensitive the collective construction and analysis of high order data tensors are to server selection and sampling window size. To do so, we construct 4-way tensors (users × keywords × time × servers) and analyze them to show how the choice of servers and window size affects the results.
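As an illustration of the first contribution, a two-sample Kolmogorov-Smirnov comparison can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function names are hypothetical, the inputs stand for any one-dimensional feature extracted from each sampling scheme, and the decision rule assumes the standard large-sample approximation of the critical value.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at data points,
    # so the largest gap is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def differs_significantly(a, b, alpha=0.05):
    """True if the two samples are unlikely to share one distribution."""
    return ks_statistic(a, b) > ks_critical_value(len(a), len(b), alpha)
```

Here `a` would hold values from the collectively sampled data and `b` values from the single-server data (both hypothetical inputs); a `False` result from `differs_significantly` corresponds to the "insignificant difference" outcome described in the abstract.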
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Acar, E., Çamtepe, S.A., Yener, B. (2006). Collective Sampling and Analysis of High Order Tensors for Chatroom Communications. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B., Wang, FY. (eds) Intelligence and Security Informatics. ISI 2006. Lecture Notes in Computer Science, vol 3975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760146_19
DOI: https://doi.org/10.1007/11760146_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34478-0
Online ISBN: 978-3-540-34479-7
eBook Packages: Computer Science, Computer Science (R0)