Link to original content: https://unpaywall.org/10.1007/S00357-017-9222-1

Versatile Hyper-Elliptic Clustering Approach for Streaming Data Based on One-Pass-Thrown-Away Learning

Journal of Classification

Abstract

Finding patterns or clusters in streaming data is an important problem in data mining. The central difficulty is that the volume of data is huge while storage space is limited. Previous work represented the essential information of the data by subsets of the data, grid summarization, or spherical functions. These representations are not compact enough to capture the topology of the arriving data points and may lose information needed to produce accurate clusters. In this work, we propose a new versatile hyper-elliptic clustering algorithm, called VHEC, which clusters streaming data in a one-pass-thrown-away fashion while preserving the original topology of the data space. To cope with one-pass-thrown-away clustering, we introduce a new set of elliptic micro-cluster parameters: boundary, density, direction, intra-distance, and inter-distance. We also develop a feasible technique for merging two micro-clusters. The proposed parameters and the one-pass-thrown-away clustering algorithm were tested on several benchmark data sets and structural clustering data sets and compared with existing algorithms. Regardless of cluster size, shape, and density, VHEC outperformed the previous data stream clustering algorithms on both synthetic and real data sets. Moreover, VHEC is significantly more robust to streaming speed and incoming data sequence than the compared algorithms in terms of purity, Rand index, and adjusted Rand index.
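To make the one-pass-thrown-away idea concrete, the following is a minimal sketch of how an elliptic summary of a point group could be maintained without storing the points. It is not the VHEC procedure: the abstract does not define how boundary, density, direction, intra-distance, or inter-distance are computed, so the class `MicroCluster`, the function `merge`, and the choice of count, mean, and scatter matrix as the stored statistics are all assumptions made for illustration. The update is the standard Welford-style running covariance, and the merge is the standard pairwise combination of such statistics.

```python
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical elliptic summary: point count, mean, and scatter matrix."""
    dim: int
    n: int = 0
    mean: list = field(default_factory=list)
    scatter: list = field(default_factory=list)  # sum of outer products of deviations

    def __post_init__(self):
        # Start from an empty summary when no statistics are supplied.
        if not self.mean:
            self.mean = [0.0] * self.dim
        if not self.scatter:
            self.scatter = [[0.0] * self.dim for _ in range(self.dim)]

    def add(self, x):
        """Absorb one point; afterwards the point itself can be thrown away."""
        self.n += 1
        before = [xi - mi for xi, mi in zip(x, self.mean)]  # deviation from old mean
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, before)]
        after = [xi - mi for xi, mi in zip(x, self.mean)]   # deviation from new mean
        for i in range(self.dim):
            for j in range(self.dim):
                self.scatter[i][j] += before[i] * after[j]

    def covariance(self):
        """Sample covariance; its eigenvectors would give the ellipse axes."""
        if self.n < 2:
            raise ValueError("need at least two points")
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


def merge(a, b):
    """Combine two summaries as if all their points had been seen by one."""
    n = a.n + b.n
    if n == 0:
        return MicroCluster(a.dim)
    shift = [mb - ma for ma, mb in zip(a.mean, b.mean)]
    mean = [ma + si * b.n / n for ma, si in zip(a.mean, shift)]
    weight = a.n * b.n / n
    scatter = [
        [a.scatter[i][j] + b.scatter[i][j] + weight * shift[i] * shift[j]
         for j in range(a.dim)]
        for i in range(a.dim)
    ]
    return MicroCluster(a.dim, n, mean, scatter)


# Usage: summarise two halves of a small 2-D stream, then merge the summaries.
stream = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (6.0, 7.0), (5.0, 5.0)]
left, right = MicroCluster(2), MicroCluster(2)
for point in stream[:3]:
    left.add(point)
for point in stream[3:]:
    right.add(point)
combined = merge(left, right)
print(combined.n, combined.mean, combined.covariance())
```

Because only the count, mean, and scatter matrix are kept, memory per micro-cluster in this sketch depends on the dimension and not on the number of points absorbed, which is the property that one-pass processing relies on.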

Author information

Corresponding author

Correspondence to Niwan Wattanakitrungroj.

About this article

Cite this article

Wattanakitrungroj, N., Maneeroj, S. & Lursinsap, C. Versatile Hyper-Elliptic Clustering Approach for Streaming Data Based on One-Pass-Thrown-Away Learning. J Classif 34, 108–147 (2017). https://doi.org/10.1007/s00357-017-9222-1

  • DOI: https://doi.org/10.1007/s00357-017-9222-1
