Versatile Hyper-Elliptic Clustering Approach for Streaming Data Based on One-Pass-Thrown-Away Learning

Wattanakitrungroj, Niwan; Maneeroj, Saranya; Lursinsap, Chidchanok

doi:10.1007/s00357-017-9222-1

Published: 20 March 2017

Volume 34, pages 108–147, (2017)

Journal of Classification

Niwan Wattanakitrungroj¹,
Saranya Maneeroj¹ &
Chidchanok Lursinsap¹

Abstract

Finding patterns or clusters in streaming data is an important problem in data mining. The most critical issue is the huge volume of data relative to the limited storage space. In previous work, the essential information of such data was represented by subsets of the data, grid summarization, or spherical functions. These representations are not compact enough to capture the topology of the arriving data points and may lose information needed to produce accurate clusters. In this work, we propose a new versatile hyper-elliptic clustering algorithm, called VHEC, which clusters streaming data in a one-pass-thrown-away fashion while preserving the original topology of the data space. To cope with the constraints of one-pass-thrown-away clustering, a new set of elliptic micro-cluster parameters, i.e., boundary, density, direction, intra-distance, and inter-distance, is introduced. Furthermore, a feasible technique for merging two micro-clusters is developed. The proposed parameters and the one-pass-thrown-away clustering algorithm were tested on several benchmark data sets and structural clustering data sets, and the performance was compared with that of existing algorithms. Regardless of differences in size, shape, and density, VHEC outperformed the other data stream clustering algorithms on both synthetic and real data sets. Moreover, VHEC is significantly more robust to streaming speed and incoming data sequence than the compared algorithms in terms of purity, Rand index, and adjusted Rand index.
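
The abstract does not give the update or merge rules, so the following is only a generic sketch of the one-pass-thrown-away idea, not VHEC itself. It assumes a micro-cluster can be summarized by a point count, a mean, and a scatter matrix (from which an elliptic shape could be derived), updates that summary one point at a time, and merges two summaries without revisiting any raw data. The names `MicroCluster`, `absorb`, and `merge` are hypothetical, and VHEC's boundary, density, direction, intra-distance, and inter-distance parameters are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Running summary of absorbed points (hypothetical structure, not VHEC's)."""
    dim: int
    n: int = 0
    mean: List[float] = field(default_factory=list)
    # Sum of outer products of deviations from the mean; covariance = scatter / n.
    scatter: List[List[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.mean:
            self.mean = [0.0] * self.dim
        if not self.scatter:
            self.scatter = [[0.0] * self.dim for _ in range(self.dim)]

    def absorb(self, x: Sequence[float]) -> None:
        """Fold one point into the summary; the point can then be thrown away."""
        self.n += 1
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta_old)]
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        # Welford-style update of the scatter matrix.
        for i in range(self.dim):
            for j in range(self.dim):
                self.scatter[i][j] += delta_old[i] * delta_new[j]

    def covariance(self) -> List[List[float]]:
        """Population covariance recovered from the summary alone."""
        if self.n == 0:
            raise ValueError("empty micro-cluster has no covariance")
        return [[s / self.n for s in row] for row in self.scatter]


def merge(a: MicroCluster, b: MicroCluster) -> MicroCluster:
    """Combine two summaries exactly, using only their stored statistics."""
    n = a.n + b.n
    diff = [mb - ma for ma, mb in zip(a.mean, b.mean)]
    mean = [ma + d * b.n / n for ma, d in zip(a.mean, diff)]
    weight = a.n * b.n / n
    scatter = [
        [a.scatter[i][j] + b.scatter[i][j] + weight * diff[i] * diff[j]
         for j in range(a.dim)]
        for i in range(a.dim)
    ]
    return MicroCluster(a.dim, n, mean, scatter)


# Usage: summarize two halves of a small stream separately, then merge them.
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0)]
first, second, whole = MicroCluster(2), MicroCluster(2), MicroCluster(2)
for p in points[:2]:
    first.absorb(p)
for p in points[2:]:
    second.absorb(p)
for p in points:
    whole.absorb(p)
merged = merge(first, second)
```

In this sketch `merged` carries the same count, mean, and scatter matrix as `whole`, up to floating-point rounding, which is the property that lets raw points be discarded after a single pass. How VHEC decides whether a point belongs to a micro-cluster, or when two micro-clusters should be merged, is described in the full article and is not modeled above.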

Author information

Authors and Affiliations

Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Niwan Wattanakitrungroj, Saranya Maneeroj & Chidchanok Lursinsap

Corresponding author

Correspondence to Niwan Wattanakitrungroj.

About this article

Cite this article

Wattanakitrungroj, N., Maneeroj, S. & Lursinsap, C. Versatile Hyper-Elliptic Clustering Approach for Streaming Data Based on One-Pass-Thrown-Away Learning. J Classif 34, 108–147 (2017). https://doi.org/10.1007/s00357-017-9222-1

Published: 20 March 2017
Issue Date: April 2017
DOI: https://doi.org/10.1007/s00357-017-9222-1
