Abstract
Automated clustering aims to build appropriate clustering models without manual intervention. Most existing automated clustering methods are based on meta-learning. However, they still face two challenges: the lack of comprehensive meta-features for meta-learning and the lack of a general clustering validation index (CVI) to serve as the objective function. To address these problems, we propose a novel automated clustering method named AutoCluster, which is mainly composed of Clustering-oriented Meta-feature Extraction (CME) and Multi-CVIs Clustering Ensemble Construction (MC\(^2\)EC). CME captures meta-features from spatial randomness and from the different learning properties of clustering algorithms to enhance meta-learning. MC\(^2\)EC develops a collaborative mechanism based on clustering ensemble to balance the measuring criteria of different CVIs and construct a more appropriate clustering model for a given dataset. Extensive experiments are conducted on 150 datasets from OpenML to create the meta-data and on 33 test datasets from three clustering benchmarks to validate AutoCluster. The results show that AutoCluster builds more appropriate clustering models than classical clustering algorithms and a CASH (combined algorithm selection and hyperparameter optimization) method.
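To illustrate why a single CVI is hard to use as a general objective, the following minimal sketch ranks candidate clustering models by their mean rank across several internal CVIs. It is not the MC\(^2\)EC mechanism itself, which the abstract describes as ensemble-based; plain rank averaging is swapped in here as a simpler stand-in. The function name `rank_candidates`, the candidate names, and all score values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Direction of each index: True means "higher is better".
# Silhouette and Calinski-Harabasz are maximised; Davies-Bouldin is minimised.
HIGHER_IS_BETTER = {
    "silhouette": True,
    "calinski_harabasz": True,
    "davies_bouldin": False,
}


def rank_candidates(scores):
    """Order candidates by mean rank across all CVIs (best first).

    `scores` maps a candidate name to a dict of CVI name -> value.
    Ties within one CVI are not handled specially in this sketch.
    """
    names = list(scores)
    ranks = {name: [] for name in names}
    for cvi, higher in HIGHER_IS_BETTER.items():
        # Best candidate under this CVI gets rank 1.
        ordered = sorted(names, key=lambda n: scores[n][cvi], reverse=higher)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    mean_rank = {name: mean(r) for name, r in ranks.items()}
    return sorted(names, key=lambda n: mean_rank[n]), mean_rank


# Hypothetical CVI values for three candidate models (illustrative only).
candidates = {
    "kmeans_k3": {"silhouette": 0.55, "calinski_harabasz": 410.0, "davies_bouldin": 0.70},
    "dbscan": {"silhouette": 0.60, "calinski_harabasz": 250.0, "davies_bouldin": 0.95},
    "agglomerative_k4": {"silhouette": 0.48, "calinski_harabasz": 380.0, "davies_bouldin": 0.65},
}

order, mean_rank = rank_candidates(candidates)
print(order)
```

In this made-up example each CVI prefers a different candidate, so no single index settles the choice; averaging ranks is one simple way to trade them off, under the assumption that all indices are weighted equally.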
Notes
- 1.
The supplementary material of this paper is available at https://github.com/wj-tian/AutoCluster.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 52073169) and the State Key Program of the National Natural Science Foundation of China (Grant No. 61936001).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Li, S., Tian, W. (2021). AutoCluster: Meta-learning Based Ensemble Method for Automated Unsupervised Clustering. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_20
DOI: https://doi.org/10.1007/978-3-030-75768-7_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75767-0
Online ISBN: 978-3-030-75768-7
eBook Packages: Computer Science, Computer Science (R0)