Abstract
The K-Means algorithm has effectively promoted the development of intelligent systems and data-driven decision-making through data clustering and analysis. A reasonable convergence criterion directly determines when model training can be terminated, which heavily affects model quality. Much existing work targets training acceleration and quality improvement, but little attention has been paid to the convergence judgment itself. Current convergence criteria still adopt a centralized judgment strategy based on a single loss value, and the same criterion is simply copied between different optimized K-Means variants, such as the fast Mini-Batch version and the traditional Full-Batch version. Our analysis reveals that such a design cannot guarantee that different variants converge to the same point; it can lead to abnormal situations such as false-positive convergence and over-training. To enable a fair comparison and guarantee model accuracy, we propose a new dynamic convergence criterion, VF (Vote for Freezing), and an optimized version, VF+. VF adopts a distributed judgment strategy in which each sample decides, based on the criterion, whether to keep participating in training or to freeze itself. Combined with sample priorities, VF adaptively adjusts the sample freezing threshold, which achieves asymptotic withdrawal of samples and accelerates model convergence. VF+ further introduces parameter freezing thresholds and freezing periods to eliminate redundant distance calculations, improving training efficiency. Experiments on multiple datasets validate the effectiveness of our convergence criterion in terms of training quality and efficiency.
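To make the per-sample freezing idea concrete, the following is a minimal sketch of how a distributed "vote for freezing" could be layered on Mini-Batch K-Means. It is not the authors' implementation: the function name minibatch_kmeans_with_freezing, the fixed tolerance freeze_tol, and the rule that a sample freezes once its distance to its assigned center stops changing are all illustrative assumptions; VF additionally adapts the threshold using sample priorities, and VF+ adds parameter freezing thresholds and freezing periods, both of which are omitted here.

import numpy as np

def minibatch_kmeans_with_freezing(X, k, batch_size=256, n_iters=100,
                                   freeze_tol=1e-3, seed=0):
    # Hypothetical sketch: per-sample freezing vote on top of Mini-Batch K-Means.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, k, replace=False)].copy()
    counts = np.zeros(k)
    prev_dist = np.full(n, np.inf)     # each sample's last distance to its center
    frozen = np.zeros(n, dtype=bool)   # samples that have voted to freeze

    for _ in range(n_iters):
        active = np.flatnonzero(~frozen)
        if active.size == 0:           # every sample has withdrawn -> stop training
            break
        batch = rng.choice(active, min(batch_size, active.size), replace=False)
        # Distances from the batch samples to all k centers.
        dists = np.linalg.norm(X[batch][:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        best = dists[np.arange(batch.size), assign]

        # Freezing vote: a sample withdraws once its loss contribution stops changing.
        # (VF would adapt freeze_tol via sample priorities; it is fixed here for brevity.)
        frozen[batch] = np.abs(prev_dist[batch] - best) < freeze_tol
        prev_dist[batch] = best

        # Standard Mini-Batch K-Means center update with per-center learning rates.
        for i, c in zip(batch, assign):
            counts[c] += 1
            centers[c] += (X[i] - centers[c]) / counts[c]

    return centers, frozen

Under these assumptions, a call such as minibatch_kmeans_with_freezing(np.random.rand(10000, 2), k=5) would see the fraction of frozen samples grow over iterations, with the loop terminating once all samples have voted to freeze rather than when a single global loss value stabilizes.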
Acknowledgement
This work was supported by the Key R&D Program of Shandong Province, China (No. 2023CXPT020), and the National Natural Science Foundation of China (No. U23A20320 and No. U22A2068).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, H. et al. (2024). A Dynamic Convergence Criterion for Fast K-means Computations. In: Jin, C., Yang, S., Shang, X., Wang, H., Zhang, Y. (eds) Web Information Systems and Applications. WISA 2024. Lecture Notes in Computer Science, vol 14883. Springer, Singapore. https://doi.org/10.1007/978-981-97-7707-5_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-7706-8
Online ISBN: 978-981-97-7707-5