Abstract
Clustering validity indices are designed for evaluating the performance of clustering algorithms in terms of quality of clusters. They are also used for detecting the correct number of clusters. Numerous clustering validity indices have been proposed in the literature for different types of data. However, to the best of our knowledge, there are a few well-known clustering validity indices for graph data. In this paper, we propose a new clustering validity index for graph data called the node-based clustering validity index. The main characteristic of our proposed index is that it captures the exclusive contribution of each node to the separation and compactness of the clusters in a graph. This characteristic gives an advantage to the method which is the detection of the correct number of clusters for cases that two sets of nodes are bonded to each other with a single node. We provide an illustrative example to show that many existing validity indices fail to detect the correct number of clusters in such cases. We evaluate the performance of our proposed index with several experiments on different real-world and synthetic graphs. Experiments show that our proposed node-based index outperforms the existing validity indices for different graph data.
Similar content being viewed by others
References
Abadi, A., Rajabioun, T., Ioannou, P. A., et al. (2015). Traffic flow prediction for road transportation networks with limited traffic data. IEEE Transactions on Intelligent Transportation Systems, 16(2), 653–662.
Ah-Pine, J., Csurka, G., & Clinchant, S. (2015). Unsupervised visual and textual information fusion in CBMIR using graph-based methods. ACM Transactions on Information Systems, 33(2), 1–31.
Almeida, H., Guedes, D., Meira, W., & Zaki, M. J. (2011). Is there a best quality metric for graph clusters? In Joint European conference on machine learning and knowledge discovery in databases (pp. 44–59). Springer.
Boutin, F., & Hascoët, M. (2004). Cluster validity indices for graph partitioning. In Proceedings. Eighth international conference on information visualisation, 2004. IV 2004 (pp. 376–381). IEEE.
Brandes, U., Gaertler, M., & Wagner, D. (2003). Experiments on graph clustering algorithms. In European symposium on algorithms (pp. 568–579). Springer.
Condon, A., & Karp, R. M. (2001). Algorithms for graph partitioning on the planted partition model. Random Structures & Algorithms, 18(2), 116–140.
Danon, L., Diaz-Guilera, A., Duch, J., & Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09), P09008.
Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 224–227.
Demir, E., Aykanat, C., & Cambazoglu, B. B. (2008). Clustering spatial networks for aggregate query processing: A hypergraph approach. Information Systems, 33(1), 1–17.
Dunn, J. C. (1973). A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters.
Emmons, S., Kobourov, S., Gallant, M., & Börner, K. (2016). Analysis of network clustering algorithms and cluster quality metrics at scale. PLoS ONE, 11(7), e0159161.
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
Gómez, D., Zarrazola, E., Yáñez, J., & Montero, J. (2015). A divide-and-link algorithm for hierarchical clustering in networks. Information Sciences, 316, 308–328.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Kehagias, A., & Pitsoulis, L. (2013). Bad communities with high modularity. The European Physical Journal B, 86(7), 1–11.
Kobourov, S. G., Pupyrev, S., & Simonetto, P. (2014). Visualizing graphs as maps with contiguous regions. in EuroVis14. Accepted to appear 4.
Lancichinetti, A., Fortunato, S., & Radicchi, F. (2008). Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4), 046110.
Li, L. T., Xiong, Z. Y., Dai, Q. Z., Zha, Y. F., Zhang, Y. F., & Dan, J. P. (2020). A novel graph-based clustering method using noise cutting. Information Systems, 91, 101504.
Liang, S., Ren, Z., Zhao, Y., Ma, J., Yilmaz, E., & Rijke, M. D. (2017). Inferring dynamic user interests in streams of short texts for user clustering. ACM Transactions on Information Systems, 36(1), 1–37.
Newman, M. (2018). Networks. Oxford University Press.
Newman, M. E. (2003). Mixing patterns in networks. Physical Review E, 67(2), 026126.
Newman, M. E. (2004). Fast algorithm for detecting community structure in networks. Physical Review E, 69(6), 066133.
Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
Rojas-Thomas, J., Santos, M., & Mora, M. (2017). New internal index for clustering validation based on graphs. Expert Systems with Applications, 86, 334–349.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.
Schaeffer, S. E. (2007). Graph clustering. Computer Science Review, 1(1), 27–64.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Tavakkol, B., Jeong, M. K., & Albin, S. L. (2018). Validity indices for clusters of uncertain data objects. Annals of Operations Research, 303, 1–37.
Tosyali, A., Kim, J., Choi, J., & Jeong, M. K. (2019). Regularized asymmetric nonnegative matrix factorization for clustering in directed networks. Pattern Recognition Letters, 125, 750–757.
Tosyali, A., Kim, J., Choi, J., Kang, Y., & Jeong, M. K. (2020). New node anomaly detection algorithm based on nonnegative matrix factorization for directed citation networks. Annals of Operations Research, 288, 457–474.
van der Pol, J., & Rameshkoumar, J. P. (2018). The co-evolution of knowledge and collaboration networks: The role of the technology life-cycle. Scientometrics, 114(1), 307–323.
Wang, F., Li, T., Wang, X., Zhu, S., & Ding, C. (2011). Community discovery using nonnegative matrix factorization. Data Mining and Knowledge Discovery, 22(3), 493–521.
Wang, S., & Siskind, J. M. (2003). Image segmentation with ratio cut. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(6), 675–690.
Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 8, 841–847.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.
Zhu, J., Wu, X., Lin, X., Huang, C., Fung, G. P. C., & Tang, Y. (2018). A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering. Scientometrics, 114(3), 781–794.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Details of calculations for illustrative example
Appendix: Details of calculations for illustrative example
In the following, we show the details of calculating the value of our proposed node-based clustering validity index and the existing clustering validity indices for graph data.
When \(K=2\):
Performance:
Rcut:
Ncut:
Modularity:
Conductance:
Coverage:
Node-based validity index:
When \(K=3\):
Performance:
Rcut:
Ncut:
Modularity:
Conductance:
Coverage:
Node-based validity index:
When \(K=4\):
Performance:
Rcut:
Ncut:
Modularity:
Conductance:
Coverage:
Node-based validity index:
Rights and permissions
About this article
Cite this article
Tosyali, A., Tavakkol, B. A node-based index for clustering validation of graph data. Ann Oper Res 341, 197–221 (2024). https://doi.org/10.1007/s10479-021-04376-7
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-021-04376-7