Abstract
Neighborhood rough set theory is an important computational model in granular computing and has been successfully applied in many areas, most prominently attribute reduction. However, most existing attribute reduction methods for neighborhood rough sets are supervised or semi-supervised, so they cannot handle datasets that lack decision information. To address this limitation, we propose an unsupervised attribute reduction strategy based on neighborhood dependency. First, a neighborhood rough set model based on conditional attribute sets is constructed. Then, based on the single-attribute subsets formed by every attribute in the dataset, an attribute importance measure is defined to indicate the significance of candidate attributes. Next, a neighborhood dependency-based unsupervised attribute reduction (NDUAR) algorithm is designed. Finally, NDUAR is compared with existing algorithms on publicly available datasets. The experimental results show that NDUAR selects fewer attributes while maintaining or improving the performance of clustering algorithms, which confirms the effectiveness of the proposed algorithm.
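The abstract does not state the paper's formulas. The sketch below is therefore only one plausible reading of the idea and is not the authors' NDUAR algorithm. It rests on three assumptions. First, distance is Euclidean on min-max scaled data, so the δ-neighborhood of a sample x under an attribute subset B is δ_B(x) = {y ∈ U : Δ_B(x, y) ≤ δ}. Second, each single attribute {a} acts as a pseudo-decision: x is counted in the positive region when δ_B(x) ⊆ δ_{a}(x), which gives the dependency γ_B({a}) = |POS_B({a})| / |U|. Third, the importance of B is the mean of γ_B({a}) over all attributes, and B is grown greedily until that mean matches the score of the full attribute set. The function names (`normalize`, `neighborhoods`, `dependency`, `mean_dependency`, `greedy_unsupervised_reduct`) and the radius parameter `delta` are hypothetical.

```python
import numpy as np


def normalize(X):
    """Min-max scale every column to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def neighborhoods(X, cols, delta):
    """Boolean n x n matrix: entry (i, j) is True when sample j lies in the
    delta-neighborhood of sample i under the attribute subset `cols`
    (Euclidean distance, an assumption). An empty subset puts every sample
    in every neighborhood."""
    n = X.shape[0]
    if len(cols) == 0:
        return np.ones((n, n), dtype=bool)
    sub = X[:, list(cols)]
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta


def dependency(nb_B, nb_a):
    """Hypothetical dependency gamma_B({a}): the fraction of samples whose
    B-neighborhood is contained in their {a}-neighborhood."""
    contained = ~(nb_B & ~nb_a).any(axis=1)
    return contained.mean()


def mean_dependency(X, cols, single_nbs, delta):
    """Hypothetical importance of `cols`: mean dependency over all
    single-attribute pseudo-decisions."""
    nb_B = neighborhoods(X, cols, delta)
    return float(np.mean([dependency(nb_B, nb_a) for nb_a in single_nbs]))


def greedy_unsupervised_reduct(X, delta=0.2, tol=1e-9):
    """Greedy forward selection (an assumed search strategy): repeatedly add
    the attribute that most increases the mean dependency, stopping when the
    full-attribute-set score is reached or no candidate improves it."""
    X = normalize(X)
    m = X.shape[1]
    single_nbs = [neighborhoods(X, [a], delta) for a in range(m)]
    target = mean_dependency(X, range(m), single_nbs, delta)
    selected, current = [], mean_dependency(X, [], single_nbs, delta)
    while len(selected) < m and current < target - tol:
        scores = {a: mean_dependency(X, selected + [a], single_nbs, delta)
                  for a in range(m) if a not in selected}
        best = max(scores, key=scores.get)
        if scores[best] <= current + tol:
            break
        selected.append(best)
        current = scores[best]
    return selected


if __name__ == "__main__":
    # Column 2 duplicates column 0, so a reduct should not need both.
    X = np.array([[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
    print(greedy_unsupervised_reduct(X, delta=0.2))  # prints [0, 1]
```

On this toy matrix, the search picks column 0 first (ties are broken by column order) and then column 1, at which point every neighborhood is a singleton and the mean dependency reaches 1. The redundant column 2 is never selected. The all-pairs distance matrix needs O(n²) memory per evaluated subset, which is adequate for illustration but not for large datasets.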
Acknowledgements
The authors thank the editors and reviewers for their valuable suggestions, which substantially improved this paper. This work was supported by the Key Research and Development Projects in Sichuan Province (2023YFG0303), the Project of the Sichuan Provincial Department of Science and Technology (2023ZHCG0009), the Science and Technology Project in Ganzi Prefecture, Sichuan Province (23KJJH00016), the Research Team of Sichuan Minzu College (2022TD07), the Natural Science Foundation of Sichuan Province (No. 2022NSFSC1830), and the Southwest Minzu University Research Startup Funds (No. RQD2022035).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Zhang, B., Yuan, Z. et al. Unsupervised attribute reduction based on neighborhood dependency. Appl Intell 54, 10653–10670 (2024). https://doi.org/10.1007/s10489-024-05604-w
DOI: https://doi.org/10.1007/s10489-024-05604-w