Abstract
Traditional filter-based feature selection evaluates the importance of each feature with a single metric, which leads to unstable performance when the dataset changes. In this paper, a new hybrid feature selection method (called MFHFS) based on multi-filter weights and multi-feature weights is proposed. MFHFS consists of three stages. First, all samples are normalized and discretized, and noisy samples and outliers are removed using 10-fold cross-validation. Second, the vector of multi-filter weights and the matrix of multi-feature weights are computed and used to combine the feature subsets obtained by the optimal filters. Finally, a Q-range based feature relevance measure is proposed to quantify the relationship between features, and a greedy search policy removes redundant features from the intermediate feature subset to obtain the final feature subset. Experiments are carried out with two typical classifiers, support vector machine and random forest, on six datasets (APS, Madelon, CNAE9, Gisette, DrivFace and Amazon). Measured by macro-F1 and micro-F1, the experimental results show that the proposed method clearly improves classification accuracy over traditional filters, and that it runs significantly faster than typical hybrid feature selection methods while maintaining comparable classification accuracy.
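To make the three-stage pipeline concrete, the following is a minimal Python sketch. The choice of filters (mutual information, chi-squared, ANOVA F-score), the max-scaling of the feature-weight matrix, the averaged filter-weight vector, and the correlation-based redundancy test are all illustrative assumptions; in particular, plain Pearson correlation stands in for the paper's Q-range based relevance measure, and the discretization and cross-validation based noise removal of stage one are omitted.

import numpy as np
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

def mfhfs_sketch(X, y, n_select=50, redundancy_threshold=0.9):
    """Illustrative sketch of the MFHFS stages; not the authors' exact method."""
    # Stage 1: normalize features to [0, 1] (discretization and
    # CV-based noise/outlier removal are not reproduced here).
    X = MinMaxScaler().fit_transform(X)

    # Stage 2: score features with several filters and aggregate.
    # S plays the role of the multi-feature weight matrix and
    # w the role of the multi-filter weight vector.
    S = np.array([
        mutual_info_classif(X, y),  # information-theoretic filter
        chi2(X, y)[0],              # chi-squared statistic (needs X >= 0)
        f_classif(X, y)[0],         # ANOVA F-score
    ])
    S = S / (S.max(axis=1, keepdims=True) + 1e-12)  # scale each filter's scores
    w = S.mean(axis=1)                              # assumed filter weights
    w = w / w.sum()
    scores = w @ S                                  # aggregated per-feature score

    # Stage 3: greedy search over the ranked candidates; keep a feature
    # only if its relevance to every already-selected feature stays below
    # a threshold (correlation substitutes for the Q-range based measure).
    selected = []
    for f in np.argsort(scores)[::-1]:
        if all(abs(np.corrcoef(X[:, f], X[:, g])[0, 1]) < redundancy_threshold
               for g in selected):
            selected.append(f)
        if len(selected) == n_select:
            break
    return selected

The aggregation step is the key design point: because each filter's scores are rescaled before combination, no single metric dominates, which is what makes the combined ranking more stable across datasets than any individual filter.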
Acknowledgements
This research is supported by the Beijing Natural Science Foundation, China (No. 4174105), the Key Projects of the National Bureau of Statistics of China (No. 2017LZ05), the National Key R&D Program of China (No. 2017YFB1400700), and the Joint Funds of the National Natural Science Foundation of China (No. U1509214).
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Forty-five supplementary files (ESM 1 to ESM 45, ranging from 170 bytes to 11 kb each) accompany the online version of this article.
About this article
Cite this article
Wang, Y., Feng, L. A new hybrid feature selection based on multi-filter weights and multi-feature weights. Appl Intell 49, 4033–4057 (2019). https://doi.org/10.1007/s10489-019-01470-z