Abstract
Count data are widely used in machine learning and computer vision applications; however, they often suffer from the well-known curse of dimensionality, which dramatically degrades the performance of clustering algorithms. Feature selection is a major technique for handling large numbers of features, most of which are often redundant and noisy. In this paper, we propose a probabilistic approach for count data based on the concept of feature saliency in the context of mixture-based clustering using the generalized Dirichlet multinomial distribution. The saliency of irrelevant features is driven toward zero by minimizing the message length, which amounts to performing feature selection and model selection simultaneously. Using a range of challenging applications, including text and image clustering, we show that the developed approach is effective in identifying both the optimal number of clusters and the most relevant features, and thus enhances clustering performance significantly.
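To make the feature-saliency idea concrete, the sketch below implements its core likelihood in Python, following the generic saliency formulation of Law et al. [19] that the paper builds on. All names are hypothetical, and simple Poisson feature densities stand in for the paper's generalized Dirichlet multinomial components purely to keep the example short.

```python
import numpy as np
from scipy.stats import poisson

# Feature-saliency mixture sketch: each feature l has a saliency rho[l]; a
# relevant feature follows its cluster-specific density, an irrelevant one
# follows a common background density shared by all clusters.
rng = np.random.default_rng(0)
N, D, M = 300, 8, 3                        # samples, count features, clusters
X = rng.poisson(rng.uniform(1, 5, D), size=(N, D))

pi = np.full(M, 1.0 / M)                   # mixing weights
theta = rng.uniform(1, 5, size=(M, D))     # cluster-specific Poisson rates
lam = X.mean(axis=0)                       # shared background rates
rho = np.full(D, 0.5)                      # feature saliencies in (0, 1)

def log_component_lik(X, theta_j, lam, rho):
    """log p(x_i | cluster j) = sum_l log[rho_l p(x_il|theta_jl) + (1-rho_l) q(x_il|lam_l)]."""
    rel = rho * poisson.pmf(X, theta_j)        # relevant: cluster-specific part
    irr = (1.0 - rho) * poisson.pmf(X, lam)    # irrelevant: common background part
    return np.log(rel + irr + 1e-300).sum(axis=1)

# E-step responsibilities: a feature whose saliency is driven to zero contributes
# the same background factor to every cluster and stops influencing assignments.
log_resp = np.log(pi) + np.column_stack(
    [log_component_lik(X, theta[j], lam, rho) for j in range(M)])
resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)
```

In the paper itself, the saliencies, component parameters, and mixing weights are updated by MM steps derived from the message-length objective rather than by a plain EM pass like the one above.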
Notes
In our experiments, the values for \(M_{min}\) and \(M_{max}\) have been set to 2 and 50, respectively.
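For illustration, the outer model-selection loop that this search range drives can be sketched as follows. sklearn's GaussianMixture and BIC are stand-ins for the paper's generalized Dirichlet multinomial mixture and message-length criterion, used here only so the loop is runnable.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Candidate component counts are tried one by one and the model with the
# smallest criterion value is kept; BIC substitutes for the MML criterion.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, size=(100, 5)) for c in (0.0, 4.0, 8.0)])

M_min, M_max = 2, 50                         # search range from the note above

best_M, best_score = None, np.inf
for M in range(M_min, min(M_max, 10) + 1):   # range truncated for the demo
    gm = GaussianMixture(n_components=M, random_state=0).fit(X)
    score = gm.bic(X)                        # stands in for the message length
    if score < best_score:
        best_M, best_score = M, score
print("selected number of clusters:", best_M)
```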
References
Dy JG, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5(Aug):845–889
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182
Liu H, Wu X, Zhang S (2011) Feature selection using hierarchical feature clustering. In: Proceedings of the 20th ACM international conference on information and knowledge management, ACM, pp 979–984
Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 333–342
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Kohavi R, Sommerfield D (1995) Feature subset selection using the wrapper method: Overfitting and dynamic search space topology. In: KDD, pp 192–197
Wolf L, Shashua A (2005) Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach. J Mach Learn Res 6(Nov):1855–1887
Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 4:491–502
Chuang L-Y, Chang H-W, Tu C-J, Yang C-H (2008) Improved binary PSO for feature selection using gene expression data. Comput Biol Chem 32(1):29–38
Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowe A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 9(4):1106–1119
Tang J, Liu H (2012) Feature selection with linked data in social media. In: Proceedings of the 2012 SIAM international conference on data mining, SIAM, pp 118–128
Tang J, Liu H (2014) An unsupervised feature selection framework for social media data. IEEE Trans Knowl Data Eng 26(12):2914–2927
Liu L, Shao L, Rockett P (2013) Boosted key-frame selection and correlated pyramidal motion-feature representation for human action recognition. Pattern Recogn 46(7):1810–1818
Lin C-H, Chen H-Y, Wu Y-S (2014) Study of image retrieval and classification based on adaptive features using genetic algorithm feature selection. Expert Syst Appl 41(15):6611–6621
Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5(4):537–550
Zeng Z, Wang X, Zhang J, Wu Q (2016) Semi-supervised feature selection based on local discriminative information. Neurocomputing 173:102–109
Chen X, Yuan G, Nie F, Huang JZ (2017) Semi-supervised feature selection via rescaled linear regression. In: IJCAI, vol 2017, pp 1525–1531
Li Z, Tang J (2021) Semi-supervised local feature selection for data classification. Sci China Inf Sci 64(9):1–12
Law MH, Figueiredo MA, Jain AK (2004) Simultaneous feature selection and clustering using mixture models. IEEE Trans Pattern Anal Mach Intell 26(9):1154–1166
Bouguila N (2009) A model-based approach for discrete data clustering and feature weighting using MAP and stochastic complexity. IEEE Trans Knowl Data Eng 21(12):1649–1664
Luo M, Nie F, Chang X, Yang Y, Hauptmann AG, Zheng Q (2017) Adaptive unsupervised feature selection with structure regularization. IEEE Trans Neural Netw Learn Syst 29(4):944–956
Li Z, Liu J, Zhu X, Liu T, Lu H (2010) Image annotation using multi-correlation probabilistic matrix factorization. In: Proceedings of the 18th ACM international conference on multimedia, ACM, pp 1187–1190
Li Z, Liu J, Yang Y, Zhou X, Lu H (2014) Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans Knowl Data Eng 26(9):2138–2150
Hong X, Li H, Miller P, Zhou J, Li L, Crookes D, Lu Y, Li X, Zhou H (2019) Component-based feature saliency for clustering. IEEE Trans Knowl Data Eng
Ortega JM, Rheinboldt WC (1970) Iterative solution of nonlinear equations in several variables, vol 30. SIAM
Wu TT, Lange K (2010) The MM alternative to EM. Stat Sci 25(4):492–505
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–38
Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer, New York
Bouguila N (2008) Clustering of count data using generalized Dirichlet multinomial distributions. IEEE Trans Knowl Data Eng 20(4):462–474
Connor RJ, Mosimann JE (1969) Concepts of independence for proportions with a generalization of the Dirichlet distribution. J Am Stat Assoc 64(325):194–206
Madsen RE, Kauchak D, Elkan C (2005) Modeling word burstiness using the Dirichlet distribution. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 545–552
Wong T-T (2009) Alternative prior assumptions for improving the performance of naïve Bayesian classifiers. Data Min Knowl Disc 18(2):183–213
Zamzami N, Bouguila N (2018) Consumption behavior prediction using hierarchical Bayesian frameworks. In: 2018 first international conference on artificial intelligence for industries (AI4I), IEEE, pp 31–34
Graham MW, Miller DJ (2006) Unsupervised learning of parsimonious mixtures on large spaces with integrated feature and component selection. IEEE Trans Signal Process 54(4):1289–1303
Zhou H, Lange K (2010) MM algorithms for some discrete multivariate distributions. J Comput Graph Stat 19(3):645–665
Wu X, Jiang B, Yu K, Miao C, Chen H (2019) Accurate Markov boundary discovery for causal feature selection. IEEE Trans Cybern 50:4983–4996
Liu C, Zheng C-T, Wu S, Yu Z, Wong H-S (2018) Multitask feature selection by graph-clustered feature sharing. IEEE Trans Cybern 50:74–86
Wu H, Liu T, Xie J (2017) Fine-grained product feature extraction in Chinese reviews. In: 2017 international conference on computing intelligence and information system (CIIS), IEEE, pp 327–331
Marquetti I, Link JV, Lemes ALG, dos Santos Scholz MB, Valderrama P, Bona E (2016) Partial least square with discriminant analysis and near infrared spectroscopy for evaluation of geographic and genotypic origin of arabica coffee. Comput Electron Agric 121:313–319
Fan Z, Xu Y, Zuo W, Yang J, Tang J, Lai Z, Zhang D (2014) Modified principal component analysis: An integration of multiple similarity subspace models. IEEE Trans Neural Netw Learn Syst 25(8):1538–1552
Zhao H, Wang Z, Nie F (2018) A new formulation of linear discriminant analysis for robust dimensionality reduction. IEEE Trans Knowl Data Eng 31(4):629–640
Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78
Dash M, Liu H (2000) Feature selection for clustering. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 110–121
Wang Y, Feng L (2019) A new hybrid feature selection based on multi-filter weights and multi-feature weights. Appl Intell 49:1–25
Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning. Springer, New York
Liu H, Motoda H (2007) Computational methods of feature selection. CRC Press, Boca Raton
Dash M, Choi K, Scheuermann P, Liu H (2002) Feature selection for clustering-a filter solution. In: Proceedings of 2002 IEEE international conference on data mining, IEEE, pp 115–122
Ambusaidi MA, He X, Nanda P, Tan Z (2016) Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput 65(10):2986–2998
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Kabir MM, Islam MM, Murase K (2010) A new wrapper feature selection approach using neural network. Neurocomputing 73(16–18):3273–3283
Apolloni J, Leguizamón G, Alba E (2016) Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments. Appl Soft Comput 38:922–932
Moradkhani M, Amiri A, Javaherian M, Safari H (2015) A hybrid algorithm for feature subset selection in high-dimensional datasets using FICA and IWSSr algorithm. Appl Soft Comput 35:123–135
Tang B, Kay S, He H (2016) Toward optimal feature selection in naive Bayes for text categorization. IEEE Trans Knowl Data Eng 28(9):2508–2521
Bouillot F, Hai PN, Béchet N, Bringay S, Ienco D, Matwin S, Poncelet P, Roche M, Teisseire M (2012) How to extract relevant knowledge from tweets? In: International workshop on information search, integration, and personalization, Springer, pp 111–120
Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive Bayes. In: ICML, vol 99, pp 258–267
Caropreso MF, Matwin S, Sebastiani F (2001) A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. Text Databases Doc Manage Theory Pract 5478:78–102
Li Y, Luo C, Chung SM (2008) Text clustering with feature selection by using statistical data. IEEE Trans Knowl Data Eng 20(5):641–652
Galavotti L, Sebastiani F, Simi M (2000) Experiments on the use of feature selection and negative evidence in automated text categorization. In: International conference on theory and practice of digital libraries, Springer, pp 59–68
Talavera L (1999) Feature selection as a preprocessing step for hierarchical clustering. In: ICML, vol 99, pp 389–397
He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Advances in neural information processing systems, vol 18, pp 507–514
Dasgupta A, Drineas P, Harb B, Josifovski V, Mahoney MW (2007) Feature selection methods for text classification. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 230–239
Sharma KK, Seal A (2020) Clustering analysis using an adaptive fused distance. Eng Appl Artif Intell 96:103928
Sharma KK, Seal A (2021) Spectral embedded generalized mean based k-nearest neighbors clustering with s-distance. Expert Syst Appl 169:114326
Sharma KK, Seal A, Herrera-Viedma E, Krejcar O (2021) An enhanced spectral clustering algorithm with s-distance. Symmetry 13(4):596
Adams S, Beling PA (2017) A survey of feature selection methods for Gaussian mixture models and hidden Markov models. Artif Intell Rev 52:1–41
Boutemedjet S, Bouguila N, Ziou D (2008) A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering. IEEE Trans Pattern Anal Mach Intell 31(8):1429–1443
Fan W, Bouguila N, Ziou D (2012) Unsupervised hybrid feature extraction selection for high-dimensional non-Gaussian data clustering with variational inference. IEEE Trans Knowl Data Eng 25(7):1670–1685
Vaithyanathan S, Dom B (2000) Generalized model selection for unsupervised learning in high dimensions. Adv Neural Inf Process Syst 12:970–976
Wang X, Kabán A (2006) Model-based estimation of word saliency in text. In: International conference on discovery science, Springer, pp 279–290
Li Z, Yang Y, Liu J, Zhou X, Lu H (2012) Unsupervised feature selection using nonnegative spectral analysis. In: Proceedings of the AAAI conference on artificial intelligence, vol 26
Li Z, Tang J (2015) Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Trans Image Process 24(12):5343–5355
Cheung Y-m, Zeng H (2007) A maximum weighted likelihood approach to simultaneous model selection and feature weighting in Gaussian mixture. In: International conference on artificial neural networks, Springer, pp 78–87
Tsai C-Y, Chiu C-C (2008) Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm. Comput Stat Data Anal 52(10):4658–4672
Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
Wallace CS, Dowe DL (2000) MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Stat Comput 10(1):73–83
Mosimann JE (1962) On the compound multinomial distribution, the multivariate \(\beta\)-distribution, and correlations among proportions. Biometrika 49(1/2):65–82
Wong T-T (2014) Generalized Dirichlet priors for naïve Bayesian classifiers with multinomial models in document classification. Data Min Knowl Disc 28(1):123–144
Caballero KL, Barajas J, Akella R (2012) The generalized Dirichlet distribution in enhanced topic detection. In: Proceedings of the 21st ACM international conference on information and knowledge management, ACM, pp 773–782
Katz SM (1996) Distribution of content words and phrases in text and language modelling. Nat Lang Eng 2(1):15–59
Puig P, Valero J (2006) Count data distributions: some characterizations with applications. J Am Stat Assoc 101(473):332–340
Haldane JB (1941) The fitting of binomial distributions. Ann Eugen 11(1):179–181
Bailey NT (1957) The mathematical theory of epidemics. Griffin, London
Griffiths D (1973) Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29(4):637–648
Pudil P, Novovičová J, Choakjarernwanit N, Kittler J (1995) Feature selection based on the approximation of class densities by finite mixtures of special type. Pattern Recogn 28(9):1389–1398
Nguyen HD (2017) An introduction to Majorization-Minimization algorithms for machine learning and statistical estimation. Wiley Interdiscip Rev Data Min Knowl Discov 7(2):e1198
Tian G-L, Liu Y, Tang M-L, Li T (2019) A novel MM algorithm and the mode-sharing method in bayesian computation for the analysis of general incomplete categorical data. Comput Stat Data Anal 140:122–143
Elkan C (2006) Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution. In: Proceedings of the 23rd international conference on machine learning, ACM, pp 289–296
Baxter RA, Oliver JJ (2000) Finding overlapping components with MML. Stat Comput 10(1):5–16
Bernardo JM, Smith AFM (2000) Bayesian theory. Wiley, Chichester
Celeux G, Chrétien S, Forbes F, Mkhadri A (2001) A component-wise EM algorithm for mixtures. J Comput Graph Stat 10(4):697–712
Novovičová J, Malik A (2003) Application of multinomial mixture model to text classification. In: Iberian conference on pattern recognition and image analysis, Springer, pp 646–653
Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. In: Eleventh international AAAI conference on web and social media
Ortiz EG, Becker BC (2014) Face recognition for web-scale datasets. Comput Vis Image Underst 118:153–170
Kumar N, Berg A, Belhumeur PN, Nayar S (2011) Describable visual attributes for face verification and image search. IEEE Trans Pattern Anal Mach Intell 33(10):1962–1977
Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on faces in 'Real-Life' images: detection, alignment, and recognition, Marseille, France, Oct 2008
Zhang Z, Song Y, Qi H (2017) Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5810–5818
Ricanek K, Tesafaye T (2006) Morph: A longitudinal image database of normal adult age-progression. In: 7th international conference on automatic face and gesture recognition (FGR06), IEEE, pp 341–345
Guo G, Zhang C (2014) A study on cross-population age estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4257–4263
He Z, Li X, Zhang Z, Wu F, Geng X, Zhang Y, Yang M-H, Zhuang Y (2017) Data-dependent label distribution learning for age estimation. IEEE Trans Image Process 26(8):3846–3858
Appendix A: Surrogate function construction
The complete-data log-likelihood of the proposed model (Eq. 7) is given by:
To construct an MM algorithm, we need to minorize terms such as \(\log (\pi _{jl}+k)\) and \(\log (\mu _{jl}+k)\). Noticing that the term \(\log (\pi _{jl}+k)\) occurs in the log-likelihood if and only if \(X_{il} \ge k+1\), and the term \(\log (\mu _{jl}+k)\) occurs in the log-likelihood if and only if \(Y_{il} \ge k+1\), we define the following associated counts for \(l=1,\dots ,D\):
where the index k ranges from 0 to \(\max _i m_i-1\). Recalling that \(\upsilon _{ijl}=P(Z_i=j,\phi _{jl}=1 \mid X_i)\) and \(\nu _{ijl}=P(Z_i=j,\phi _{jl}=0 \mid X_i)\), Eq. (A1) can be rewritten as:
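The associated counts described above can be tabulated directly from the data. The NumPy sketch below counts, for each feature l and each shift k, the observations with \(X_{il} \ge k+1\); it is the unweighted version, shown for clarity only, since the paper's exact definitions may also carry the posterior weights \(\upsilon _{ijl}\) and \(\nu _{ijl}\).

```python
import numpy as np

# r[l, k] = #{ i : X_il >= k+1 }: one count per feature l and shift k, which is
# exactly how often the term log(pi_jl + k) appears in the log-likelihood.
rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(100, 6))   # toy N x D count matrix
K = int(X.max())                      # k effectively ranges over 0 .. K-1 here

r = (X[:, :, None] >= np.arange(1, K + 1)[None, None, :]).sum(axis=0)
print(r.shape)                        # (D, K)
```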
Then, we apply the basic minorizations of Zhou and Lange (see equations 2.3 and 2.4 in [35]) to the previous equation, which yields the surrogate function:
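These two minorizations can be checked numerically. The snippet below verifies my restatement of them, \(\log (\theta +k) \ge \log (\theta _m+k)+\frac{\theta _m}{\theta _m+k}\log \frac{\theta }{\theta _m}\) and \(-\log (c+\theta ) \ge -\log (c+\theta _m)-\frac{\theta -\theta _m}{c+\theta _m}\), with equality at \(\theta =\theta _m\), at a few random points; these bounds are what turn the log terms of Eq. (A1) into a surrogate whose parameters separate.

```python
import numpy as np

# Numerical sanity check of the two minorizations (restated from [35]).
rng = np.random.default_rng(0)
for _ in range(1000):
    t_m, t, k, c = rng.uniform(0.1, 5.0, size=4)
    assert np.log(t + k) >= np.log(t_m + k) + (t_m / (t_m + k)) * np.log(t / t_m) - 1e-12
    assert -np.log(c + t) >= -np.log(c + t_m) - (t - t_m) / (c + t_m) - 1e-12
print("both minorizations hold at all sampled points")
```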
Cite this article
Zamzami, N., Bouguila, N. A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data. Pattern Anal Applic 26, 91–106 (2023). https://doi.org/10.1007/s10044-022-01094-z