A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data

  • Theoretical Advances
  • Published in: Pattern Analysis and Applications

Abstract

Count data are commonly exploited in machine learning and computer vision applications; however, they often suffer from the well-known curse of dimensionality, which dramatically degrades the performance of clustering algorithms. Feature selection is a major technique for handling large numbers of features, most of which are often redundant and noisy. In this paper, we propose a probabilistic approach for count data based on the concept of feature saliency in the context of mixture-based clustering using the generalized Dirichlet multinomial distribution. The saliency of irrelevant features is driven toward zero by minimizing the message length, which amounts to performing feature and model selection simultaneously. Using a range of challenging applications including text and image clustering, we show that the developed approach is effective in identifying both the optimal number of clusters and the most relevant features, thereby enhancing clustering performance significantly.
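
For orientation, the approach builds on the feature-saliency formulation of Law et al. [19], adapted here to the generalized Dirichlet multinomial: each feature l is generated either from a cluster-specific distribution, with probability equal to its saliency, or from a common background distribution shared by all clusters. Schematically, in the notation of Appendix A (a schematic form, not the paper's exact Eq. 7),

$$\begin{aligned} p(X_i \mid \Theta )=\sum _{j=1}^{M} p_j \prod _{l=1}^{D} \Big [\rho _{jl}\, p(X_{il}\mid \pi _{jl},\theta _{jl}) + (1-\rho _{jl})\, q(X_{il}\mid \mu _{l},\lambda _{l})\Big ] \end{aligned}$$

so that a saliency \(\rho _{jl}\) driven to zero removes feature l from the definition of cluster j.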

Notes

  1. In our experiments, the values for \(M_{min}\) and \(M_{max}\) have been set to 2 and 50, respectively.

References

  1. Dy JG, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5(Aug):845–889

  2. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182

  3. Liu H, Wu X, Zhang S (2011) Feature selection using hierarchical feature clustering. In: Proceedings of the 20th ACM international conference on information and knowledge management, ACM, pp 979–984

  4. Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 333–342

  5. Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28

  6. Kohavi R, Sommerfield D (1995) Feature subset selection using the wrapper method: Overfitting and dynamic search space topology. In: KDD, pp 192–197

  7. Wolf L, Shashua A (2005) Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach. J Mach Learn Res 6(Nov):1855–1887

  8. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 4:491–502

  9. Chuang L-Y, Chang H-W, Tu C-J, Yang C-H (2008) Improved binary PSO for feature selection using gene expression data. Comput Biol Chem 32(1):29–38

  10. Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowe A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 9(4):1106–1119

  11. Tang J, Liu H (2012) Feature selection with linked data in social media. In: Proceedings of the 2012 SIAM international conference on data mining, SIAM, pp 118–128

  12. Tang J, Liu H (2014) An unsupervised feature selection framework for social media data. IEEE Trans Knowl Data Eng 26(12):2914–2927

  13. Liu L, Shao L, Rockett P (2013) Boosted key-frame selection and correlated pyramidal motion-feature representation for human action recognition. Pattern Recogn 46(7):1810–1818

  14. Lin C-H, Chen H-Y, Wu Y-S (2014) Study of image retrieval and classification based on adaptive features using genetic algorithm feature selection. Expert Syst Appl 41(15):6611–6621

  15. Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5(4):537–550

  16. Zeng Z, Wang X, Zhang J, Wu Q (2016) Semi-supervised feature selection based on local discriminative information. Neurocomputing 173:102–109

  17. Chen X, Yuan G, Nie F, Huang JZ (2017) Semi-supervised feature selection via rescaled linear regression. In: IJCAI, vol 2017, pp 1525–1531

  18. Li Z, Tang J (2021) Semi-supervised local feature selection for data classification. Science China Inf Sci 64(9):1–12

  19. Law MH, Figueiredo MA, Jain AK (2004) Simultaneous feature selection and clustering using mixture models. IEEE Trans Pattern Anal Mach Intell 26(9):1154–1166

  20. Bouguila N (2009) A model-based approach for discrete data clustering and feature weighting using MAP and stochastic complexity. IEEE Trans Knowl Data Eng 21(12):1649–1664

  21. Luo M, Nie F, Chang X, Yang Y, Hauptmann AG, Zheng Q (2017) Adaptive unsupervised feature selection with structure regularization. IEEE Trans Neural Netw Learn Syst 29(4):944–956

  22. Li Z, Liu J, Zhu X, Liu T, Lu H (2010) Image annotation using multi-correlation probabilistic matrix factorization. In: Proceedings of the 18th ACM international conference on multimedia, ACM, pp 1187–1190

  23. Li Z, Liu J, Yang Y, Zhou X, Lu H (2014) Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans Knowl Data Eng 26(9):2138–2150

  24. Hong X, Li H, Miller P, Zhou J, Li L, Crookes D, Lu Y, Li X, Zhou H (2019) Component-based feature saliency for clustering. IEEE Trans Knowl Data Eng

  25. Ortega JM, Rheinboldt WC (1970) Iterative solution of nonlinear equations in several variables, vol 30. SIAM

  26. Wu TT, Lange K (2010) The MM alternative to EM. Stat Sci 25(4):492–505

  27. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39(1):1–38

  28. Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer, New York

  29. Bouguila N (2008) Clustering of count data using generalized Dirichlet multinomial distributions. IEEE Trans Knowl Data Eng 20(4):462–474

  30. Connor RJ, Mosimann JE (1969) Concepts of independence for proportions with a generalization of the Dirichlet distribution. J Am Stat Assoc 64(325):194–206

  31. Madsen RE, Kauchak D, Elkan C (2005) Modeling word burstiness using the Dirichlet distribution. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 545–552

  32. Wong T-T (2009) Alternative prior assumptions for improving the performance of naïve Bayesian classifiers. Data Min Knowl Disc 18(2):183–213

  33. Zamzami N, Bouguila N (2018) Consumption behavior prediction using hierarchical Bayesian frameworks. In: 2018 first international conference on artificial intelligence for industries (AI4I), IEEE, pp 31–34

  34. Graham MW, Miller DJ (2006) Unsupervised learning of parsimonious mixtures on large spaces with integrated feature and component selection. IEEE Trans Signal Process 54(4):1289–1303

  35. Zhou H, Lange K (2010) MM algorithms for some discrete multivariate distributions. J Comput Graph Stat 19(3):645–665

  36. Wu X, Jiang B, Yu K, Miao C, Chen H (2019) Accurate Markov boundary discovery for causal feature selection. IEEE Trans Cybern 50:4983–4996

  37. Liu C, Zheng C-T, Wu S, Yu Z, Wong H-S (2018) Multitask feature selection by graph-clustered feature sharing. IEEE Trans Cybern 50:74–86

  38. Wu H, Liu T, Xie J (2017) Fine-grained product feature extraction in Chinese reviews. In: 2017 international conference on computing intelligence and information system (CIIS), IEEE, pp 327–331

  39. Marquetti I, Link JV, Lemes ALG, dos Santos Scholz MB, Valderrama P, Bona E (2016) Partial least square with discriminant analysis and near infrared spectroscopy for evaluation of geographic and genotypic origin of arabica coffee. Comput Electron Agric 121:313–319

  40. Fan Z, Xu Y, Zuo W, Yang J, Tang J, Lai Z, Zhang D (2014) Modified principal component analysis: An integration of multiple similarity subspace models. IEEE Trans Neural Netw Learn Syst 25(8):1538–1552

  41. Zhao H, Wang Z, Nie F (2018) A new formulation of linear discriminant analysis for robust dimensionality reduction. IEEE Trans Knowl Data Eng 31(4):629–640

  42. Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78

  43. Dash M, Liu H (2000) Feature selection for clustering. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 110–121

  44. Wang Y, Feng L (2019) A new hybrid feature selection based on multi-filter weights and multi-feature weights. Appl Intell 49:1–25

  45. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning. Springer, New York

  46. Liu H, Motoda H (2007) Computational methods of feature selection. CRC Press, Boca Raton

  47. Dash M, Choi K, Scheuermann P, Liu H (2002) Feature selection for clustering-a filter solution. In: Proceedings of 2002 IEEE international conference on data mining, IEEE, pp 115–122

  48. Ambusaidi MA, He X, Nanda P, Tan Z (2016) Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput 65(10):2986–2998

  49. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324

  50. Kabir MM, Islam MM, Murase K (2010) A new wrapper feature selection approach using neural network. Neurocomputing 73(16–18):3273–3283

  51. Apolloni J, Leguizamón G, Alba E (2016) Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments. Appl Soft Comput 38:922–932

  52. Moradkhani M, Amiri A, Javaherian M, Safari H (2015) A hybrid algorithm for feature subset selection in high-dimensional datasets using FICA and IWSSr algorithm. Appl Soft Comput 35:123–135

  53. Tang B, Kay S, He H (2016) Toward optimal feature selection in naive Bayes for text categorization. IEEE Trans Knowl Data Eng 28(9):2508–2521

  54. Bouillot F, Hai PN, Béchet N, Bringay S, Ienco D, Matwin S, Poncelet P, Roche M, Teisseire M (2012) How to extract relevant knowledge from tweets? In: International workshop on information search, integration, and personalization, Springer, pp 111–120

  55. Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive Bayes. In: ICML, vol 99, pp 258–267

  56. Caropreso MF, Matwin S, Sebastiani F (2001) A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. Text Databases Doc Manage Theory Pract 5478:78–102

  57. Li Y, Luo C, Chung SM (2008) Text clustering with feature selection by using statistical data. IEEE Trans Knowl Data Eng 20(5):641–652

  58. Galavotti L, Sebastiani F, Simi M (2000) Experiments on the use of feature selection and negative evidence in automated text categorization. In: International conference on theory and practice of digital libraries, Springer, pp 59–68

  59. Talavera L (1999) Feature selection as a preprocessing step for hierarchical clustering. In: ICML, vol 99, pp 389–397

  60. He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Advances in neural information processing systems, vol 18, pp 507–514

  61. Dasgupta A, Drineas P, Harb B, Josifovski V, Mahoney MW (2007) Feature selection methods for text classification. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 230–239

  62. Sharma KK, Seal A (2020) Clustering analysis using an adaptive fused distance. Eng Appl Artif Intell 96:103928

  63. Sharma KK, Seal A (2021) Spectral embedded generalized mean based k-nearest neighbors clustering with s-distance. Expert Syst Appl 169:114326

  64. Sharma KK, Seal A, Herrera-Viedma E, Krejcar O (2021) An enhanced spectral clustering algorithm with s-distance. Symmetry 13(4):596

  65. Adams S, Beling PA (2017) A survey of feature selection methods for Gaussian mixture models and hidden Markov models. Artif Intell Rev 52:1–41

  66. Boutemedjet S, Bouguila N, Ziou D (2008) A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering. IEEE Trans Pattern Anal Mach Intell 31(8):1429–1443

  67. Fan W, Bouguila N, Ziou D (2012) Unsupervised hybrid feature extraction selection for high-dimensional non-Gaussian data clustering with variational inference. IEEE Trans Knowl Data Eng 25(7):1670–1685

  68. Vaithyanathan S, Dom B (2000) Generalized model selection for unsupervised learning in high dimensions. Adv Neural Inf Process Syst 12:970–976

  69. Wang X, Kabán A (2006) Model-based estimation of word saliency in text. In: International conference on discovery science, Springer, pp 279–290

  70. Li Z, Yang Y, Liu J, Zhou X, Lu H (2012) Unsupervised feature selection using nonnegative spectral analysis. In: Proceedings of the AAAI conference on artificial intelligence, vol 26

  71. Li Z, Tang J (2015) Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Trans Image Process 24(12):5343–5355

  72. Cheung Y-m, Zeng H (2007) A maximum weighted likelihood approach to simultaneous model selection and feature weighting in Gaussian mixture. In: International conference on artificial neural networks, Springer, pp 78–87

  73. Tsai C-Y, Chiu C-C (2008) Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm. Comput Stat Data Anal 52(10):4658–4672

  74. Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 3:381–396

  75. Wallace CS, Dowe DL (2000) MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Stat Comput 10(1):73–83

  76. Mosimann JE (1962) On the compound multinomial distribution, the multivariate \(\beta\)-distribution, and correlations among proportions. Biometrika 49(1/2):65–82

  77. Wong T-T (2014) Generalized Dirichlet priors for naïve Bayesian classifiers with multinomial models in document classification. Data Min Knowl Disc 28(1):123–144

  78. Caballero KL, Barajas J, Akella R (2012) The generalized Dirichlet distribution in enhanced topic detection. In: Proceedings of the 21st ACM international conference on information and knowledge management, ACM, pp 773–782

  79. Katz SM (1996) Distribution of content words and phrases in text and language modelling. Nat Lang Eng 2(1):15–59

  80. Puig P, Valero J (2006) Count data distributions: some characterizations with applications. J Am Stat Assoc 101(473):332–340

  81. Haldane JB (1941) The fitting of binomial distributions. Ann Eugen 11(1):179–181

  82. Bailey NT (1957) The mathematical theory of epidemics. Technical report

  83. Griffiths D (1973) Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics, pp 637–648

  84. Pudil P, Novovičová J, Choakjarernwanit N, Kittler J (1995) Feature selection based on the approximation of class densities by finite mixtures of special type. Pattern Recogn 28(9):1389–1398

  85. Nguyen HD (2017) An introduction to Majorization-Minimization algorithms for machine learning and statistical estimation. Wiley Interdiscip Rev Data Min Knowl Discov 7(2):1198

  86. Tian G-L, Liu Y, Tang M-L, Li T (2019) A novel MM algorithm and the mode-sharing method in Bayesian computation for the analysis of general incomplete categorical data. Comput Stat Data Anal 140:122–143

  87. Elkan C (2006) Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution. In: Proceedings of the 23rd international conference on machine learning, ACM, pp 289–296

  88. Baxter RA, Oliver JJ (2000) Finding overlapping components with MML. Stat Comput 10(1):5–16

  89. Bernardo JM, Smith AF (2001) Bayesian Theory. IOP Publishing, Bristol

  90. Celeux G, Chrétien S, Forbes F, Mkhadri A (2001) A component-wise em algorithm for mixtures. J Comput Graph Stat 10(4):697–712

  91. Novovičová J, Malik A (2003) Application of multinomial mixture model to text classification. In: Iberian conference on pattern recognition and image analysis, Springer, pp 646–653

  92. Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. In: Eleventh international AAAI conference on web and social media

  93. Ortiz EG, Becker BC (2014) Face recognition for web-scale datasets. Comput Vis Image Underst 118:153–170

  94. Kumar N, Berg A, Belhumeur PN, Nayar S (2011) Describable visual attributes for face verification and image search. IEEE Trans Pattern Anal Mach Intell 33(10):1962–1977

  95. Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In: Workshop on faces in 'Real-Life' images: detection, alignment, and recognition, Marseille, France, Oct 2008. inria-00321923

  96. Zhang Z, Song Y, Qi H (2017) Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5810–5818

  97. Ricanek K, Tesafaye T (2006) Morph: A longitudinal image database of normal adult age-progression. In: 7th international conference on automatic face and gesture recognition (FGR06), IEEE, pp 341–345

  98. Guo G, Zhang C (2014) A study on cross-population age estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4257–4263

  99. He Z, Li X, Zhang Z, Wu F, Geng X, Zhang Y, Yang M-H, Zhuang Y (2017) Data-dependent label distribution learning for age estimation. IEEE Trans Image Process 26(8):3846–3858

Author information

Corresponding author

Correspondence to Nuha Zamzami.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Surrogate function construction

The complete-data log-likelihood of the proposed model (Eq. 7) is given by:

$$\begin{aligned} {\mathcal {L}}=\sum _{i=1}^{N} \sum _{j=1}^{M} p_j&\Bigg [ \sum _{l=1}^{D} \log (\rho _{jl}) + \log (\pi _{jl}\dots [\pi _{jl}+(X_{il}-1)\theta _{jl}]) \nonumber \\&+\log \Big ((1-\pi _{jl})\dots [1-\pi _{jl}+(Y_{i,l+1}-1)\theta _{jl}]\Big ) \nonumber \\&-\log (1\dots [1+(Y_{il}-1)\theta _{jl}]) + \log (1-\rho _{jl}) \nonumber \\&+ \log (\mu _{l}\dots [\mu _{l}+(X_{il}-1)\lambda _{l}]) \nonumber \\&+\log \Big ((1-\mu _{l})\dots [1-\mu _{l}+(Y_{i,l+1}-1)\lambda _{l}]\Big ) \nonumber \\&-\log (1\dots [1+(Y_{il}-1)\lambda _{l}]) \Bigg ] \end{aligned}$$
(A1)
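
Each bracketed product in Eq. (A1) is a rising factorial, so its logarithm reduces to a finite sum: for instance \(\log \big (\pi _{jl}(\pi _{jl}+\theta _{jl})\dots [\pi _{jl}+(X_{il}-1)\theta _{jl}]\big )=\sum _{k=0}^{X_{il}-1}\log (\pi _{jl}+k\theta _{jl})\). A minimal Python sketch of this building block (the helper name is ours, not from the paper):

```python
import numpy as np

def log_rising_product(base, step, count):
    """log[ base * (base + step) * ... * (base + (count - 1) * step) ],
    computed as sum_{k=0}^{count-1} log(base + k * step).
    Returns 0.0 when count == 0 (the empty product)."""
    k = np.arange(count)
    return float(np.sum(np.log(base + k * step)))

# e.g. the cluster-specific factor for one feature of one observation:
# log( pi * (pi + theta) * ... * (pi + (x - 1) * theta) ) with x = 4
print(log_rising_product(0.3, 0.05, 4))
```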

To construct an MM algorithm, we need to minorize terms of the form \(\log (\pi _{jl}+k\theta _{jl})\) and \(\log \big ((1-\pi _{jl})+k\theta _{jl}\big )\), together with their analogues in \(\mu _{l}\) and \(\lambda _{l}\). Noticing that a term of the first kind occurs in the log-likelihood if and only if \(X_{il} \ge k+1\), and a term of the second kind occurs if and only if \(Y_{il} \ge k+1\), we define the following associated counts for \(l=1,\dots , D\):

$$\begin{aligned} r_{lk}=\sum _{i=1}^N 1_{\{X_{il} \ge k+1\}}, \quad s_{lk}=\sum _{i=1}^N 1_{\{Y_{il} \ge k+1\}} \end{aligned}$$

where the index k ranges from 0 to \(\max _i m_i-1\), with \(m_i\) denoting the total count of observation i. Recalling that \(\upsilon _{ijl}=P(Z_i=j,\phi _{jl}=1 \mid X_i)\) and \(\nu _{ijl}=P(Z_i=j,\phi _{jl}=0 \mid X_i)\), Eq. (A1) can be rewritten as:

$$\begin{aligned} {\mathcal {L}}(\Theta )&=\sum _i \upsilon _{ijl} \Bigg [- \sum _l \sum _k s_{lk} \log (1+k\theta _{jl})\nonumber \\&+ \sum _l \sum _k r_{lk} \log (\pi _{jl}+k \theta _{jl})\nonumber \\&+\sum _l \sum _k s_{lk} \log \Big ((1-\pi _{jl})+k\theta _{jl}\Big ) \Bigg ] \nonumber \\&+\sum _i (\sum _j \nu _{ijl}) \Bigg [-\sum _l \sum _k s_{lk} \log (1+k\lambda _{l}) \nonumber \\&+ \sum _l \sum _k r_{lk} \log (\mu _{l}+k \lambda _{l})\nonumber \\&+\sum _l \sum _k s_{lk} \log \Big ((1-\mu _{l})+k\lambda _{l}\Big ) \Bigg ] \end{aligned}$$
(A2)
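
The counts \(r_{lk}\) and \(s_{lk}\) depend only on the data, so they can be tabulated once before the MM iterations begin; this is what makes the rewriting in Eq. (A2) computationally convenient. A minimal sketch (we assume the usual GDM convention \(Y_{il}=\sum _{m\ge l}X_{im}\); the function name is illustrative):

```python
import numpy as np

def associated_counts(A):
    """counts[l, k] = #{ i : A[i, l] >= k + 1 }, k = 0 .. A.max() - 1;
    gives r_{lk} when A = X and s_{lk} when A = Y."""
    N, D = A.shape
    kmax = int(A.max())
    counts = np.zeros((D, kmax), dtype=int)
    for k in range(kmax):
        counts[:, k] = np.sum(A >= k + 1, axis=0)
    return counts

X = np.array([[3, 0], [1, 2]])              # N = 2 observations, D = 2 features
Y = np.cumsum(X[:, ::-1], axis=1)[:, ::-1]  # Y[i, l] = X[i, l] + X[i, l+1] + ...
r, s = associated_counts(X), associated_counts(Y)
```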

Then, we apply the basic minorizations of Zhou and Lange (see Equations 2.3 and 2.4 in [35]) to the previous equation, which yields the surrogate function:

$$\begin{aligned} {\mathcal {G}}(\Theta )&= \sum _i \upsilon _{ijl} \Bigg [- \sum _l \sum _k s_{lk} \frac{k}{1+k\theta _{jl}^n} \theta _{jl} \nonumber \\&+ \sum _l \sum _k r_{lk} \Bigg \{\frac{\pi _{jl}^n}{\pi _{jl}^n+k\theta _{jl}^n} \log \pi _{jl} + \frac{k \theta _{jl}^n}{\pi _{jl}^n+k\theta _{jl}^n} \log \theta _{jl}\Bigg \} \nonumber \\&+\sum _l \sum _k s_{lk} \Bigg \{\frac{1-\pi _{jl}^n}{(1-\pi _{jl}^n)+k\theta _{jl}^n} \log (1-\pi _{jl})\nonumber \\& + \frac{k \theta _{jl}^n}{(1-\pi _{jl}^n)+k \theta _{jl}^n } \log \theta _{jl}\Bigg \} \Bigg ] \nonumber \\&+\sum _i (\sum _j \nu _{ijl}) \Bigg [-\sum _l \sum _k s_{lk} \frac{k}{1+k\lambda _{l}^n} \lambda _{l} \nonumber \\&+ \sum _l \sum _k r_{lk} \Bigg \{\frac{\mu _{l}^n}{\mu _{l}^n+k\lambda _{l}^n} \log \mu _{l} + \frac{k \lambda _{l}^n}{\mu _{l}^n+k\lambda _{l}^n} \log \lambda _{l}\Bigg \} \nonumber \\&+\sum _l \sum _k s_{lk} \Bigg \{\frac{1-\mu _{l}^n}{(1-\mu _{l}^n)+k\lambda _{l}^n} \log (1-\mu _{l})\nonumber \\& + \frac{k \lambda _{l}^n}{(1-\mu _{l}^n)+k\lambda _{l}^n }\log \lambda _{l} \Bigg \} \Bigg ] \end{aligned}$$
(A3)
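
Because each minorizer in (A3) is either linear in \(\theta _{jl}\) or logarithmic in a single parameter, maximizing the surrogate splits into one-dimensional problems with closed-form solutions. The sketch below illustrates the resulting fixed-point updates for a single \((\pi ,\theta )\) pair with the responsibilities \(\upsilon \), \(\nu \) dropped (one cluster, one feature); note that per Eq. (A1) the \(\log (\pi +k\theta )\), \(\log ((1-\pi )+k\theta )\) and \(-\log (1+k\theta )\) terms are driven by the counts of \(X_{il}\), \(Y_{i,l+1}\) and \(Y_{il}\) respectively, so we keep three count vectors. This is our illustration of the mechanics, not code from the paper:

```python
import numpy as np

def mm_update(pi, theta, r, s_num, s_den):
    """One MM step for maximizing
        sum_k [ r[k] log(pi + k*theta) + s_num[k] log((1 - pi) + k*theta)
                - s_den[k] log(1 + k*theta) ]
    via the Zhou-Lange minorizers [35]: the surrogate separates in pi and
    theta, and the returned values are its stationary points."""
    k = np.arange(len(r), dtype=float)
    A = np.sum(r * pi / (pi + k * theta))                    # mass on log(pi)
    C = np.sum(s_num * (1 - pi) / ((1 - pi) + k * theta))    # mass on log(1 - pi)
    B = np.sum(r * k * theta / (pi + k * theta)
               + s_num * k * theta / ((1 - pi) + k * theta)) # mass on log(theta)
    W = np.sum(s_den * k / (1 + k * theta))                  # slope of linearized term
    return A / (A + C), B / W

# counts from toy data: x_i = feature counts, y_i = tail sums (y_i >= x_i)
x = np.array([2, 1, 3, 0]); y = np.array([4, 3, 5, 2])
K = int(y.max())
r     = np.array([np.sum(x     >= k + 1) for k in range(K)])
s_num = np.array([np.sum(y - x >= k + 1) for k in range(K)])
s_den = np.array([np.sum(y     >= k + 1) for k in range(K)])
pi, theta = 0.5, 0.5
for _ in range(200):
    pi, theta = mm_update(pi, theta, r, s_num, s_den)
print(round(pi, 4), round(theta, 4))
```

By the MM principle, each such update cannot decrease the original objective [26, 35], which is what guarantees the monotone convergence of the overall algorithm.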

About this article

Cite this article

Zamzami, N., Bouguila, N. A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data. Pattern Anal Applic 26, 91–106 (2023). https://doi.org/10.1007/s10044-022-01094-z
