Gaussian mixture models (GMM) are the most-widely employed approach to perform model-based clustering of continuous features. Grievously, with the increasing availability of high-dimensional datasets, their direct applicability is put at stake: GMMs suffer from the curse of dimensionality issue, as the number of parameters grows quadratically with the number of variables. To this extent, a methodological link between Gaussian mixtures and Gaussian graphical models has recently been established in order to provide a framework for performing penalized model-based clustering in presence of large precision matrices. Notwithstanding, current methodologies do not account for the fact that groups may be under or over-connected, thus implicitly assuming similar levels of sparsity across clusters. We overcome this limitation by defining data-driven and component specific penalty factors, automatically accounting for different degrees of connections within groups. A real data experiment on handwritten digits recognition showcases the validity of our proposal.

Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation

Cappozzo, A;
2023-01-01

Abstract

Gaussian mixture models (GMM) are the most-widely employed approach to perform model-based clustering of continuous features. Grievously, with the increasing availability of high-dimensional datasets, their direct applicability is put at stake: GMMs suffer from the curse of dimensionality issue, as the number of parameters grows quadratically with the number of variables. To this extent, a methodological link between Gaussian mixtures and Gaussian graphical models has recently been established in order to provide a framework for performing penalized model-based clustering in presence of large precision matrices. Notwithstanding, current methodologies do not account for the fact that groups may be under or over-connected, thus implicitly assuming similar levels of sparsity across clusters. We overcome this limitation by defining data-driven and component specific penalty factors, automatically accounting for different degrees of connections within groups. A real data experiment on handwritten digits recognition showcases the validity of our proposal.
2023
Building Bridges between Soft and Statistical Methodologies for Data Science
978-3-031-15508-6
978-3-031-15509-3
File in questo prodotto:
File Dimensione Formato  
cappozzo_casa_fop_SMPS2022.pdf

accesso aperto

: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 225.55 kB
Formato Adobe PDF
225.55 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1233940
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 0
social impact