A novel scale-invariant, dynamic method for hierarchical clustering of data affected by measurement uncertainty

Vignati, Federica; Fustinoni, Damiano; Niro, Alfonso
2018-01-01

Abstract

An enhanced technique for hierarchical agglomerative clustering is presented. Classical clusterings suffer from non-uniqueness. This non-uniqueness comes from the scaling adopted for the data and from the arbitrary choice of the function used to measure the proximity between elements. Moreover, most classical methods cannot account for the effect of measurement uncertainty on the initial data, when such uncertainty is present. To overcome these limitations, a weighted, asymmetric function is defined to quantify the proximity between any two elements. The data weighting depends dynamically on the degree of advancement of the clustering procedure. The novel proximity measure is derived from a geometric approach to clustering. It decouples the result from the data scaling, and it indicates how robust a clustering is against the measurement uncertainty of the initial data. The method applies to both flat and hierarchical clustering while maintaining the computational cost of the classical methods.
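The scale dependence of classical clusterings can be illustrated with a small sketch. This is not the authors' weighted, asymmetric proximity measure, which is not specified on this page. It shows only the limitation that the measure is meant to remove. The sketch runs a naive single-linkage clustering with the symmetric Euclidean distance on three hypothetical points A, B and C, first as given and then after the second coordinate is multiplied by 0.25, as a change of units would do. The function names and the data are illustrative assumptions.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Classical symmetric Euclidean distance between two points."""
    return math.dist(p, q)


def single_linkage(points, n_clusters):
    """Naive single-linkage agglomerative clustering (illustrative only).

    Starts from singleton clusters and repeatedly merges the two clusters
    whose closest members are nearest, until n_clusters remain.
    Returns the clusters as a sorted list of sorted index lists.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        # combinations yields a < b, so deleting b first keeps index a valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return sorted(clusters)


def rescale(points, factors):
    """Multiply each coordinate by a per-feature factor (e.g. a unit change)."""
    return [tuple(x * f for x, f in zip(p, factors)) for p in points]


# Hypothetical data: A = (0, 0), B = (1, 0), C = (0, 2).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
original = single_linkage(points, 2)
rescaled = single_linkage(rescale(points, (1.0, 0.25)), 2)
print(original)  # [[0, 1], [2]]  (A and B merge first)
print(rescaled)  # [[0, 2], [1]]  (A and C merge first)
```

With the original coordinates, A and B are the closest pair (distance 1), so the two-cluster partition is {A, B}, {C}. After rescaling, C lies at distance 0.5 from A, and the partition becomes {A, C}, {B}. The same data and the same proximity function therefore give different clusterings depending only on the chosen scaling.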
2018
Keywords: Computational cost; Hierarchical clustering; Non-uniqueness; Proximity measure; Uncertainty; Computational Mathematics; Applied Mathematics
Files in this item:
paper_stat v.4 2017.11.20.pdf (Publisher's version, Adobe PDF, 397.58 kB, restricted access)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1085451
Citations
  • PMC: not available
  • Scopus: 10
  • ISI: 5