A distance-based point-reassignment heuristic for the k-hyperplane clustering problem

Amaldi, Edoardo; Coniglio, Stefano
2013-01-01

Abstract

We consider the k-Hyperplane Clustering problem: given a set of m points in R^n, partition the set into k subsets (clusters) and determine a hyperplane for each of them so as to minimize the sum of the squared Euclidean distances between each point and the hyperplane of its cluster. We give a nonconvex mixed-integer quadratically constrained quadratic programming formulation of the problem. Since even very small instances are challenging for state-of-the-art spatial branch-and-bound solvers such as Couenne, we propose a heuristic in which many critical points are reassigned at each iteration. These points, which are likely to be ill-assigned in the current solution, are identified using a distance-based criterion, and their number is progressively decreased to zero. Our algorithm outperforms the state-of-the-art algorithm proposed by Bradley and Mangasarian on a set of real-world and structured randomly generated instances. For the largest group of instances, we obtain an average improvement in solution quality of 54%.
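To make the objective concrete, the following is a minimal Python sketch of the quantity being minimized: each hyperplane is written as {a : w·a = gamma}, and the cost of a solution is the sum of squared point-to-hyperplane distances. The function names, the toy data, and in particular the `criticality` score (ratio of the distance to the assigned hyperplane over the distance to the closest other hyperplane) are assumptions made for illustration; the abstract states only that critical points are identified with a distance-based criterion and does not give its exact form. Refitting the hyperplanes for a fixed assignment is not shown.

```python
import math

def distance(point, w, gamma):
    """Euclidean distance from `point` to the hyperplane {a : w.a = gamma}."""
    norm = math.sqrt(sum(c * c for c in w))
    return abs(sum(c * p for c, p in zip(w, point)) - gamma) / norm

def objective(points, planes, assignment):
    """Sum of squared distances between each point and its cluster's hyperplane."""
    return sum(distance(p, *planes[j]) ** 2 for p, j in zip(points, assignment))

def assign_to_nearest(points, planes):
    """Assign every point to the index of its closest hyperplane."""
    return [min(range(len(planes)), key=lambda j: distance(p, *planes[j]))
            for p in points]

def criticality(point, planes, j):
    """Hypothetical score (assumption, not the paper's criterion):
    distance to the assigned hyperplane j divided by the distance to the
    closest other hyperplane. Larger values suggest an ill-assigned point."""
    own = distance(point, *planes[j])
    other = min(distance(point, *planes[h])
                for h in range(len(planes)) if h != j)
    return own / other if other > 0 else math.inf

# Toy instance in R^2 with k = 2: the lines y = 0 and x = 0.
planes = [((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0)]
points = [(3.0, 0.1), (-2.0, -0.2), (0.1, 4.0), (1.0, 1.5)]
initial = [0, 0, 1, 0]                      # last point is poorly assigned

cost_before = objective(points, planes, initial)
reassigned = assign_to_nearest(points, planes)
cost_after = objective(points, planes, reassigned)
scores = [criticality(p, planes, j) for p, j in zip(points, initial)]
```

In this toy instance the last point has by far the largest score under the initial assignment, so a heuristic of the kind described would consider it first for reassignment; moving it to the second hyperplane lowers the objective.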
2013
Data mining; Heuristics; Nonlinear programming
Files in this record:
k-Hyperplane-Clustering-EJOR-03.pdf
Restricted access
Post-print (draft or Author's Accepted Manuscript, AAM)
Size: 535.14 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/679171