Probabilistic data self-clustering based on semi-parametric extreme value theory for structural health monitoring

Entezami, Alireza;De Michele, Carlo
2023-01-01

Abstract

Clustering is a popular unsupervised learning approach, with many algorithms applicable to a wide range of engineering problems. However, several practical and technical issues may limit its application to structural health monitoring (SHM): severe variability in unlabeled data, the need to determine the number of clusters, and the need for an additional technique to estimate a threshold in anomaly detection. To address these challenges, this article proposes a novel probabilistic data self-clustering method for damage detection in large-scale civil structures, inspired by semi-parametric extreme value theory. The crux of the method is to treat each unlabeled data point as a local cluster, which removes the need to determine the number of clusters, a major technical challenge for most state-of-the-art clustering techniques. Using an unsupervised nearest neighbor search, each local cluster retains extreme values, defined as negated minimum distances with respect to representative data, and ignores irrelevant data that are sensitive to outliers and to any source of variability. With this strategy, the proposed method can address the major challenge of environmental and/or operational variability in SHM. A new anomaly score is then proposed that not only discriminates an abnormal (damaged) condition from a normal (undamaged) one but also serves to estimate a decision threshold. This score is a high quantile of the selected distances within each local cluster, estimated by the generalized Hill estimator under a semi-parametric peak-over-threshold technique. The key novel contribution of this anomaly score, compared with most existing scores, is its ability to determine a damage index and a decision threshold simultaneously within an integrated framework. Long- and short-term dynamic features (modal frequencies) of a concrete box-girder bridge and a cable-stayed bridge are used to verify the proposed method, along with several comparisons. The results demonstrate the effectiveness of this method and its superiority over some state-of-the-art techniques.
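The pipeline outlined in the abstract (nearest neighbor search, a local cluster per data point, a high quantile as anomaly score) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the classical Hill estimator and the Weissman quantile estimator stand in for the generalized Hill estimator, the negation step is omitted, and the function names and the parameters m (neighbors per local cluster), k (upper order statistics), and p (exceedance probability) are hypothetical.

```python
import numpy as np


def hill_tail_index(sample, k):
    """Classical Hill estimate of the tail index from the k largest values.

    Assumption: this is a stand-in for the generalized Hill estimator.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    if x[n - k - 1] <= 0:
        raise ValueError("threshold order statistic must be positive")
    # Mean log-excess of the k largest values over the (k+1)-th largest.
    return float(np.mean(np.log(x[n - k:]) - np.log(x[n - k - 1])))


def weissman_quantile(sample, k, p):
    """Weissman estimate of the quantile exceeded with probability p."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    gamma = hill_tail_index(x, k)
    return float(x[n - k - 1] * (k / (n * p)) ** gamma)


def local_cluster_distances(query, reference, m, exclude_self=False):
    """Euclidean distances from each query point to its m nearest references."""
    diff = query[:, None, :] - reference[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    if exclude_self:
        # Only meaningful when query and reference are the same array.
        np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, :m]


def anomaly_scores(query, reference, m=20, k=5, p=0.01, exclude_self=False):
    """One score per query point: a high quantile of its local distances."""
    dists = local_cluster_distances(query, reference, m, exclude_self)
    return np.array([weissman_quantile(row, k, p) for row in dists])


# Synthetic stand-in for modal-frequency features (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(300, 4))           # normal (undamaged) condition
damaged = rng.normal(size=(50, 4)) + 10.0   # shifted features mimic damage

train_scores = anomaly_scores(train, train, exclude_self=True)
# Assumption: the threshold is an empirical 99% quantile of training scores.
threshold = np.quantile(train_scores, 0.99)
alarms = anomaly_scores(damaged, train) > threshold
print(f"threshold = {threshold:.3f}, alarm rate = {alarms.mean():.2f}")
```

In this sketch the same score yields both the damage index and the threshold, which mirrors the integrated framework the abstract describes; the 99% level and the values of m, k, and p are illustrative choices rather than values taken from the article.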
Structural health monitoring; Unsupervised learning; Clustering; Anomaly detection; Semi-parametric extreme value theory; Environmental variability; Threshold
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1232958