Automated labeling and abnormal detection based on kernel cluster local outlier factor for machinery health monitoring
Karimi, Hamid Reza
2026-01-01
Abstract
Labeled machinery data are necessary for training intelligent fault diagnosis models. However, such data sets commonly contain unlabeled and abnormal samples, which reduce data quality and may lead to inaccurate diagnosis models. To address this issue, a kernel cluster local outlier factor (CLOF) method is proposed for automated labeling and abnormal data detection. The approach establishes the relationship among samples using a parameter-free method, the natural neighbor spectrum, and searches for clusters through this relationship. CLOF is then calculated to evaluate the degree of abnormality of each cluster, and clusters whose CLOF exceeds a predetermined threshold are detected as abnormal data. After the abnormal data are cleaned, the natural neighbor spectrum is reconstructed. Finally, fault types are labeled automatically based on the relationship between unlabeled and labeled data in the reconstructed spectrum. The proposed method is validated on experimental data collected from a gear test bench, a real wind turbine, and a centrifugal pump. The results indicate that the approach is effective in detecting abnormal data under different condition types and in labeling data accurately and automatically.
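The pipeline summarized in the abstract (neighbor search, clustering, cluster scoring, cleaning, relabeling) can be illustrated with a rough sketch. This is not the paper's kernel CLOF: the abstract gives no formulas, so the cluster score below is a simplified, LOF-style distance ratio used as a stand-in. The neighbor search uses the common stopping rule for natural neighbors, and labeling by cluster majority is likewise an assumption. All function names, the threshold of 3.0, and the toy data are hypothetical.

```python
import numpy as np

def natural_neighbors(X):
    """Grow the neighborhood size r until the number of points that nobody
    lists as a neighbor stops shrinking, so no k is chosen by hand."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)[:, 1:]  # nearest first, self dropped (assumes no duplicate points)
    knn = np.zeros((n, n), dtype=bool)       # knn[i, j]: j is among the r nearest of i
    r, prev_orphans = 0, -1
    while r < n - 1:
        knn[np.arange(n), order[:, r]] = True
        r += 1
        orphans = int((knn.sum(axis=0) == 0).sum())
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return r, knn, dist

def clusters_from(knn):
    """Connected components of the mutual-neighbor graph."""
    mutual = knn & knn.T
    labels = np.full(len(knn), -1)
    next_id = 0
    for start in range(len(knn)):
        if labels[start] >= 0:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(mutual[i] & (labels < 0)):
                labels[j] = next_id
                stack.append(j)
        next_id += 1
    return labels

def cluster_scores(r, knn, dist, clusters):
    """Stand-in score (NOT the paper's CLOF): per point, the mean r-NN distance
    divided by the same quantity averaged over its neighbors; then the cluster mean."""
    d = np.sort(dist, axis=1)[:, 1:r + 1].mean(axis=1)
    point_score = np.array([d[i] / d[knn[i]].mean() for i in range(len(d))])
    return {c: point_score[clusters == c].mean() for c in np.unique(clusters)}

def clean_and_label(X, y, threshold=3.0):
    """y holds class ids, with -1 marking unlabeled samples."""
    r, knn, dist = natural_neighbors(X)
    clusters = clusters_from(knn)
    scores = cluster_scores(r, knn, dist, clusters)
    abnormal = np.array([scores[c] > threshold for c in clusters])
    keep = np.flatnonzero(~abnormal)
    # Rebuild the neighbor graph on the cleaned data, then label by cluster majority.
    _, knn2, _ = natural_neighbors(X[keep])
    clusters2 = clusters_from(knn2)
    y_out = y.copy()
    for c in np.unique(clusters2):
        members = keep[clusters2 == c]
        known = y[members][y[members] >= 0]
        if known.size:
            y_out[members[y[members] < 0]] = np.bincount(known).argmax()
    return abnormal, y_out

# Toy data: two groups on a line, one far-away sample, one labeled sample per group.
xs = [0.0, 1.0, 2.1, 3.3, 4.6]
X = np.array([[x, 0.0] for x in xs] + [[100.0 + x, 0.0] for x in xs] + [[52.0, 500.0]])
y = np.array([0, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1])
abnormal, labels = clean_and_label(X, y)
print(np.flatnonzero(abnormal), labels)
```

On this toy input the isolated sample forms its own cluster with a very large score and is removed, after which each remaining group inherits the label of its single labeled member. Real vibration features would need scaling and a threshold chosen for the data at hand.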


