A novel method of detection of noisy samples in high-dimensional sequential data considering both attribute and label noise

Zio, Enrico
2025-01-01

Abstract

High-dimensional sequential data collected in industrial scenarios may be contaminated with both attribute noise and label noise, which hinders the development of accurate deep learning-based prediction models. Existing noise detection methods can detect only one type of noise. In contrast, this article proposes a novel noisy-sample detection method that detects both types of noise simultaneously through generative learning. An enhanced variational recurrent prediction model (EVRPM), which incorporates a label predictor and an auxiliary task into the variational recurrent neural network, is proposed to model the log-likelihood of samples. Moreover, an iterative detection process is adopted to refine EVRPM training and improve noisy-sample detection, which is particularly beneficial for low-quality datasets. A prediction model with higher accuracy can then be obtained from the refined dataset. The effectiveness and superiority of the proposed method are verified on both public and real experimental datasets.
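
The iterative, likelihood-based detection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' EVRPM: a diagonal Gaussian fitted to the retained samples stands in for the generative model, each sample is assumed to be a flat vector of attributes joined with its label, and the names `gaussian_log_likelihood`, `iterative_noise_detection`, `n_rounds`, and `drop_fraction` are hypothetical. Dropping a fixed fraction per round is also an assumption made for simplicity.

```python
import numpy as np


def gaussian_log_likelihood(fit_data, eval_data):
    """Stand-in likelihood model (assumption): diagonal Gaussian fitted on fit_data."""
    mean = fit_data.mean(axis=0)
    var = fit_data.var(axis=0) + 1e-6  # small constant avoids division by zero
    per_dim = np.log(2 * np.pi * var) + (eval_data - mean) ** 2 / var
    return -0.5 * per_dim.sum(axis=1)


def iterative_noise_detection(samples, n_rounds=3, drop_fraction=0.05):
    """Flag low-likelihood samples, refitting the model on the retained set each round."""
    keep = np.ones(len(samples), dtype=bool)
    for _ in range(n_rounds):
        # Fit only on samples still considered clean, but score every sample.
        scores = gaussian_log_likelihood(samples[keep], samples)
        # Hypothetical rule: discard the lowest-scoring fraction of the retained set.
        threshold = np.quantile(scores[keep], drop_fraction)
        keep &= scores >= threshold
    return ~keep  # True marks a sample flagged as noisy


# Synthetic demo: 10 strongly shifted samples followed by 190 clean ones.
rng = np.random.default_rng(0)
noisy = rng.normal(10.0, 1.0, size=(10, 4))
clean = rng.normal(0.0, 1.0, size=(190, 4))
data = np.vstack([noisy, clean])

flagged = iterative_noise_detection(data)
print("flagged samples:", int(flagged.sum()))
print("all injected noisy samples flagged:", bool(flagged[:10].all()))
```

Because the stand-in model is refitted after each removal, later rounds score samples against a cleaner estimate, which mirrors the refinement loop described above. A fixed drop fraction also discards some clean samples in every round, so a practical variant would need a stopping criterion.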
Data likelihood
High-dimensional sequential data
Noise detection
Variational recurrent neural network
Files in this item:
1-s2.0-S0360835225002104-main.pdf (open access, Adobe PDF, 1.72 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1305165