Using Active Learning to Improve Learning Performance

Clarissa Amico; Roberto Cigolini
2024-01-01

Abstract

Active learning (AL) is a machine learning technique that selects the most informative samples from a large pool of unlabeled data for annotation, lowering labeling cost while improving learning performance. Conventional AL approaches, however, often neglect class imbalance, where some classes are overrepresented and others underrepresented in the dataset. This imbalance can bias sampling and compromise the classifier's ability to generalize. This paper presents a novel threshold-based AL strategy designed to address class imbalance. The strategy adjusts dynamically to the degree of imbalance, ensuring that the selected samples are both informative and representative of minority classes. The approach is tested on a variety of imbalanced datasets and benchmarked against state-of-the-art AL methods. According to the findings, AL-Rank and AL Hybrid outperform traditional AL techniques: AL-Rank achieved a CAVC of 31.53 with Random Forest and 31.71 with Logistic Regression, and AL Hybrid surpassed AL-Rank with a CAVC of 32.02 using Logistic Regression. Empirical results show that the proposed method significantly improves classifier performance, mainly in scenarios with imbalanced class labels.
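The abstract does not spell out the selection rule, so the following is only an illustrative sketch of how a threshold that adapts to class imbalance could be combined with uncertainty sampling. It is not the paper's AL-Rank or AL Hybrid method. The function name `select_queries`, the `base_threshold` parameter, and the rule `threshold = base_threshold * imbalance ratio` are all assumptions made for this example.

```python
from collections import Counter


def select_queries(pool_probs, labeled_labels, batch_size, base_threshold=0.5):
    """Return indices of unlabeled samples to send for annotation.

    pool_probs: list of dicts mapping class label -> predicted probability,
        one dict per unlabeled sample (output of any probabilistic classifier).
    labeled_labels: labels annotated so far, used to estimate class imbalance.
    """
    counts = Counter(labeled_labels)
    minority = min(counts, key=counts.get)
    # Imbalance ratio in (0, 1]; 1.0 means the labeled set is balanced.
    ratio = min(counts.values()) / max(counts.values())
    # Assumed rule: the stronger the imbalance, the lower the bar a sample
    # must clear to count as a likely minority-class candidate.
    threshold = base_threshold * ratio

    def uncertainty(i):
        # Least-confidence score: 1 minus the top predicted probability.
        return 1.0 - max(pool_probs[i].values())

    indices = range(len(pool_probs))
    likely_minority = [
        i for i in indices if pool_probs[i].get(minority, 0.0) >= threshold
    ]
    chosen = set(likely_minority)
    others = [i for i in indices if i not in chosen]

    # Likely minority samples come first; each group is ordered from the
    # most to the least uncertain sample.
    ranked = sorted(likely_minority, key=uncertainty, reverse=True)
    ranked += sorted(others, key=uncertainty, reverse=True)
    return ranked[:batch_size]
```

In this sketch the classifier's predicted probabilities are passed in directly, so the same selection step could sit behind a Random Forest or a Logistic Regression model without changes.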
2024
8th IEEE International Forum on Research and Technologies for Society and Industry Innovation, RTSI 2024
ISBN 979-8-3503-6213-8
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1282366