Using Active Learning to Improve Learning Performance
Clarissa Amico; Roberto Cigolini
2024-01-01
Abstract
Active learning (AL) is a machine learning technique that selects the most informative samples from a large pool of unlabeled data for annotation, thereby lowering labeling costs and improving learning performance. However, conventional AL approaches often neglect class imbalance, where some classes are overrepresented or underrepresented in the data distribution. This imbalance can bias sampling and compromise the generalization ability of the classifier. This paper presents a novel threshold-based AL strategy designed to address class imbalance. The strategy dynamically adjusts to the degree of class imbalance, ensuring that the selected samples are both informative and representative of minority classes. The proposed approach is tested on a variety of imbalanced datasets and benchmarked against state-of-the-art AL methods. According to the findings, AL-Rank and AL Hybrid outperform traditional AL techniques: AL-Rank achieved a CAVC of 31.53 with Random Forest and 31.71 with Logistic Regression, and AL Hybrid surpassed AL-Rank, reaching a CAVC of 32.02 with Logistic Regression. Empirical results show that the proposed method significantly improves classifier performance, particularly in scenarios with imbalanced class labels.
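The abstract does not spell out the thresholding rule, so the following is only a minimal sketch of one way an uncertainty threshold could adapt to class imbalance. It is not the authors' AL-Rank or AL Hybrid method. It assumes least-confidence uncertainty sampling, integer labels 0..K-1 that match the columns of the classifier's predicted probabilities, and per-class thresholds scaled by each class's frequency in the labeled set. The function names and the `base_threshold` parameter are hypothetical.

```python
# Hypothetical sketch: imbalance-aware, threshold-based query selection.
# NOT the paper's AL-Rank / AL Hybrid algorithm; an illustration only.
from collections import Counter
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)


def class_thresholds(labeled_y: Sequence[int], n_classes: int,
                     base_threshold: float) -> List[float]:
    """Per-class uncertainty thresholds scaled by relative class frequency.

    The majority class keeps base_threshold; rarer classes get proportionally
    lower thresholds, and classes absent from the labeled set get 0.0.
    """
    counts = Counter(labeled_y)
    n_max = max(counts.values())
    return [base_threshold * counts.get(c, 0) / n_max for c in range(n_classes)]


def select_queries(pool_probs: Sequence[Sequence[float]],
                   labeled_y: Sequence[int],
                   budget: int,
                   base_threshold: float = 0.3) -> List[int]:
    """Return indices of up to `budget` pool samples to send for annotation."""
    n_classes = len(pool_probs[0])
    thresholds = class_thresholds(labeled_y, n_classes, base_threshold)
    scored = []
    for idx, probs in enumerate(pool_probs):
        predicted = max(range(n_classes), key=lambda c: probs[c])
        margin = least_confidence(probs) - thresholds[predicted]
        if margin > 0:  # uncertain enough relative to its predicted class
            scored.append((margin, idx))
    # Largest margin first: lower thresholds favor minority-class predictions.
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:budget]]


# Usage: a 90/10 labeled set and four unlabeled pool samples.
labeled = [0] * 90 + [1] * 10
pool = [
    [0.90, 0.10],  # confident majority prediction: filtered out
    [0.60, 0.40],  # uncertain majority prediction
    [0.20, 0.80],  # minority prediction, moderately uncertain
    [0.05, 0.95],  # minority prediction, fairly confident
]
print(select_queries(pool, labeled, budget=2))  # -> [2, 1]
```

In this sketch, sample 2 is ranked ahead of sample 1 even though its raw uncertainty is lower (0.2 vs. 0.4), because the minority class's threshold shrinks with the 90/10 imbalance. That is one concrete reading of "dynamically adjusts to the degree of class imbalance," not necessarily the paper's.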
File: Using_Active_Learning_to_Improve_Learning_Performance.pdf (article; Adobe PDF, 359.43 kB; restricted access)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.