RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano

Using Active Learning to Improve Learning Performance

Clarissa Amico; Roberto Cigolini

2024-01-01

Abstract

Active learning (AL) is a machine learning technique that selects the most informative samples from a large pool of unlabeled data for annotation, lowering labeling cost and improving learning performance. Conventional AL approaches, however, often neglect class imbalance, where some classes are overrepresented or underrepresented in the dataset. This imbalance can bias sampling and compromise the classifier's ability to generalize. This paper presents a novel threshold-based AL strategy designed to address class imbalance. The strategy adjusts dynamically to the degree of class imbalance so that the selected samples are both informative and representative of minority classes. The approach is tested on a variety of imbalanced datasets and benchmarked against state-of-the-art AL methods. According to the findings, AL-Rank and AL Hybrid outperform traditional AL techniques: AL-Rank achieved a CAVC of 31.53 with Random Forest and 31.71 with Logistic Regression, and AL Hybrid surpassed AL-Rank with a CAVC of 32.02 using Logistic Regression. Empirical results show that the proposed method significantly improves classifier performance, mainly in scenarios characterized by imbalanced class labels.
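The abstract does not state the threshold rule itself. Purely as an illustration of the general idea, the following is a minimal sketch, assuming a binary problem in which each pool sample already has a predicted probability of belonging to the minority class. The function names, the `base_threshold` parameter, and the rule of scaling the threshold by the imbalance ratio are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Minority count divided by majority count, in (0, 1].

    Assumes `labels` is non-empty (hypothetical helper).
    """
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def uncertainty(p_minority):
    """Binary uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_minority - 1.0)


def select_for_labeling(pool_probs, labeled_labels, base_threshold=0.6):
    """Return indices of pool samples whose uncertainty clears a threshold.

    Assumed rule (not from the paper): samples leaning toward the minority
    class face a lower threshold, scaled by the current imbalance ratio,
    so stronger imbalance admits more minority-leaning samples.
    """
    ratio = imbalance_ratio(labeled_labels)
    selected = []
    for idx, p in enumerate(pool_probs):
        threshold = base_threshold * ratio if p >= 0.5 else base_threshold
        if uncertainty(p) >= threshold:
            selected.append(idx)
    return selected


if __name__ == "__main__":
    pool = [0.05, 0.45, 0.55, 0.90, 0.97]  # P(minority) for each pool sample
    imbalanced = [0] * 8 + [1] * 2         # 8 majority labels, 2 minority
    balanced = [0, 1] * 5
    print(select_for_labeling(pool, imbalanced))
    print(select_for_labeling(pool, balanced))
```

With the imbalanced labeled set, the lowered threshold admits a minority-leaning sample that the balanced case rejects, which is the kind of behaviour the abstract attributes to a strategy that adapts to the degree of imbalance. In practice the probabilities would come from a trained classifier such as the Random Forest or Logistic Regression models mentioned in the abstract.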

Year of publication: 2024
Book title: 8th IEEE International Forum on Research and Technologies for Society and Industry Innovation, RTSI 2024
ISBN (International Standard Book Number): 979-8-3503-6213-8
Appears in types: 04.1 Contribution in conference proceedings

Files in this record:

File	Access	Description	Size	Format
Using_Active_Learning_to_Improve_Learning_Performance.pdf	Restricted access	Article	359.43 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1282366
