
Guessing as a service: large language models are not yet ready for vulnerability detection

F. Panebianco; S. Longari; S. Zanero; M. Carminati
2025-01-01

Abstract

The growing number of reported software vulnerabilities underscores the need for efficient detection methods, especially for resource-limited organizations. While traditional techniques like fuzzing and symbolic execution are effective, they require significant manual effort. Recent advances in Large Language Models (LLMs) show promise for zero-shot learning, leveraging pre-training on diverse datasets to detect vulnerabilities without fine-tuning. This study evaluates quantized models (e.g., Mistral v0.3), code-specialized models (e.g., CodeQwen 1.5), and fine-tuned approaches like PDBERT. Zero-shot models perform poorly, with a precision below 0.46, and even PDBERT’s high metrics (precision 0.91, specificity 0.99) are undermined by overfitting. These findings emphasize the limitations of current AI solutions and the necessity for approaches tailored to the specific problem.
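The abstract quotes precision and specificity figures (e.g., PDBERT at precision 0.91 and specificity 0.99). As a minimal sketch of how such metrics are derived from a binary vulnerability-detection confusion matrix — with hypothetical counts chosen for illustration only, not taken from the paper:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of functions flagged as vulnerable that truly are."""
    return tp / (tp + fp)

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-vulnerable functions correctly left unflagged."""
    return tn / (tn + fp)

# Hypothetical counts (assumption, not the paper's data): 91 true
# positives, 9 false positives, 990 true negatives.
tp, fp, tn = 91, 9, 990
print(round(precision(tp, fp), 2))    # 0.91
print(round(specificity(tn, fp), 2))  # 0.99
```

Note that a high specificity can coexist with poor real-world utility when, as the paper argues, the scores stem from overfitting rather than genuine generalization.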
large language models, software security, vulnerability detection, artificial intelligence
Files in this record:
_ITASEC__Survey_LLMs_for_Vulnerability_Detection.pdf

Open access

Description: The paper highlights the challenges of AI-driven vulnerability detection, showing that zero-shot models perform poorly while fine-tuned models like PDBERT suffer from overfitting, emphasizing the need for specialized approaches.
Type: Pre-Print (or Pre-Refereeing)
Size: 496.12 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1284205
Citations
  • Scopus: 0