
Guessing as a service: large language models are not yet ready for vulnerability detection

F. Panebianco; S. Longari; S. Zanero; M. Carminati
2025-01-01

Abstract

The growing number of reported software vulnerabilities underscores the need for efficient detection methods, especially for resource-limited organizations. While traditional techniques like fuzzing and symbolic execution are effective, they require significant manual effort. Recent advances in Large Language Models (LLMs) show promise for zero-shot learning, leveraging pre-training on diverse datasets to detect vulnerabilities without fine-tuning. This study evaluates quantized models (e.g., Mistral v0.3), code-specialized models (e.g., CodeQwen 1.5), and fine-tuned approaches like PDBERT. Zero-shot models perform poorly, with a precision below 0.46, and even PDBERT’s high metrics (precision 0.91, specificity 0.99) are undermined by overfitting. These findings emphasize the limitations of current AI solutions and the necessity for approaches tailored to the specific problem.
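The abstract quotes precision and specificity figures (e.g., PDBERT at precision 0.91 and specificity 0.99). As a minimal sketch of how such metrics are derived from a binary vulnerability-detection confusion matrix — with hypothetical counts chosen for illustration only, not taken from the paper:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of functions flagged as vulnerable that truly are."""
    return tp / (tp + fp)

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-vulnerable functions correctly left unflagged."""
    return tn / (tn + fp)

# Hypothetical counts (assumption, not the paper's data): 91 true
# positives, 9 false positives, 990 true negatives.
tp, fp, tn = 91, 9, 990
print(round(precision(tp, fp), 2))    # 0.91
print(round(specificity(tn, fp), 2))  # 0.99
```

Note that a high specificity can coexist with poor real-world utility when, as the paper argues, the scores stem from overfitting rather than genuine generalization.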
large language models, software security, vulnerability detection, artificial intelligence
Files in this record:
_ITASEC__Survey_LLMs_for_Vulnerability_Detection.pdf

Open access

Description: The paper highlights the challenges of AI-driven vulnerability detection, showing that zero-shot models perform poorly while fine-tuned models like PDBERT suffer from overfitting, emphasizing the need for specialized approaches.
Type: Pre-Print (or Pre-Refereeing)
Size: 496.12 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1284205
Citations
  • Scopus: 0