Exploring Language Model Self-Improvement through Domain-Independent Gameplay in Text Environments
Forasassi, Matteo; Brunello, Nicolò; Scotti, Vincenzo; Carman, Mark James
2025-01-01
Abstract
In this paper, we investigate the effects of self-training Large Language Models (LLMs) to enhance their reasoning capabilities. We introduce SAL, an iterative training framework in which LLMs engage in textual games with domain-independent prompts. Through self-appraisal, the models autonomously select the game instances they identify as beneficial to their learning trajectory. The chosen games are then used to fine-tune the LLMs, iteratively improving their performance. In our work, we explore different textual games and training approaches. Moreover, we evaluate the language understanding capabilities of the LLMs both before and after training, to track the effect of the self-training process on reasoning capabilities outside the game environment as well. Our results demonstrate how an LLM's self-appraisal ability allows it to improve its own performance, significantly so in some games, with little impact on its linguistic capabilities. Our findings underscore the potential of self-training techniques for improving LLMs' problem-solving capabilities; in particular, we show how LLMs' self-appraisal capabilities can be exploited to identify relevant experiences useful for fine-tuning.
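The abstract describes SAL only at a high level. The following is a minimal Python sketch of the iterative play → self-appraise → select → fine-tune cycle it outlines; every name here (`GameTrace`, `play_game`, `self_appraise`, `fine_tune`, `sal_iteration`) and the selection threshold are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class GameTrace:
    """One completed textual-game episode (transcript plus its outcome)."""
    transcript: str
    final_score: float


def play_game(model, game_env) -> GameTrace:
    # Placeholder rollout: a real version would alternate model actions
    # and environment feedback until the game ends.
    return GameTrace(transcript="(game transcript)", final_score=random.random())


def self_appraise(model, trace: GameTrace) -> float:
    # Placeholder appraisal: a real version would prompt the model itself
    # to rate how instructive its own playthrough was, e.g. on a 0-1 scale.
    return random.random()


def fine_tune(model, traces: list[GameTrace]):
    # Placeholder: a real version would run supervised fine-tuning on the
    # selected transcripts and return the updated model.
    return model


def sal_iteration(model, game_env, n_games: int = 100, threshold: float = 0.7):
    """One round of the play -> self-appraise -> select -> fine-tune cycle."""
    traces = [play_game(model, game_env) for _ in range(n_games)]
    # Keep only the episodes the model itself judges beneficial.
    selected = [t for t in traces if self_appraise(model, t) >= threshold]
    return fine_tune(model, selected)


# Iterating the cycle: each round trains on self-selected experience.
model, env = object(), object()  # stand-ins for an actual LLM and text game
for _ in range(3):
    model = sal_iteration(model, env)
```

The point the abstract emphasizes is the selection step: the model itself, rather than an external reward signal, decides which of its own game experiences enter the fine-tuning set.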
| File | Description | Size | Format |
|---|---|---|---|
| 1-s2.0-S1877050925021799-main.pdf (open access, publisher's version) | Article | 976.62 kB | Adobe PDF |


