
Exploring Language Model Self-Improvement through Domain-Independent Gameplay in Text Environments

Forasassi, Matteo; Brunello, Nicolò; Scotti, Vincenzo; Carman, Mark James
2025-01-01

Abstract

In this paper, we investigate the effects of self-training Large Language Models (LLMs) to enhance their reasoning capabilities. We introduce SAL, an iterative training framework in which LLMs play textual games using domain-independent prompts. Through self-appraisal, the models autonomously select the game instances they identify as beneficial to their learning trajectory. The selected games are then used to fine-tune the LLMs, iteratively improving their performance. In our work, we explore different textual games and training approaches. Moreover, we evaluate the language understanding capabilities of the LLMs both before and after training, to track the effect of the self-training process on reasoning capabilities outside the game environment as well. Our results demonstrate that an LLM's self-appraisal ability allows it to improve its own performance, significantly so in some games, with little impact on its linguistic capabilities. Our findings underscore the potential of leveraging self-training techniques to improve the problem-solving capabilities of LLMs; in particular, we show how LLMs' self-appraisal capabilities can be exploited to identify relevant experiences useful for fine-tuning.
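The abstract outlines an iterative loop of gameplay, self-appraisal, and fine-tuning. The Python sketch below illustrates that loop in the abstract's own terms; the helper callables (play_game, self_appraise, fine_tune), the Episode type, and the iteration counts are hypothetical placeholders, not the paper's actual SAL implementation.

    # Minimal sketch of the iterative self-appraisal loop described in the abstract.
    # All helper functions are hypothetical placeholders supplied by the caller.
    import random
    from typing import Callable, List, Tuple

    # An "episode" here is assumed to be a game transcript plus its final score.
    Episode = Tuple[str, float]

    def self_appraisal_loop(
        play_game: Callable[[str], Episode],        # plays one game with the current model
        self_appraise: Callable[[Episode], bool],   # asks the model whether the episode is useful
        fine_tune: Callable[[List[Episode]], None], # updates the model on the selected episodes
        games: List[str],
        n_iterations: int = 3,
        episodes_per_iteration: int = 100,
    ) -> None:
        """Iterate gameplay -> self-appraisal -> fine-tuning, as outlined in the abstract."""
        for _ in range(n_iterations):
            # 1. Gameplay: collect transcripts using a domain-independent prompt.
            sampled = random.choices(games, k=episodes_per_iteration)
            episodes = [play_game(g) for g in sampled]

            # 2. Self-appraisal: keep only the episodes the model deems beneficial.
            selected = [ep for ep in episodes if self_appraise(ep)]

            # 3. Fine-tuning: the selected transcripts become the next training set.
            fine_tune(selected)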
2025
Procedia Computer Science
LLM
NLP
Self-appraisal
Self-training
TextWorld
File in this product:
File: 1-s2.0-S1877050925021799-main.pdf
Description: Article (Publisher's version)
Access: open access
Size: 976.62 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1295920
Citations
  • Scopus: 0