Exploring Language Model Self-Improvement through Domain-Independent Gameplay in Text Environments
Forasassi, Matteo; Brunello, Nicolò; Scotti, Vincenzo; Carman, Mark James
2025-01-01
Abstract
In this paper, we investigate the effects of self-training Large Language Models (LLMs) to enhance their reasoning capabilities. We introduce SAL, an iterative training framework in which LLMs engage in textual games with domain-independent prompts. Through self-appraisal, the models autonomously select the game instances they identify as beneficial to their learning trajectory. The chosen games are then used to fine-tune the LLMs, iteratively improving their performance. In our work, we explore different textual games and training approaches. Moreover, we evaluate the language understanding capabilities of the LLMs both before and after training, to track the effect of the self-training process on reasoning capabilities outside the game environment as well. Our results demonstrate how an LLM's self-appraisal ability allows it to improve its own performance, significantly so in some games, with little impact on its linguistic capabilities. Our findings underscore the potential of self-training techniques for improving LLMs' problem-solving capabilities; in particular, we show how LLMs' self-appraisal capabilities can be exploited to identify relevant experiences useful for fine-tuning.
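The abstract describes SAL only at a high level. The following is a minimal Python sketch of the iterative play → self-appraise → select → fine-tune cycle it outlines; every name here (`GameTrace`, `play_game`, `self_appraise`, `fine_tune`, `sal_iteration`) and the selection threshold are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class GameTrace:
    """One completed textual-game episode (transcript plus its outcome)."""
    transcript: str
    final_score: float


def play_game(model, game_env) -> GameTrace:
    # Placeholder rollout: a real version would alternate model actions
    # and environment feedback until the game ends.
    return GameTrace(transcript="(game transcript)", final_score=random.random())


def self_appraise(model, trace: GameTrace) -> float:
    # Placeholder appraisal: a real version would prompt the model itself
    # to rate how instructive its own playthrough was, e.g. on a 0-1 scale.
    return random.random()


def fine_tune(model, traces: list[GameTrace]):
    # Placeholder: a real version would run supervised fine-tuning on the
    # selected transcripts and return the updated model.
    return model


def sal_iteration(model, game_env, n_games: int = 100, threshold: float = 0.7):
    """One round of the play -> self-appraise -> select -> fine-tune cycle."""
    traces = [play_game(model, game_env) for _ in range(n_games)]
    # Keep only the episodes the model itself judges beneficial.
    selected = [t for t in traces if self_appraise(model, t) >= threshold]
    return fine_tune(model, selected)


# Iterating the cycle: each round trains on self-selected experience.
model, env = object(), object()  # stand-ins for an actual LLM and text game
for _ in range(3):
    model = sal_iteration(model, env)
```

The point the abstract emphasizes is the selection step: the model itself, rather than an external reward signal, decides which of its own game experiences enter the fine-tuning set.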
| File | Description | Size | Format |
|---|---|---|---|
| 1-s2.0-S1877050925021799-main.pdf (open access, publisher's version) | Article | 976.62 kB | Adobe PDF |


