Towards Parallelising Pre-Trained Transformers
Vincenzo Scotti; Mark James Carman
2024-01-01
Abstract
In this paper, we present our preliminary findings on the parallelisation of pre-trained Large Language Models to enhance their performance by exploiting alternative pathways within the stack of features learned by Transformer neural networks. The objective of this study is to investigate whether the independence and compositional flexibility of these pathways can be leveraged to better extract the information embedded in the models. By reordering and parallelising the feature stack, we aim to virtually extend the depth of the network, thereby providing more space for reasoning. Our experiments evaluated, on the H6 benchmark, both the parallelised pre-trained models and models fine-tuned for parallelisation using Low-Rank Adaptation. The results show that, even without fine-tuning, parallelising a model can improve performance on certain tasks, and that fine-tuning can further enhance these gains. These promising results suggest that the parallelisation of pre-trained Large Language Models is a viable approach to improving model performance and that further exploration and refinement of this technique could yield significant benefits.
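The abstract does not spell out the method, but the core idea (running blocks of the pre-trained layer stack side by side instead of strictly in sequence, then merging their outputs) can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the `ParallelisedStack` wrapper, the fixed `group_size`, and the mean used to merge pathway outputs are all placeholders, and PyTorch's `TransformerEncoderLayer` stands in for the actual pre-trained LLM blocks.

```python
# Minimal sketch of parallelising a Transformer layer stack.
# Assumptions (not from the paper): consecutive blocks are grouped,
# each group reads the same hidden states, and group outputs are averaged.
import torch
import torch.nn as nn


class ParallelisedStack(nn.Module):
    def __init__(self, layers: nn.ModuleList, group_size: int = 2):
        super().__init__()
        self.layers = layers          # pre-trained Transformer blocks
        self.group_size = group_size  # how many blocks share one input

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Walk the stack in groups: every block in a group sees the same
        # hidden states (a parallel "pathway"); the group output is the
        # mean of the blocks' outputs, which then feeds the next group.
        for start in range(0, len(self.layers), self.group_size):
            group = self.layers[start:start + self.group_size]
            hidden = torch.stack([layer(hidden) for layer in group]).mean(dim=0)
        return hidden


if __name__ == "__main__":
    d_model, n_layers = 64, 4
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        for _ in range(n_layers)
    )
    x = torch.randn(2, 10, d_model)          # (batch, sequence, features)
    sequential = x
    for block in blocks:                      # standard sequential stack
        sequential = block(sequential)
    parallel = ParallelisedStack(blocks)(x)   # parallelised stack
    print(sequential.shape, parallel.shape)   # both: torch.Size([2, 10, 64])
```

With `group_size = 1` the sketch reduces to the usual sequential stack, so the grouping and merge rule are the only knobs that change the computation; in the paper these choices are what fine-tuning with Low-Rank Adaptation would adapt the model to.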
File | Size | Format
---|---|---
Alternative_Architectures___DEARING___ECML_PKDD_2024.pdf (open access; Post-Print: DRAFT or Author's Accepted Manuscript, AAM) | 868.96 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.