Towards Parallelising Pre-Trained Transformers

Vincenzo Scotti; Mark James Carman
2024-01-01

Abstract

In this paper, we present our preliminary findings on the parallelisation of pre-trained Large Language Models to enhance their performance by exploiting alternative pathways within the stack of features learned by Transformer neural networks. The objective of this study is to investigate whether the independence and compositional flexibility of these pathways can be leveraged to better extract the information embedded in the models. By reordering and parallelising the feature stack, we aim to virtually extend the depth of the network, thereby providing more space for reasoning. Our experiments involved evaluating on the H6 benchmark both parallelised pre-trained models and models fine-tuned for parallelisation using Low-Rank Adaptation. The results highlight that, even without fine-tuning, parallelising a model can improve performance on certain tasks, and that fine-tuning can further enhance these gains. These promising results suggest that the parallelisation of pre-trained Large Language Models is a viable approach to improving model performance and that further exploration and refinement of this technique could yield significant benefits.
Year: 2024
Published in: ECML PKDD 2024 Workshops
Keywords: LLM, Transformer, Parallel Transformer
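As a loose illustration of the parallelisation idea summarised in the abstract, the sketch below shows one way two consecutive pre-trained Transformer blocks could be applied in parallel to the same hidden states and their outputs merged, rather than stacked sequentially. The `ParallelBlockPair` wrapper, the averaging merge, and the toy `nn.TransformerEncoderLayer` stand-ins are assumptions made for illustration only; they are not taken from the paper, which may reorder and combine layers differently.

```python
import torch
import torch.nn as nn


class ParallelBlockPair(nn.Module):
    """Illustrative sketch: run two Transformer blocks in parallel on the same
    input and average their outputs, instead of composing them sequentially.
    (Hypothetical wrapper; not the paper's actual parallelisation scheme.)"""

    def __init__(self, block_a: nn.Module, block_b: nn.Module):
        super().__init__()
        self.block_a = block_a
        self.block_b = block_b

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # A sequential stack would compute block_b(block_a(x)); here both
        # blocks see the same input and their outputs are merged.
        out_a = self.block_a(hidden_states)
        out_b = self.block_b(hidden_states)
        return (out_a + out_b) / 2


if __name__ == "__main__":
    # Toy usage with stand-in blocks; a real model would reuse its own
    # pre-trained layers rather than freshly initialised ones.
    d_model = 64
    block_a = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    block_b = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    parallel_pair = ParallelBlockPair(block_a, block_b)

    x = torch.randn(2, 10, d_model)  # (batch, sequence length, features)
    y = parallel_pair(x)
    print(y.shape)  # torch.Size([2, 10, 64])
```

In such a scheme, grouping existing layers into parallel branches keeps the number of sequential steps per branch low while reusing all pre-trained weights, which is one reading of "virtually extending the depth" described in the abstract.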
Files in this item:
Alternative_Architectures___DEARING___ECML_PKDD_2024.pdf — Post-Print (DRAFT or Author's Accepted Manuscript, AAM), open access, 868.96 kB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1272143