Towards Parallelising Pre-Trained Transformers
Vincenzo Scotti; Mark James Carman
2024-01-01
Abstract
In this paper, we present our preliminary findings on the parallelisation of pre-trained Large Language Models to enhance their performance by exploiting alternative pathways within the stack of features learned by Transformer neural networks. The objective of this study is to investigate whether the independence and compositional flexibility of these pathways can be leveraged to better extract the information embedded in the models. By reordering and parallelising the feature stack, we aim to virtually extend the depth of the network, thereby providing more space for reasoning. Our experiments evaluated, on the H6 benchmark, both the parallelised pre-trained models and models fine-tuned for parallelisation using Low-Rank Adaptation. The results show that, even without fine-tuning, parallelising a model can improve performance on certain tasks, and that fine-tuning can further enhance these gains. These promising results suggest that the parallelisation of pre-trained Large Language Models is a viable approach to improving model performance and that further exploration and refinement of this technique could yield significant benefits.
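The abstract does not spell out the method, but the core idea (running blocks of the pre-trained layer stack side by side instead of strictly in sequence, then merging their outputs) can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the `ParallelisedStack` wrapper, the fixed `group_size`, and the mean used to merge pathway outputs are all placeholders, and PyTorch's `TransformerEncoderLayer` stands in for the actual pre-trained LLM blocks.

```python
# Minimal sketch of parallelising a Transformer layer stack.
# Assumptions (not from the paper): consecutive blocks are grouped,
# each group reads the same hidden states, and group outputs are averaged.
import torch
import torch.nn as nn


class ParallelisedStack(nn.Module):
    def __init__(self, layers: nn.ModuleList, group_size: int = 2):
        super().__init__()
        self.layers = layers          # pre-trained Transformer blocks
        self.group_size = group_size  # how many blocks share one input

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Walk the stack in groups: every block in a group sees the same
        # hidden states (a parallel "pathway"); the group output is the
        # mean of the blocks' outputs, which then feeds the next group.
        for start in range(0, len(self.layers), self.group_size):
            group = self.layers[start:start + self.group_size]
            hidden = torch.stack([layer(hidden) for layer in group]).mean(dim=0)
        return hidden


if __name__ == "__main__":
    d_model, n_layers = 64, 4
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        for _ in range(n_layers)
    )
    x = torch.randn(2, 10, d_model)          # (batch, sequence, features)
    sequential = x
    for block in blocks:                      # standard sequential stack
        sequential = block(sequential)
    parallel = ParallelisedStack(blocks)(x)   # parallelised stack
    print(sequential.shape, parallel.shape)   # both: torch.Size([2, 10, 64])
```

With `group_size = 1` the sketch reduces to the usual sequential stack, so the grouping and merge rule are the only knobs that change the computation; in the paper these choices are what fine-tuning with Low-Rank Adaptation would adapt the model to.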
File | Size | Format
---|---|---
Alternative_Architectures___DEARING___ECML_PKDD_2024.pdf (open access; Post-Print: DRAFT or Author's Accepted Manuscript, AAM) | 868.96 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.