Static Fuzzy Bag-of-Words: Exploring Static Universe Matrices for Sentence Embeddings

Matteo Muffo;Roberto Tedesco;Licia Sbattella;Vincenzo Scotti
2023-01-01

Abstract

Vector semantics has gradually become a key tool for Natural Language Processing, especially for text analysis. This kind of vector representation is usually realised through embeddings, which can encode semantic information at different levels of granularity. In fact, over the years, models have been developed not only for word embeddings, but also for sentence and document embeddings. With this work we address sentence embeddings, in particular non-parametric ones, which offer a good trade-off between performance and inference speed. We present the Static Fuzzy Bag-of-Words (SFBoW) model, a refinement of the Fuzzy Bag-of-Words approach that yields fixed-dimension sentence embeddings. We targeted fixed-size embeddings to promote caching and re-usability, speeding up the inference of systems that rely on our model. In this paper we explore various approaches to the construction of a static universe matrix, which is fundamental to making the sentence embeddings fixed in size. To show the validity of our approach, we benchmarked our model on a semantic similarity task, obtaining competitive performance.
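As a rough illustration of the fixed-size idea described above, the minimal Python sketch below builds a static universe matrix U by k-means clustering of pre-trained word embeddings (one plausible construction; the keywords mention clustering and PCA) and maps a bag-of-words count vector c to a k-dimensional fuzzy-membership vector via s = U W^T c. The variable names, the toy vocabulary, and this specific construction are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from sklearn.cluster import KMeans

# Toy pre-trained word embeddings: a 6-word vocabulary in 4 dimensions.
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "pet", "car", "truck", "vehicle"]
W = rng.normal(size=(len(vocab), 4))            # |V| x d word-embedding matrix
word2idx = {w: i for i, w in enumerate(vocab)}

# Static universe matrix U: here, k cluster centroids of the word embeddings
# (an assumed construction). The number of rows k fixes the dimension of
# every sentence embedding, regardless of sentence length.
k = 3
U = KMeans(n_clusters=k, n_init=10, random_state=0).fit(W).cluster_centers_

def sfbow_embedding(sentence, W, U, word2idx):
    """k-dimensional fuzzy bag-of-words embedding of a sentence (sketch)."""
    counts = np.zeros(len(word2idx))
    for token in sentence.lower().split():
        if token in word2idx:
            counts[word2idx[token]] += 1.0      # classic bag-of-words counts
    # Fuzzy membership of each vocabulary word to each universe element,
    # aggregated over the sentence: s = U W^T c, a fixed-size k-dim vector.
    return U @ W.T @ counts

print(sfbow_embedding("the cat is a pet", W, U, word2idx))       # shape (3,)
print(sfbow_embedding("a truck is a vehicle", W, U, word2idx))   # shape (3,)

Because U is static, the same k-dimensional space is shared by all sentences, so embeddings can be cached and compared (e.g., by cosine similarity) without recomputation.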
2023
Analysis and Application of Natural Language and Speech Processing
9783031110344
Natural Language Processing; Sentence Embeddings; Fuzzy Sets; Universe Matrix; STS; Word2Vec; FastText; GloVe; Sent2Vec; Clustering; PCA; Vector Significance
Files in this record:
paper_mts+.pdf — pre-print (pre-refereeing), open access, Adobe PDF, 332.91 kB

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1223327