Static Fuzzy Bag-of-Words: Exploring Static Universe Matrices for Sentence Embeddings

Matteo Muffo;Roberto Tedesco;Licia Sbattella;Vincenzo Scotti
2023-01-01

Abstract

Vector semantics has gradually become a key tool for Natural Language Processing, especially for text analysis. This kind of vector representation is usually realised through embeddings, which can encode semantic information at different levels of granularity. In fact, over the years, models have been developed not only for word embeddings, but also for sentence and document embeddings. With this work we address sentence embeddings, in particular non-parametric ones, which offer a good trade-off between performance and inference speed. We present the Static Fuzzy Bag-of-Words (SFBoW) model, a refinement of the Fuzzy Bag-of-Words approach that yields fixed-dimension sentence embeddings. We targeted fixed-size embeddings to promote caching and re-usability, speeding up the inference of systems that rely on our model. In this paper we explore various approaches to the construction of a static universe matrix, which is fundamental to making the sentence embeddings fixed in size. To show the validity of our approach, we benchmarked our model on a semantic similarity task, obtaining competitive performance.
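As a rough illustration of the fixed-size idea described above, the minimal Python sketch below builds a static universe matrix U by k-means clustering of pre-trained word embeddings (one plausible construction; the keywords mention clustering and PCA) and maps a bag-of-words count vector c to a k-dimensional fuzzy-membership vector via s = U W^T c. The variable names, the toy vocabulary, and this specific construction are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from sklearn.cluster import KMeans

# Toy pre-trained word embeddings: a 6-word vocabulary in 4 dimensions.
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "pet", "car", "truck", "vehicle"]
W = rng.normal(size=(len(vocab), 4))            # |V| x d word-embedding matrix
word2idx = {w: i for i, w in enumerate(vocab)}

# Static universe matrix U: here, k cluster centroids of the word embeddings
# (an assumed construction). The number of rows k fixes the dimension of
# every sentence embedding, regardless of sentence length.
k = 3
U = KMeans(n_clusters=k, n_init=10, random_state=0).fit(W).cluster_centers_

def sfbow_embedding(sentence, W, U, word2idx):
    """k-dimensional fuzzy bag-of-words embedding of a sentence (sketch)."""
    counts = np.zeros(len(word2idx))
    for token in sentence.lower().split():
        if token in word2idx:
            counts[word2idx[token]] += 1.0      # classic bag-of-words counts
    # Fuzzy membership of each vocabulary word to each universe element,
    # aggregated over the sentence: s = U W^T c, a fixed-size k-dim vector.
    return U @ W.T @ counts

print(sfbow_embedding("the cat is a pet", W, U, word2idx))       # shape (3,)
print(sfbow_embedding("a truck is a vehicle", W, U, word2idx))   # shape (3,)

Because U is static, the same k-dimensional space is shared by all sentences, so embeddings can be cached and compared (e.g., by cosine similarity) without recomputation.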
2023
Analysis and Application of Natural Language and Speech Processing
9783031110344
Natural Language Processing; Sentence Embeddings; Fuzzy Sets; Universe Matrix; STS; Word2Vec; FastText; GloVe; Sent2Vec; Clustering; PCA; Vector Significance
Files in this record:
paper_mts+.pdf — pre-print (pre-refereeing), open access, Adobe PDF, 332.91 kB

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1223327