Static Fuzzy Bag-of-Words: Exploring Static Universe Matrices for Sentence Embeddings
Matteo Muffo;Roberto Tedesco;Licia Sbattella;Vincenzo Scotti
2023-01-01
Abstract
Vector semantics has become a key tool for Natural Language Processing, especially for text analysis. This kind of vector representation is usually encoded through embeddings, which can capture semantic information at different levels of granularity. In fact, through the years, not only have models for word embeddings been developed, but also models for sentence and document embeddings. With this work we address sentence embeddings, in particular non-parametric ones, which offer a good trade-off between performance and inference speed. We present the Static Fuzzy Bag-of-Words (SFBoW) model, a refinement of the Fuzzy Bag-of-Words approach that yields fixed-dimension sentence embeddings. We targeted fixed-size embeddings to promote caching and re-usability, speeding up the inference of any system that relies on our model. In this paper we explore various approaches to the construction of a static universe matrix, which is fundamental to obtain sentence embeddings of fixed size. To show the validity of our approach, we benchmarked our model on a semantic similarity task, obtaining competitive performance.
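As a rough illustration of the idea summarized above, the Python sketch below shows how a fuzzy Bag-of-Words style sentence embedding can be computed once a static universe matrix is fixed: the output dimension is determined by the number of universe rows, so embeddings can be cached and compared directly. The function name `sfbow_embedding`, the use of cosine similarity, and the max-pooling over words are illustrative assumptions for this sketch, not the exact formulation given in the paper.

```python
import numpy as np

def sfbow_embedding(word_vectors: np.ndarray, universe: np.ndarray) -> np.ndarray:
    """Fuzzy Bag-of-Words style sentence embedding with a static universe matrix.

    word_vectors: (n_words, d) embeddings of the sentence's words.
    universe:     (k, d) static universe matrix; k fixes the output dimension.
    Returns a k-dimensional sentence embedding.
    """
    # Cosine similarity between every sentence word and every universe row.
    w = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    u = universe / np.linalg.norm(universe, axis=1, keepdims=True)
    sim = w @ u.T                # shape: (n_words, k)
    # Fuzzy membership per universe row: max-pool similarities over the words.
    return sim.max(axis=0)       # shape: (k,)

# Example: a 5-word sentence with 300-d word vectors and a 1000-row universe.
sentence = np.random.randn(5, 300)
U = np.random.randn(1000, 300)
print(sfbow_embedding(sentence, U).shape)  # (1000,)
```

Because the universe matrix is static, every sentence maps to the same k-dimensional space regardless of its length, which is what enables the caching and re-use mentioned in the abstract.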
| File | Size | Format |
|---|---|---|
| paper_mts+.pdf (open access, pre-print / pre-refereeing) | 332.91 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.