
Def2Vec: You Shall Know a Word by Its Definition

V. Scotti; R. Tedesco
2024-01-01

Abstract

Def2Vec introduces a new perspective on building word embeddings from dictionary definitions. By leveraging term-document matrices derived from dictionary definitions and employing Latent Semantic Analysis (LSA), our method, Def2Vec, yields embeddings characterized by robust performance and adaptability. Through comprehensive evaluations encompassing token classification, sequence classification, and semantic similarity, we show empirically that Def2Vec is consistently competitive with established models like Word2Vec, GloVe, and FastText. Notably, because our model retains all the matrices resulting from the LSA factorisation, it can efficiently predict embeddings for out-of-vocabulary words, given their definitions. By effectively integrating the benefits of dictionary definitions with LSA-based embeddings, Def2Vec builds informative semantic representations while minimizing data requirements. In this extension, we further investigate the efficacy of sub-word embeddings in our model and extend our experimentation to assess the quality of our embedding model. Our findings contribute to the ongoing evolution of word embedding methodologies by incorporating structured lexical information and enabling efficient embedding prediction.
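The pipeline the abstract describes — an LSA factorisation of a term-document matrix built from dictionary definitions, with fold-in of a definition to embed an out-of-vocabulary word — can be sketched as follows. This is a minimal illustration on toy data, not the paper's implementation: the tokenisation, the raw-count weighting, and the exact use of the factor matrices are assumptions.

```python
import numpy as np

# Toy dictionary (illustrative data, not Wiktionary): each "document"
# is the definition of one headword.
definitions = {
    "cat": "small domesticated feline animal kept as a pet",
    "dog": "domesticated canine animal kept as a pet",
    "car": "road vehicle with an engine and four wheels",
    "bus": "large road vehicle carrying many passengers",
}

# Term-document count matrix X: rows = terms, columns = definitions.
vocab = sorted({t for d in definitions.values() for t in d.split()})
index = {t: i for i, t in enumerate(vocab)}
X = np.zeros((len(vocab), len(definitions)))
for j, d in enumerate(definitions.values()):
    for t in d.split():
        X[index[t], j] += 1.0

# LSA: truncated SVD, X ≈ U @ diag(s) @ Vt with k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k]

# Embedding of each defined word: the scaled document coordinates of
# its definition, i.e. column j of diag(s) @ Vt.
word_vecs = {w: s * Vt[:, j] for j, w in enumerate(definitions)}

def embed_oov(definition_text):
    """Fold a new definition into the latent space to embed an
    out-of-vocabulary word (standard LSA fold-in: U^T x)."""
    x = np.zeros(len(vocab))
    for t in definition_text.split():
        if t in index:  # ignore terms unseen in the dictionary
            x[index[t]] += 1.0
    return U.T @ x

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An unseen word defined like "cat"/"dog" lands near them, far from "car".
pet = embed_oov("domesticated animal kept as a pet")
print(cosine(pet, word_vecs["cat"]) > cosine(pet, word_vecs["car"]))  # True
```

The fold-in step is what the abstract calls "utilization of all the matrices": the left singular vectors `U` map a new definition's term vector into the same latent space that holds the embeddings of in-vocabulary words, with no retraining.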
Natural Language Processing, Deep Learning, Latent Semantic Analysis, Word Embeddings, Wiktionary
Files in this record:
output-2.pdf — open access — Post-Print (DRAFT or Author's Accepted Manuscript, AAM) — 2.38 MB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1267783