Def2Vec: You Shall Know a Word by Its Definition
V. Scotti; R. Tedesco
2024-01-01
Abstract
Def2Vec introduces a new perspective on building word embeddings from dictionary definitions. By leveraging term-document matrices derived from dictionary definitions and applying Latent Semantic Analysis (LSA), our method, Def2Vec, yields embeddings characterised by robust performance and adaptability. Through comprehensive evaluations covering token classification, sequence classification and semantic similarity, we show empirically that Def2Vec is consistently competitive with established models such as Word2Vec, GloVe, and FastText. Notably, because our model retains all the matrices resulting from the LSA factorisation, it can efficiently predict embeddings for out-of-vocabulary words, given their definition. By integrating the benefits of dictionary definitions with LSA-based embeddings, Def2Vec builds informative semantic representations while minimising data requirements. In this extension, we further investigate the efficacy of adding sub-word embeddings to our model and extend our experimentation to assess the quality of the resulting embeddings. Our findings contribute to the ongoing evolution of word embedding methodologies by incorporating structured lexical information and enabling efficient embedding prediction.
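The core idea in the abstract — factorise a term-document matrix of dictionary definitions with LSA, then reuse the factor matrices to fold in out-of-vocabulary words from their definitions — can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the toy dictionary, the plain count weighting, and the `embed_definition` helper are all assumptions made for the example.

```python
import numpy as np

# Toy dictionary of definitions (one "document" per headword).
# Def2Vec uses a full dictionary; this tiny vocabulary is illustrative only.
definitions = {
    "cat":  "small domesticated feline animal",
    "dog":  "domesticated canine animal kept as a pet",
    "bank": "institution that accepts deposits of money",
}

# Build the term-document count matrix X (terms x definitions).
terms = sorted({t for d in definitions.values() for t in d.split()})
t_idx = {t: i for i, t in enumerate(terms)}
X = np.zeros((len(terms), len(definitions)))
for j, d in enumerate(definitions.values()):
    for t in d.split():
        X[t_idx[t], j] += 1.0

# LSA: truncated SVD of X, keeping k latent dimensions.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U, S, Vt = U[:, :k], S[:k], Vt[:k, :]

# Each headword's embedding is its scaled document vector (rows of V * S).
embeddings = Vt.T * S  # shape: (n_headwords, k)

def embed_definition(definition: str) -> np.ndarray:
    """Fold-in: predict an embedding for an out-of-vocabulary word
    from its definition, reusing U from the factorisation.
    Since U has orthonormal columns, U.T @ X = S @ Vt, so the result
    is directly comparable to the rows of `embeddings`."""
    d = np.zeros(len(terms))
    for t in definition.split():
        if t in t_idx:  # definition terms unseen in the dictionary are dropped
            d[t_idx[t]] += 1.0
    return d @ U
```

Folding in the definition of a word already in the dictionary reproduces its stored embedding, which is a quick sanity check that the projection is consistent with the factorisation.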
File | Size | Format | Access
---|---|---|---
output-2.pdf (Post-Print / Author's Accepted Manuscript, AAM) | 2.38 MB | Adobe PDF | open access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.