Def2Vec: Extensible Word Embeddings from Dictionary Definitions

Vincenzo Scotti; Roberto Tedesco
2023-01-01

Abstract

Def2Vec introduces a novel paradigm for word embeddings, leveraging dictionary definitions to learn semantic representations. By constructing term-document matrices from definitions and applying Latent Semantic Analysis (LSA), Def2Vec generates embeddings that offer both strong performance and extensibility. In evaluations encompassing Part-of-Speech tagging, Named Entity Recognition, chunking, and semantic similarity, Def2Vec often matches or surpasses state-of-the-art models like Word2Vec, GloVe, and fastText. Our model’s second factorised matrix resulting from LSA enables efficient embedding extension for out-of-vocabulary words. By effectively reconciling the advantages of dictionary definitions with LSA-based embeddings, Def2Vec yields informative semantic representations, especially considering its reduced data requirements. This paper advances the understanding of word embedding generation by incorporating structured lexical information and efficient embedding extension.
2023
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)
NLP, Word embeddings, LSA, Embeddings extension
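
The pipeline summarised in the abstract (a term-document matrix built from definitions, factorised with LSA, with the second factor reused to embed out-of-vocabulary words) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy dictionary, whitespace tokenisation, raw counts as weights, the rank `k = 2`, and the helper names `build_vocab`, `count_vector`, `embed_definition` and `cosine`. The extension step uses the standard LSA fold-in, which may differ in detail from the authors' procedure.

```python
import numpy as np

# Toy dictionary (hypothetical data): headword -> definition.
definitions = {
    "cat": "small domesticated feline animal",
    "dog": "domesticated canine animal kept as pet",
    "car": "road vehicle with engine and four wheels",
    "truck": "large road vehicle for carrying goods",
}

def build_vocab(texts):
    """Map each distinct token to a column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def count_vector(text, vocab):
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

headwords = list(definitions)
vocab = build_vocab(definitions.values())

# Definition-by-term count matrix: one row per headword's definition.
X = np.stack([count_vector(definitions[w], vocab) for w in headwords])

# LSA as a truncated SVD: X ~ U_k diag(s_k) V_k^T (k is an assumed rank).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
term_matrix = Vt[:k].T          # second factor, shape (n_terms, k)
embeddings = U[:, :k] * s[:k]   # headword embeddings, equal to X @ term_matrix

def embed_definition(text):
    """Fold a new definition into the latent space via the term factor."""
    return count_vector(text, vocab) @ term_matrix

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An out-of-vocabulary headword, embedded from its definition alone.
new_vec = embed_definition("small road vehicle with two wheels")
```

In this toy setting, `cosine(new_vec, embeddings[headwords.index("car")])` is larger than the similarity to the row for "cat", because the new definition shares most of its terms with the vehicle definitions. No refitting of the SVD is needed to embed the new word; only a matrix-vector product with `term_matrix`.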
Files in this item: output.pdf (open access; Post-Print, DRAFT or Author's Accepted Manuscript (AAM); Adobe PDF, 812.56 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1256507