
Def2Vec: You Shall Know a Word by Its Definition

V. Scotti; R. Tedesco
2024-01-01

Abstract

Def2Vec introduces a new perspective on building word embeddings from dictionary definitions. By leveraging term-document matrices derived from dictionary definitions and employing Latent Semantic Analysis (LSA), our method, Def2Vec, yields embeddings characterized by robust performance and adaptability. Through comprehensive evaluations encompassing token classification, sequence classification, and semantic similarity, we show empirically that Def2Vec is consistently competitive with established models like Word2Vec, GloVe, and FastText. Notably, because our model retains all the matrices resulting from the LSA factorisation, it can efficiently predict embeddings for out-of-vocabulary words, given their definitions. By effectively integrating the benefits of dictionary definitions with LSA-based embeddings, Def2Vec builds informative semantic representations while minimizing data requirements. In this extension, we further investigate the efficacy of sub-word embeddings in our model and extend our experimentation to assess the quality of our embedding model. Our findings contribute to the ongoing evolution of word embedding methodologies by incorporating structured lexical information and enabling efficient embedding prediction.
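The pipeline the abstract describes — an LSA factorisation of a term-document matrix built from dictionary definitions, with fold-in of a definition to embed an out-of-vocabulary word — can be sketched as follows. This is a minimal illustration on toy data, not the paper's implementation: the tokenisation, the raw-count weighting, and the exact use of the factor matrices are assumptions.

```python
import numpy as np

# Toy dictionary (illustrative data, not Wiktionary): each "document"
# is the definition of one headword.
definitions = {
    "cat": "small domesticated feline animal kept as a pet",
    "dog": "domesticated canine animal kept as a pet",
    "car": "road vehicle with an engine and four wheels",
    "bus": "large road vehicle carrying many passengers",
}

# Term-document count matrix X: rows = terms, columns = definitions.
vocab = sorted({t for d in definitions.values() for t in d.split()})
index = {t: i for i, t in enumerate(vocab)}
X = np.zeros((len(vocab), len(definitions)))
for j, d in enumerate(definitions.values()):
    for t in d.split():
        X[index[t], j] += 1.0

# LSA: truncated SVD, X ≈ U @ diag(s) @ Vt with k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k]

# Embedding of each defined word: the scaled document coordinates of
# its definition, i.e. column j of diag(s) @ Vt.
word_vecs = {w: s * Vt[:, j] for j, w in enumerate(definitions)}

def embed_oov(definition_text):
    """Fold a new definition into the latent space to embed an
    out-of-vocabulary word (standard LSA fold-in: U^T x)."""
    x = np.zeros(len(vocab))
    for t in definition_text.split():
        if t in index:  # ignore terms unseen in the dictionary
            x[index[t]] += 1.0
    return U.T @ x

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An unseen word defined like "cat"/"dog" lands near them, far from "car".
pet = embed_oov("domesticated animal kept as a pet")
print(cosine(pet, word_vecs["cat"]) > cosine(pet, word_vecs["car"]))  # True
```

The fold-in step is what the abstract calls "utilization of all the matrices": the left singular vectors `U` map a new definition's term vector into the same latent space that holds the embeddings of in-vocabulary words, with no retraining.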
Natural Language Processing, Deep Learning, Latent Semantic Analysis, Word Embeddings, Wiktionary
Files in this record:
output-2.pdf — open access — Post-Print (DRAFT or Author's Accepted Manuscript, AAM) — 2.38 MB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1267783