An Unsupervised Approach to Speed Up the Training of Multiple Models on Biomedical KGs

De Grandis, Leonardo; Di Donato, Guido W.; Santambrogio, Marco D.

doi:10.1007/978-3-031-72524-1_16

Knowledge Graphs (KGs) are powerful tools for representing complex networks and their interactions. This is especially true in the biomedical domain, where improvements in data collection techniques have enabled the construction of large networks that combine information from heterogeneous data sources. Such biomedical KGs can be used to train different supervised Graph Machine Learning (GML) models for different predictive tasks. However, training multiple supervised GML models on massive KGs can result in prohibitive cumulative training time and cost, which hinders the adoption of such techniques. For this reason, this work presents a methodology to reduce the cumulative training time of multiple predictive models trained on the same KG by leveraging an unsupervised GML approach. Our methodology consists of learning, in an unsupervised way, general representations of the graph’s entities and relationships in the form of numerical vectors (i.e., embeddings), and then feeding these vectors to multiple classical machine learning models, each trained for a specific predictive task. We evaluated the proposed methodology on two relevant tasks, namely link prediction and multi-class link classification, on the open ogbl-biokg graph dataset. Experimental results show that our approach reduces the cumulative training time for the two tasks by 27%, while also improving prediction accuracy by 4% and 13% compared to a classical supervised GML approach.
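The two-stage idea described in the abstract (learn embeddings once, then reuse them for several lightweight task-specific models) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy triples, the truncated-SVD embedding used as a stand-in for the unsupervised GML step, the element-wise product used as the edge feature, and the hand-written nearest-centroid classifier used as a stand-in for the classical models.

```python
import numpy as np

# Toy triples (head, relation, tail); all IDs are hypothetical, not from ogbl-biokg.
triples = np.array([
    [0, 0, 1], [0, 0, 2], [1, 0, 2], [2, 0, 0],   # relation 0 among entities 0-2
    [3, 1, 4], [3, 1, 5], [4, 1, 5], [5, 1, 3],   # relation 1 among entities 3-5
])
num_entities = 6


def learn_embeddings(triples, num_entities, dim=4):
    """Unsupervised step (stand-in): truncated SVD of the symmetrized adjacency matrix."""
    adj = np.zeros((num_entities, num_entities))
    adj[triples[:, 0], triples[:, 2]] = 1.0
    adj = np.maximum(adj, adj.T)          # ignore edge direction for this sketch
    u, s, _ = np.linalg.svd(adj)
    return u[:, :dim] * np.sqrt(s[:dim])  # one dim-sized vector per entity


def edge_features(emb, heads, tails):
    """Represent an edge by the element-wise product of its endpoint embeddings."""
    return emb[heads] * emb[tails]


class NearestCentroid:
    """Tiny classical classifier: predict the class whose mean vector is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        diffs = X[:, None, :] - self.centroids_[None, :, :]
        return self.classes_[np.linalg.norm(diffs, axis=2).argmin(axis=1)]


# Stage 1: embeddings are computed once and shared by every downstream task.
emb = learn_embeddings(triples, num_entities)

# Stage 2a: link prediction as binary classification (real edge vs. random pair).
# Random pairs may occasionally coincide with real edges; acceptable for a sketch.
rng = np.random.default_rng(0)
pos = edge_features(emb, triples[:, 0], triples[:, 2])
neg = edge_features(emb,
                    rng.integers(0, num_entities, size=len(triples)),
                    rng.integers(0, num_entities, size=len(triples)))
X_link = np.vstack([pos, neg])
y_link = np.concatenate([np.ones(len(pos), dtype=int), np.zeros(len(neg), dtype=int)])
link_model = NearestCentroid().fit(X_link, y_link)

# Stage 2b: multi-class link classification (predict the relation type of an edge).
rel_model = NearestCentroid().fit(pos, triples[:, 1])
```

A new candidate edge can then be scored by either model without retraining the embeddings, for example `rel_model.predict(edge_features(emb, np.array([0]), np.array([1])))`. The design point is that the expensive representation-learning step runs once, while each additional task only adds the cost of fitting a small model on fixed features.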

2024-01-01

Publication year: 2024
Book title: Body Area Networks. Smart IoT and Big Data for Intelligent Health Management
Series title: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN (International Standard Book Number): 978-3-031-72523-4
Appears in categories: 04.1 Conference proceedings contribution

Files in this product:

File	Size	Format
An Unsupervised Approach to Speed Up the Training of Multiple Models on Biomedical KGs.pdf (restricted access, publisher’s version)	1.42 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1287284


RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano
