An Unsupervised Approach to Speed Up the Training of Multiple Models on Biomedical KGs
De Grandis, Leonardo; Di Donato, Guido W.; Santambrogio, Marco D.
2024-01-01
Abstract
Knowledge Graphs (KGs) are powerful tools for representing complex networks and their interactions. This is especially true in the biomedical domain, where improvements in data collection techniques have enabled the construction of large networks that combine information from heterogeneous data sources. Such biomedical KGs can be used to train different supervised Graph Machine Learning (GML) models for different predictive tasks. However, training multiple supervised GML models on massive KGs can result in prohibitive cumulative training time and costs, which hinders the adoption of such techniques. For this reason, this work presents a methodology that reduces the cumulative training time of multiple predictive models trained on the same KG by leveraging an unsupervised GML approach. The methodology consists of learning, in an unsupervised way, general representations of the graph’s entities and relationships in the form of numerical vectors (i.e., embeddings), and then feeding these vectors to multiple classical machine learning models, each trained for a specific predictive task. We evaluated the proposed methodology on two relevant tasks, namely link prediction and multi-class link classification, on the open ogbl-biokg graph dataset. Experimental results show that our approach can reduce the cumulative training time for the two tasks by 27%, while also improving prediction accuracy by 4% and 13% compared to a classical supervised GML approach.
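The two-stage idea described in the abstract (learn embeddings once without labels, then reuse them for several lightweight task-specific models) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy triples, the neighbour-smoothing routine standing in for the unsupervised GML step, the concatenation-based edge features, and the nearest-centroid classifier standing in for the "classical machine learning models". The abstract does not say which embedding method or downstream models the authors used.

```python
import random

# Toy knowledge graph as (head, relation, tail) triples.
# Entity and relation names are invented for illustration only.
TRIPLES = [
    ("drug_a", "treats", "disease_x"),
    ("drug_b", "treats", "disease_y"),
    ("drug_a", "targets", "protein_p"),
    ("drug_b", "targets", "protein_q"),
    ("protein_p", "associated_with", "disease_x"),
    ("protein_q", "associated_with", "disease_y"),
]
DIM = 8


def learn_embeddings(triples, dim=DIM, rounds=2, seed=0):
    """Hypothetical stand-in for the unsupervised step: random vectors
    mixed with the mean of their neighbours' vectors (no labels used)."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    emb = {e: [rng.uniform(-1, 1) for _ in range(dim)] for e in entities}
    neighbours = {e: [] for e in entities}
    for h, _, t in triples:
        neighbours[h].append(t)
        neighbours[t].append(h)
    for _ in range(rounds):
        new = {}
        for e in entities:
            nbrs = [emb[n] for n in neighbours[e]]
            mean = [sum(v[i] for v in nbrs) / len(nbrs) for i in range(dim)]
            new[e] = [0.5 * emb[e][i] + 0.5 * mean[i] for i in range(dim)]
        emb = new
    return emb


def edge_features(emb, head, tail):
    """Represent a candidate edge as the concatenation of its endpoint vectors."""
    return emb[head] + emb[tail]


def fit_centroids(X, y):
    """Nearest-centroid classifier: store one mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Step 1: embeddings are computed once, without any task labels.
emb = learn_embeddings(TRIPLES)

# Step 2a: link prediction (does an edge exist?), with hand-picked non-edges as negatives.
positives = [(h, t) for h, _, t in TRIPLES]
negatives = [("drug_a", "disease_y"), ("drug_b", "disease_x"),
             ("drug_a", "protein_q"), ("drug_b", "protein_p")]
X_lp = [edge_features(emb, h, t) for h, t in positives + negatives]
y_lp = [1] * len(positives) + [0] * len(negatives)
link_predictor = fit_centroids(X_lp, y_lp)

# Step 2b: multi-class link classification (which relation type?), reusing the same embeddings.
X_lc = [edge_features(emb, h, t) for h, _, t in TRIPLES]
y_lc = [r for _, r, _ in TRIPLES]
link_classifier = fit_centroids(X_lc, y_lc)

print(predict(link_predictor, edge_features(emb, "drug_a", "disease_x")))
print(predict(link_classifier, edge_features(emb, "drug_a", "protein_p")))
```

The point of the sketch is the cost structure rather than the specific models: `learn_embeddings` runs once, and each additional task only adds the cost of fitting a small model on fixed feature vectors, instead of training a new supervised graph model per task.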


