Random perturbations of term weighted Gene Ontology annotations for discovering gene unknown functionalities

MASSEROLI, MARCO;PINOLI, PIETRO
2015-01-01

Abstract

Computational analyses for biomedical knowledge discovery benefit greatly from descriptions of gene and protein functional features expressed through controlled terminologies and ontologies, i.e. from their controlled annotations. In recent years, several databases of such annotations have become available; yet these annotations are incomplete, and only some of them represent highly reliable, human-curated information. To predict and discover unknown or missing annotations, existing approaches use unsupervised learning algorithms. We propose a new learning method that allows supervised algorithms to be applied to unsupervised problems, achieving much better annotation predictions. This method, which we also extend by applying weighting techniques to the data, is based on random perturbations of the data to create artificial labeled training sets. We tested it on nine Gene Ontology annotation datasets; the results show that our approach is effective in predicting novel annotations and outperforms state-of-the-art unsupervised methods.
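The idea of building an artificial labeled training set by random perturbation can be illustrated with a minimal sketch. The abstract does not specify the perturbation scheme, so everything here is an assumption made for illustration: a binary gene-by-term annotation matrix `A`, a perturbation that independently deletes each known annotation with probability `drop_prob`, and the hypothetical helper names `perturb_annotations` and `make_training_set`. Term weighting is not shown.

```python
import numpy as np


def perturb_annotations(A, drop_prob, rng):
    """Return a copy of the binary matrix A in which each known
    annotation (a 1) is independently removed with probability drop_prob.
    Assumed perturbation scheme; the abstract does not define one."""
    A = np.asarray(A, dtype=np.int8)
    # keep[i, j] is True where the entry survives the perturbation
    keep = rng.random(A.shape) >= drop_prob
    return A * keep


def make_training_set(A, term, drop_prob=0.1, seed=0):
    """Build an artificial labeled training set for one ontology term.

    Features: rows of the perturbed annotation matrix (one row per gene).
    Labels:   the unperturbed column of A for the chosen term.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=np.int8)
    X = perturb_annotations(A, drop_prob, rng)
    y = A[:, term].copy()
    return X, y


# Toy annotation matrix: 4 genes (rows) x 3 terms (columns)
A = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
])
X, y = make_training_set(A, term=2, drop_prob=0.3, seed=42)
```

Under these assumptions, any supervised classifier could be trained on `(X, y)` to recover the original annotations from the perturbed ones; applying the trained model to the unperturbed matrix and looking at genes predicted as annotated but currently marked 0 would then give candidate novel annotations.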
2015
Knowledge Discovery, Knowledge Engineering and Knowledge Management
978-3-319-25839-3
INF;bioinformatics

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/959411
Citations
  • PubMed Central: not available
  • Scopus: 5
  • Web of Science (ISI): 1