Validation pipeline for computational prediction of genomics annotation

CHICCO, DAVIDE; MASSEROLI, MARCO
2016-01-01

Abstract

Controlled biomolecular annotations are key concepts in computational genomics and proteomics, since they describe the functional features of genes and their products in a form that is both simple and computable. Despite their importance, many of these annotations are missing, and the available ones contain errors and inconsistencies; furthermore, discovering and validating new annotations is very time-consuming. For these reasons, computer scientists have recently developed several machine-learning algorithms able to computationally predict new gene-function relationships. While several of these methods have been easily adapted to bioinformatics from other domains, their validation remains a challenging part of a computational pipeline. Here, we propose a validation procedure based on three different sub-phases, which is able to assess the precision of any algorithm's predictions with a reliable degree of accuracy. We show validation results obtained for Gene Ontology annotations of Homo sapiens genes that demonstrate the effectiveness of our validation approach.
2016
Computational Intelligence Methods for Bioinformatics and Biostatistics.
9783319443317
Biomolecular annotations; Gene ontology; Genomic and proteomic data warehouse (GPDW); Receiver operating characteristic; ROC curves; Validation; Theoretical Computer Science; Computer Science (all)
INF; bioinformatics
Files in this item:
CIBB_2015_233-244.pdf (Publisher’s version, Adobe PDF, 343.53 kB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1013751
Citations
  • Scopus: 2