Extended Spearman and Kendall coefficients for gene annotation list correlation

Davide Chicco, Eleonora Ciceri, Marco Masseroli
2015

Abstract

Gene annotations are a key concept in bioinformatics, and computational methods able to predict them are a fundamental contribution to the field. Several machine learning algorithms are available in this domain; they include relevant parameters that might influence the output list of predicted gene annotations. The extent to which variation in these key parameters affects the output gene annotation lists remains open to evaluation. Here, we support such evaluation by introducing two list correlation measures, which are based on and extend the Spearman ρ correlation coefficient and the Kendall τ distance, respectively. Applying these measures to gene annotation lists predicted from Gene Ontology annotation datasets of genes from different organisms revealed interesting patterns among the predicted lists. The measures also allowed us to make some useful observations about the prediction parameters and algorithms used.
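For reference, the two classic measures the abstract builds on can be computed as below. This is a minimal sketch of the standard, non-extended Spearman ρ and Kendall τ distance for two rankings of the same distinct items with no ties; it does not reproduce the authors' extended coefficients. The function names and the example lists are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def _check_permutations(list_a, list_b):
    """Both rankings must order exactly the same distinct items (assumption of the classic measures)."""
    if len(list_a) != len(set(list_a)) or set(list_a) != set(list_b) or len(list_a) != len(list_b):
        raise ValueError("lists must be permutations of the same distinct items")


def rank_positions(ranked):
    """Map each item to its 0-based position in a ranked list."""
    return {item: i for i, item in enumerate(ranked)}


def spearman_rho(list_a, list_b):
    """Classic Spearman rho: 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    _check_permutations(list_a, list_b)
    n = len(list_a)
    if n < 2:
        raise ValueError("need at least two items")
    pos_b = rank_positions(list_b)
    # d is the difference between an item's rank in list_a and in list_b
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(list_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau_distance(list_a, list_b):
    """Number of discordant pairs (pairs ordered differently in the two rankings)."""
    _check_permutations(list_a, list_b)
    pos_b = rank_positions(list_b)
    discordant = 0
    # combinations() keeps list_a's order, so x precedes y in list_a
    for x, y in combinations(list_a, 2):
        if pos_b[x] > pos_b[y]:
            discordant += 1
    return discordant


if __name__ == "__main__":
    # Hypothetical predicted lists of GO term identifiers, ranked by score
    predicted_a = ["GO:0006915", "GO:0008283", "GO:0007165", "GO:0016301"]
    predicted_b = ["GO:0008283", "GO:0006915", "GO:0007165", "GO:0016301"]
    print(spearman_rho(predicted_a, predicted_b))
    print(kendall_tau_distance(predicted_a, predicted_b))
```

Swapping the top two items, as in the example, gives a ρ slightly below 1 and a Kendall τ distance of 1. Dividing the distance by n(n−1)/2 normalizes it to [0, 1]. Real predicted annotation lists generally differ in length and content, which these classic forms do not handle; that is the kind of gap an extension has to address.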
Computational Intelligence Methods for Bioinformatics and Biostatistics
978-3-319-24461-7
INF; bioinformatics


Use this identifier to cite or link to this document: http://hdl.handle.net/11311/959410
Citations
  • PubMed Central: not available
  • Scopus: 7
  • ISI: 5