Protein-protein interaction associated disorders revealed via data integration

Canakoglu, Arif; Masseroli, Marco
2012-01-01

Abstract

Protein interactions are important for most biological functions, and interaction data are used very frequently by biologists and bioinformaticians to interpret experimental results in the context of biomolecular interaction networks and to test their biomedical hypotheses. Powerful new high-throughput experimental and computational techniques provide large amounts of protein-protein interaction (PPI) data, which are being collected in several different databases, including IntAct, BioGRID, BIND, DIP, HPRD and MINT. No single database covers all interaction data. Moreover, these data generally lack phenotypic, or even functional or structural, information about the interactors, although such information is in many cases available in other databases. Thus, to achieve broad coverage, it is necessary to combine data from different databases, which are often provided in different formats. In particular, no information is available about the association of protein-protein interactions with genetic disorders. For this purpose, we are developing a software framework to create and maintain a data warehouse that integrates information from many data sources on the basis of a conceptual data model relating molecular entities and biomedical features. As a further step, we developed an automatic association inference method, based on the transitive closure concept, and applied it to the integrated data. In particular, by leveraging protein-protein interaction data provided by the IntAct and MINT databases, and data on protein-encoding genes from the Entrez Gene database, we inferred gene interaction networks. In addition, by taking advantage of genetic disorder and phenotype data provided by the OMIM database, we inferred associations between proteins and genetic disorders and their phenotypes.
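The inference step can be pictured as chaining binary relations along the data model: if a protein is encoded by a gene and that gene is linked to a disorder, the protein is linked to that disorder. The sketch below is a minimal illustration under that assumption; it reads "transitive closure" as relational composition along a path of associations, which may differ from the authors' actual implementation. All identifiers (`P1`, `G1`, `D1`, ...) and function names (`compose`, `infer_chain`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: identifiers are illustrative, not real database records.
# Each relation is a set of (source, target) pairs, like a two-column table.
protein_interactions = {("P1", "P2"), ("P2", "P3")}            # PPIs (e.g., IntAct/MINT)
protein_to_gene = {("P1", "G1"), ("P2", "G2"), ("P3", "G3")}   # encoding genes (e.g., Entrez Gene)
gene_to_disorder = {("G1", "D1"), ("G2", "D1"), ("G3", "D2")}  # gene-disorder links (e.g., OMIM)
disorder_to_phenotype = {("D1", "PH1"), ("D2", "PH2")}         # disorder-phenotype links (e.g., OMIM)


def inverse(r):
    """Swap the direction of every pair in a relation."""
    return {(b, a) for a, b in r}


def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    targets = defaultdict(set)
    for b, c in s:
        targets[b].add(c)
    return {(a, c) for a, b in r for c in targets.get(b, ())}


def infer_chain(*relations):
    """Compose a path of relations left to right, e.g. protein -> gene -> disorder."""
    result = relations[0]
    for rel in relations[1:]:
        result = compose(result, rel)
    return result


# Protein-disorder and protein-phenotype associations inferred through the encoding genes.
protein_to_disorder = infer_chain(protein_to_gene, gene_to_disorder)
protein_to_phenotype = infer_chain(protein_to_gene, gene_to_disorder, disorder_to_phenotype)

# Gene-gene interactions lifted from PPIs: gene -> protein -> interacting protein -> gene.
gene_interactions = infer_chain(inverse(protein_to_gene), protein_interactions, protein_to_gene)

print(sorted(protein_to_disorder))  # [('P1', 'D1'), ('P2', 'D1'), ('P3', 'D2')]
print(sorted(gene_interactions))    # [('G1', 'G2'), ('G2', 'G3')]
```

Representing each association as a set of pairs mirrors how a relational warehouse stores link tables, so each inference step corresponds to a join between two such tables.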
Then, to identify genetic disorders possibly associated with protein-protein interactions, we looked for interacting proteins that turned out to be associated with the same genetic disorder. PPI data files downloaded from the MINT and IntAct databases were automatically parsed, and data on 46,154 human protein-protein interactions (out of 254,048 protein-protein interactions involving proteins of 397 different organisms), regarding 12,178 distinct human proteins (out of the 326,766 human proteins in the data warehouse), were imported into the data warehouse. These human proteins are encoded by 11,232 different human genes. By applying the transitive closure concept, we identified 1,130 gene networks and found 1,136 human protein-protein interactions associated with 628 genetic disorders (such as Alzheimer, cystic fibrosis, diabetes mellitus, Parkinson…), which are related to 86 clinical synopses and 3,481 phenotypes. It is thus possible to extract the interactions between the proteins encoded by the genes associated with a specific disease. These interactions can lead researchers to focus on specific proteins. A defect in one or more proteins could be altered by their functional interaction with other proteins. If such relations can be found, a disease treatment strategy such as synthetic protein engineering could possibly be applied. This hypothesis shows the importance of integrating protein-protein interaction data with genetic disorder data, which will help scientists understand the annotations of biological data distributed across different databanks.
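The two counting steps above can be illustrated with a small sketch, assuming protein-disorder associations have already been inferred. The first function keeps only the PPIs whose two interactors share at least one disorder. The second groups gene interactions into networks, under the assumption (not stated in the abstract) that a "gene network" is a connected component, i.e. an equivalence class of the reflexive-transitive closure of the undirected interaction relation. All data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: identifiers are illustrative only.
protein_interactions = {("P1", "P2"), ("P2", "P3"), ("P4", "P5")}
protein_to_disorder = {("P1", "D1"), ("P2", "D1"), ("P3", "D2"), ("P4", "D3"), ("P5", "D3")}
gene_interactions = {("G1", "G2"), ("G2", "G3"), ("G4", "G5")}


def disorder_linked_interactions(ppis, prot_dis):
    """Map each PPI whose two interactors share >= 1 disorder to the shared disorders."""
    disorders = defaultdict(set)
    for protein, disorder in prot_dis:
        disorders[protein].add(disorder)
    linked = {}
    for p, q in ppis:
        shared = disorders.get(p, set()) & disorders.get(q, set())
        if shared:
            linked[(p, q)] = shared
    return linked


def connected_components(edges):
    """Group nodes reachable from each other via undirected edges (iterative DFS)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


linked = disorder_linked_interactions(protein_interactions, protein_to_disorder)
print(sorted(linked.items()))  # [(('P1', 'P2'), {'D1'}), (('P4', 'P5'), {'D3'})]
print(sorted(sorted(c) for c in connected_components(gene_interactions)))
# [['G1', 'G2', 'G3'], ['G4', 'G5']]
```

In this toy example the interaction between `P2` and `P3` is discarded because the two proteins are linked to different disorders, which is the kind of filtering the abstract describes at scale.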
2012
Proceedings of ESCS 2012: 2nd European Student Council Symposium, International Society for Computational Biology (ISCB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/679403