Motivation: With the spread of biological and clinical uses of next-generation sequencing (NGS) data, many laboratories and health organizations face the need to share NGS data resources and to access and process shared genomic data easily and comprehensively; in most cases, primary and secondary data management of NGS data is done at sequencing stations, and sharing applies to processed data. Building on the previous single-instance GMQL system architecture, here we review the model, language and architectural extensions that open the centralized GMQL system to federated computing. Results: The result is a well-designed extension of a centralized system architecture to support federated data sharing and query processing. Data are federated through simple data-sharing instructions. Queries are assigned to execution nodes; they are translated into an intermediate representation, whose computation drives the distribution of data and processing. The approach allows federated applications to be written in any of the classical styles: centralized, distributed or externalized. Availability: The federated genomic data management system is freely available for non-commercial use as an open source project at http://www.bioinformatics.deib.polimi.it/FederatedGMQLsystem/ Contact: {arif.canakoglu, pietro.pinoli}@polimi.it Summary: The growth of genomics data over the past decade has been astonishing, with many independent consortia and institutes producing and releasing genomics data. We review Federated GMQL, a system for querying distributed genomics datasets across many instances connected through the Web. Compared with its competitors, it provides a more complete query language. We review advanced features of the system that allow the computation to be distributed automatically while preserving privacy constraints.
Federated sharing and processing of genomic datasets for tertiary data analysis
A. Canakoglu;P. Pinoli;A. Gulino;L. Nanni;M. Masseroli;S. Ceri
2021-01-01
File | Size | Format | |
---|---|---|---|
federated_briefings.pdf (open access; Post-Print: DRAFT or Author’s Accepted Manuscript, AAM) | 1.34 MB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.