TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas

MASSEROLI, MARCO; CERI, STEFANO
2016-01-01

Abstract

Background: Data extraction and integration methods are becoming essential for effective access to the huge amounts of available genomic and clinical data. In this work, we focus on The Cancer Genome Atlas (TCGA), a comprehensive archive of tumoral data containing Next Generation Sequencing experiments on more than 30 cancer types. Results: We propose TCGA2BED, a software tool that downloads TCGA data and converts them into the structured BED format. Additionally, we extend TCGA data with information from several other genomic databases (NCBI Entrez, HGNC, and UCSC). Finally, we provide and maintain an automatically updated data repository with all publicly available CNV, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data and metadata converted into the BED format. Conclusions: Using our proposed BED format reduces the time spent managing TCGA data: it becomes possible to deal efficiently with huge amounts of cancer data and to search, query, and extend them. The format helps investigators perform several knowledge discovery analyses on all currently known tumor types, with the final aim of aiding cancer treatment.
2016
ISMB 2016: International Conference on Intelligent Systems for Molecular Biology
genomic computing; genomic data acquisition; genomic data integration; genomic data processing
INF; bioinformatics

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1013812