Pattern similarity search in multiple genome browser tracks

CERI, STEFANO;MASSEROLI, MARCO
2016-01-01

Abstract

The volume of heterogeneous data produced by Next Generation Sequencing (NGS) raises many interesting computational problems that call for urgent solutions. Genome browsers (e.g., the Integrated Genome Browser, IGB) are tools for visually comparing and browsing multiple genomic feature samples that are aligned to the same reference genome and laid out on different genome browser tracks. They allow the visual inspection and identification of interesting “patterns” on multiple tracks, i.e., sets of genomic regions or peaks at given distances from each other on different tracks. However, once such a pattern has been visually identified in one section of the genome, searching for its occurrences along the whole genome is a complex computational task that is currently not supported, even though finding those occurrences is very important for the biological interpretation of NGS experimental results and for the comprehension of biomolecular phenomena. To overcome this limitation, we present an optimized “similarity”-based pattern-search algorithm that efficiently finds, within a large set of genomic data, the sets of genomic regions that are similar to a given pattern. We implemented the algorithm as an IGB plugin that supports intuitive user interaction in two steps: the visual selection of an interesting pattern on the loaded IGB tracks, and the visualization within IGB of the region sets, found by our algorithm along the entire genome, that are similar to the selected pattern. This implementation demonstrates the efficiency and accuracy of the proposed method.
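To make the notion of a multi-track pattern concrete, the following is a minimal Python sketch, not the authors' algorithm, whose details the abstract does not give. It assumes a simplified model in which each track is a sorted list of region start positions on a single chromosome, a pattern is a list of hypothetical `(track_name, offset)` pairs measured from the first region of the pattern, and "similarity" means that every expected position is matched by a region within a fixed `tolerance`. The names `nearest_start`, `find_similar`, and `tolerance` are illustrative assumptions.

```python
from bisect import bisect_left


def nearest_start(starts, target):
    """Return the value in the sorted, non-empty list `starts` closest to `target`."""
    i = bisect_left(starts, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = starts[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s - target))


def find_similar(tracks, pattern, tolerance):
    """Find anchors where every pattern element has a region within `tolerance`.

    tracks:  {track_name: sorted list of region start positions} (assumed model)
    pattern: [(track_name, offset), ...]; the first element is the anchor (offset 0)
    Returns a list of (anchor_position, total_deviation) in genome order.
    """
    anchor_track, _ = pattern[0]
    hits = []
    for anchor in tracks[anchor_track]:
        total = 0
        for track_name, offset in pattern[1:]:
            starts = tracks[track_name]
            if not starts:
                break  # an empty track can never match
            expected = anchor + offset
            deviation = abs(nearest_start(starts, expected) - expected)
            if deviation > tolerance:
                break  # this anchor is not similar enough
            total += deviation
        else:
            # Reached only if no pattern element broke out of the loop.
            hits.append((anchor, total))
    return hits


# Hypothetical data: three tracks and a pattern selected around position 100.
tracks = {
    "A": [100, 1000, 5000],
    "B": [350, 1260, 7000],
    "C": [600, 1490, 5500],
}
pattern = [("A", 0), ("B", 250), ("C", 500)]
print(find_similar(tracks, pattern, tolerance=20))
```

In this sketch each anchor costs one binary search per pattern element, so a whole-genome scan stays close to linear in the size of the anchor track. A lower total deviation indicates a closer match to the selected pattern.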
2016
ISMB 2016: International Conference on Intelligent Systems for Molecular Biology
INF; bioinformatics
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1013819